Author vstinner
Recipients amaury.forgeotdarc, belopolsky, ezio.melotti, vstinner
Date 2010-11-19 20:06:13
Message-id <201011192106.06983.victor.stinner@haypocalc.com>
In-reply-to <1290195773.07.0.831312765946.issue9769@psf.upfronthosting.co.za>
Content
On Friday 19 November 2010 20:42:53 you wrote:
> Alexander Belopolsky <belopolsky@users.sourceforge.net> added the comment:
> 
> I don't understand Victor's argument in msg115889. According to UTF-8 RFC,
> <http://www.ietf.org/rfc/rfc2279.txt>:
> 
> - US-ASCII values do not appear otherwise in a UTF-8 encoded
> character stream. This provides compatibility with file systems
> or other software (e.g. the printf() function in C libraries) that
> parse based on US-ASCII values but are transparent to other
> values.
Most C functions, including printf(), work on multi*byte* strings, not on (wide)
character strings, whereas PyUnicode_FromFormatV() converts the format string
(bytes) to unicode (characters). If you would like a comparison in C, it's
like printf()+mbstowcs() in the same function.
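To make the comparison concrete, here is a rough sketch of what "printf()+mbstowcs() in the same function" could look like. The name format_to_wide() and the fixed 256-byte scratch buffer are assumptions made up for this illustration; this is not how PyUnicode_FromFormatV() is implemented, and mbstowcs() decodes according to the current C locale, so behaviour for non-ASCII bytes depends on the platform.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper (not a CPython function): format into a byte buffer
 * with vsnprintf(), then decode those bytes to wide characters with
 * mbstowcs(). Returns the number of wide characters written (excluding the
 * terminator), or -1 on error. */
long format_to_wide(wchar_t *out, size_t out_len, const char *fmt, ...)
{
    char bytes[256];            /* assumed scratch size, for illustration */
    va_list ap;
    int n;
    size_t converted;

    assert(out != NULL && out_len > 0);

    /* Step 1: printf-style formatting, bytes in and bytes out. */
    va_start(ap, fmt);
    n = vsnprintf(bytes, sizeof bytes, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t)n >= sizeof bytes)
        return -1;              /* formatting error or truncation */

    /* Step 2: decode bytes to characters using the current locale. */
    converted = mbstowcs(out, bytes, out_len);
    if (converted == (size_t)-1 || converted >= out_len)
        return -1;              /* invalid sequence, or no room for L'\0' */
    return (long)converted;
}
```

The point of the sketch is that step 2 has to pick an encoding for the bytes produced by step 1, which is exactly the question a bytes-only printf() never has to answer.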
> This means that printf-like formatters should not care whether the format
> string is in UTF-8, Latin1, or any other ASCII-compatible 8-bit encoding. 
That may be true with bytes input and bytes output (e.g. PyString_FromFormatV()
in Python 2), but it is no longer true with bytes input and str output (e.g.
PyUnicode_FromFormatV() in Python 3).
> It is also fairly simple to sanity-check for UTF-8 if necessary, but in
> case of PyUnicode_FromFormat, the resulting string will be decoded as
> UTF-8, so all characters in the format string will be checked anyways.
I chose ASCII instead of UTF-8 because a UTF-8 decoder is long (210 lines)
and complex (see PyUnicode_DecodeUTF8Stateful()), whereas an ASCII decoder
is just "unicode_char = (Py_UNICODE)byte;" plus an if beforehand to check
that 0 <= byte <= 127.
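As a minimal sketch of the ASCII decoder described above: the function name ascii_decode() is hypothetical, and uint32_t is used as a stand-in for Py_UNICODE so the example compiles without the Python headers. It is not the actual code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode `len` ASCII bytes into code points.
 * uint32_t stands in for Py_UNICODE (an assumption for this sketch).
 * Returns 0 on success, or -1 at the first byte above 127. */
int ascii_decode(const unsigned char *bytes, size_t len, uint32_t *out)
{
    size_t i;

    assert(bytes != NULL && out != NULL);

    for (i = 0; i < len; i++) {
        /* unsigned char is never negative, so only the upper bound
         * of "0 <= byte <= 127" needs an explicit check. */
        if (bytes[i] > 127)
            return -1;
        out[i] = (uint32_t)bytes[i];    /* the whole "decoder" */
    }
    return 0;
}
```

With this approach a format string containing any non-ASCII byte, such as the UTF-8 encoding of "é", is rejected rather than decoded.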
Nobody noticed my change because the whole Python code base only passes
ASCII strings as the format argument of PyUnicode_FromFormatV().
Victor
History
Date User Action Args
2010-11-19 20:06:23 vstinner set recipients: + vstinner, amaury.forgeotdarc, belopolsky, ezio.melotti
2010-11-19 20:06:13 vstinner link issue9769 messages
2010-11-19 20:06:13 vstinner create