Message 121561 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	belopolsky
Recipients	amaury.forgeotdarc, belopolsky, vstinner
Date	2010-11-19 19:42:51
SpamBayes Score	3.3871908e-09
Marked as misclassified	No
Message-id	<1290195773.07.0.831312765946.issue9769@psf.upfronthosting.co.za>

Content

I don't understand Victor's argument in msg115889. According to the UTF-8 RFC, <http://www.ietf.org/rfc/rfc2279.txt>:
 - US-ASCII values do not appear otherwise in a UTF-8 encoded
 character stream. This provides compatibility with file systems
 or other software (e.g. the printf() function in C libraries) that
 parse based on US-ASCII values but are transparent to other
 values.
This means that printf-like formatters should not care whether the format string is in UTF-8, Latin-1, or any other ASCII-compatible 8-bit encoding. (Passing in a multibyte encoding pretending to be bytes would of course lead to havoc, but the C type system will protect you from that.)
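To make the RFC point concrete, here is a minimal sketch, assuming a toy formatter: `mini_format` is a hypothetical name, and this is not how PyUnicode_FromFormat is actually implemented. It only reacts to the ASCII byte `%` (0x25), so UTF-8 and Latin-1 format strings both pass through byte-for-byte: no byte of a UTF-8 multibyte sequence is below 0x80, and in Latin-1 the only byte equal to 0x25 is `%` itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal printf-like formatter (NOT CPython's
   PyUnicode_FromFormat). It scans the format one byte at a time, treats
   only the ASCII sequences "%s" and "%%" specially, and copies every
   other byte, including bytes >= 0x80, through unchanged. Output is
   truncated to fit in `cap` bytes including the terminating NUL. */
static size_t mini_format(char *out, size_t cap, const char *fmt, const char *arg)
{
    size_t n = 0;
    assert(cap > 0); /* need room for at least the terminating NUL */
    for (const char *p = fmt; *p != '\0'; p++) {
        if (p[0] == '%' && p[1] == 's') {
            /* Substitute the argument string. */
            for (const char *a = arg; *a != '\0' && n + 1 < cap; a++)
                out[n++] = *a;
            p++; /* skip the 's' */
        } else if (p[0] == '%' && p[1] == '%') {
            if (n + 1 < cap)
                out[n++] = '%';
            p++; /* skip the second '%' */
        } else if (n + 1 < cap) {
            /* Any other byte, ASCII or not, is copied verbatim. A UTF-8
               multibyte sequence never contains a byte below 0x80, so it
               can never be mistaken for '%'. */
            out[n++] = *p;
        }
    }
    out[n] = '\0';
    return n;
}
```

For example, formatting `"caf\xC3\xA9: %s"` (UTF-8) or `"caf\xE9: %s"` (Latin-1) with the argument `"ok"` yields "café: ok" in the respective encoding; the formatter never needs to know which encoding it was given.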
It is also fairly simple to sanity-check for UTF-8 if necessary, but in the case of PyUnicode_FromFormat, the resulting string will be decoded as UTF-8, so all characters in the format string will be checked anyway.
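As for how simple such a check is, here is a minimal sketch of a standalone UTF-8 validator in C; `is_valid_utf8` is a hypothetical helper, not a CPython API, and CPython's own check happens when the result is decoded. The sketch assumes the byte ranges of RFC 3629, which obsoletes the RFC 2279 cited above and limits UTF-8 to U+10FFFF. For instance, it accepts `"caf\xC3\xA9"` but rejects the Latin-1 byte string `"caf\xE9"`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimal UTF-8 well-formedness check following the
   RFC 3629 byte ranges: rejects stray continuation bytes, truncated
   sequences, overlong encodings, UTF-16 surrogates (U+D800..U+DFFF)
   and code points above U+10FFFF. */
static bool is_valid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;
    assert(s != NULL || len == 0);
    while (i < len) {
        unsigned char b = s[i];
        size_t need = 0;                    /* continuation bytes to follow */
        unsigned char lo = 0x80, hi = 0xBF; /* allowed range of 2nd byte */

        if (b < 0x80) {                     /* plain ASCII */
            i++;
            continue;
        } else if (b >= 0xC2 && b <= 0xDF) {
            need = 1;                       /* 0xC0, 0xC1 would be overlong */
        } else if (b >= 0xE0 && b <= 0xEF) {
            need = 2;
            if (b == 0xE0) lo = 0xA0;       /* reject overlong forms */
            else if (b == 0xED) hi = 0x9F;  /* reject surrogates */
        } else if (b >= 0xF0 && b <= 0xF4) {
            need = 3;
            if (b == 0xF0) lo = 0x90;       /* reject overlong forms */
            else if (b == 0xF4) hi = 0x8F;  /* stay <= U+10FFFF */
        } else {
            return false;                   /* invalid lead byte */
        }

        if (len - i <= need)
            return false;                   /* truncated sequence */
        if (s[i + 1] < lo || s[i + 1] > hi)
            return false;
        for (size_t k = 2; k <= need; k++)
            if ((s[i + k] & 0xC0) != 0x80)  /* must be 10xxxxxx */
                return false;
        i += need + 1;
    }
    return true;
}
```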
Am I missing something?

History
Date	User	Action	Args
2010-11-19 19:42:53	belopolsky	set	recipients: + belopolsky, amaury.forgeotdarc, vstinner
2010-11-19 19:42:53	belopolsky	set	messageid: <1290195773.07.0.831312765946.issue9769@psf.upfronthosting.co.za>
2010-11-19 19:42:51	belopolsky	link	issue9769 messages
2010-11-19 19:42:51	belopolsky	create
