Message121561
| Author |
belopolsky |
| Recipients |
amaury.forgeotdarc, belopolsky, vstinner |
| Date |
2010-11-19.19:42:51 |
| Message-id |
<1290195773.07.0.831312765946.issue9769@psf.upfronthosting.co.za> |
| In-reply-to |
| Content |
I don't understand Victor's argument in msg115889. According to the UTF-8 RFC, <http://www.ietf.org/rfc/rfc2279.txt>:
- US-ASCII values do not appear otherwise in a UTF-8 encoded
character stream. This provides compatibility with file systems
or other software (e.g. the printf() function in C libraries) that
parse based on US-ASCII values but are transparent to other
values.
This means that printf-like formatters should not care whether the format string is in UTF-8, Latin-1, or any other ASCII-compatible 8-bit encoding. (Passing in a multibyte encoding pretending to be bytes would, of course, lead to havoc, but the C type system will protect you from that.)
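The property quoted from the RFC can be illustrated with a small sketch. It is written in Python rather than C for brevity, and `percent_offsets` is a hypothetical stand-in for a byte-oriented printf-style scanner, not the code PyUnicode_FromFormat actually uses. UTF-16 is chosen here only as one example of an encoding that is not ASCII-compatible.

```python
def percent_offsets(fmt: bytes) -> list[int]:
    """Return the offsets of the ASCII '%' byte (0x25), the way a parser
    that only looks at US-ASCII values would find format directives."""
    return [i for i, byte in enumerate(fmt) if byte == 0x25]


# In UTF-8, every byte of a multi-byte sequence is >= 0x80, so the only
# 0x25 bytes found are the real '%' characters.
utf8_fmt = "é=%d €=%s".encode("utf-8")
utf8_hits = percent_offsets(utf8_fmt)

# Latin-1 is also ASCII-compatible: non-ASCII characters use bytes >= 0x80.
latin1_hits = percent_offsets("é=%d".encode("latin-1"))

# UTF-16 is not ASCII-compatible: U+2500 encodes as the bytes 00 25 in
# little-endian order, so a byte-oriented scanner sees a '%' that is not there.
utf16_hits = percent_offsets("\u2500".encode("utf-16-le"))
```

With these inputs the scanner finds exactly the two real directives in the UTF-8 string and the one in the Latin-1 string, while the UTF-16 input produces a spurious match even though it contains no percent sign.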
It is also fairly simple to sanity-check for UTF-8 if necessary, but in the case of PyUnicode_FromFormat, the resulting string will be decoded as UTF-8, so all characters in the format string will be checked anyway.
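As a rough sketch of such a sanity check, again in Python and under the assumption that a strict decode is an acceptable test, the hypothetical helper `is_valid_utf8` below simply attempts the decode and reports whether it succeeded. It illustrates the idea only and is not how PyUnicode_FromFormat validates its input.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (strict mode)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# "café %s" encoded as UTF-8: the é is the two-byte sequence C3 A9.
good = is_valid_utf8(b"caf\xc3\xa9 %s")

# The same text encoded as Latin-1: a lone E9 byte followed by a space
# is not a valid UTF-8 sequence, so the check rejects it.
bad = is_valid_utf8(b"caf\xe9 %s")
```

A format string in another 8-bit encoding would therefore be caught at decode time whenever it contains a byte sequence that is not well-formed UTF-8.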
Am I missing something? |
|
History
| Date | User | Action | Args |
|---|---|---|---|
| 2010-11-19 19:42:53 | belopolsky | set | recipients: + belopolsky, amaury.forgeotdarc, vstinner |
| 2010-11-19 19:42:53 | belopolsky | set | messageid: <1290195773.07.0.831312765946.issue9769@psf.upfronthosting.co.za> |
| 2010-11-19 19:42:51 | belopolsky | link | issue9769 messages |
| 2010-11-19 19:42:51 | belopolsky | create | |