Message243660
| Author |
eryksun |
| Recipients |
belopolsky, brian.curtin, eryksun, ocean-city, pitrou, python-dev, vstinner |
| Date |
2015-05-20.13:30:40 |
| Message-id |
<1432128640.8.0.377125320908.issue10653@psf.upfronthosting.co.za> |
| In-reply-to |
| Content |
This solution no longer works. If the system is configured to use the Japanese system locale and language pack, then 3.4.3 returns code page 932 mojibake for the "%Z" time zone name. Originally [this approach worked][1] because it called PyUnicode_Decode with the 'mbcs' encoding, which decodes using the ANSI code page.
Currently it calls PyUnicode_DecodeLocaleAndSize, which just ends up calling mbstowcs. That's pretty much what wcsftime does. In the default C locale, mbstowcs does no real decoding; it simply widens each byte value to a wchar_t:
>>> time.strftime('%Z')
'\x91\xbe\x95\xbd\x97m\x89\xc4\x8e\x9e\x8a\xd4'
>>> time.strftime('%Z').encode('latin-1').decode('932')
'太平洋夏時間'
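The same effect can be reproduced on any platform, without the Japanese locale. This is a minimal sketch, assuming the CRT returns the zone name as code page 932 bytes and that the C-locale mbstowcs widens each byte unchanged, which is equivalent to decoding as latin-1. The variable names are illustrative only:

```python
# Simulate the 3.4.3 behaviour without a Japanese system locale.
zone_name = '太平洋夏時間'

# Assumption: the CRT hands back the name encoded in the ANSI code page (932).
ansi_bytes = zone_name.encode('cp932')

# C-locale mbstowcs widens each byte to a wchar_t; latin-1 decoding does the same.
mojibake = ansi_bytes.decode('latin-1')

# Every byte survives as a code point, so at this stage the damage is reversible.
recovered = mojibake.encode('latin-1').decode('cp932')

print(ascii(mojibake))
print(recovered)
```

Because each byte maps to exactly one code point, the latin-1 round trip gets the original name back. That only holds for the 3.4.3 case.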
The problem is worse for 3.5 built with VC++ 14. In the new CRT, strftime decodes the format string via MultiByteToWideChar, calls _Wcsftime_l, and encodes the result back via WideCharToMultiByte. The outer conversions use the default LC_TIME code page, which is ANSI (ACP), so they're not the problem. The problem is the internal _mbstowcs_s_l conversion of the ANSI time zone name, which creates the mojibake 'unicode' string shown above. This is then compounded by calling WideCharToMultiByte on the result, which cannot encode most of those characters in code page 932:
>>> time.strftime('%Z')
'?????m?A???O'
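There's no way to fix this by transcoding. The characters that could not be encoded have already been replaced with '?' (the 'A' and 'O' look like best-fit mappings of '\xc4' and '\xd4'), so the original bytes are lost. The result is just garbage.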
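The information loss can be approximated in Python. This is only a sketch, not the CRT's code: errors='replace' stands in for WideCharToMultiByte substituting '?', and it does not model the best-fit mapping that presumably produced the 'A' and 'O' above, so the exact output may differ from the Windows result.

```python
# Start from the mojibake that the internal byte-widening conversion produces.
original = '太平洋夏時間'.encode('cp932')
mojibake = original.decode('latin-1')

# Assumption: encoding back to code page 932 replaces unencodable characters
# with '?', roughly as WideCharToMultiByte does (minus best-fit mapping).
lossy = mojibake.encode('cp932', errors='replace')

# Several distinct code points have collapsed into the same '?' byte,
# so nothing can map them back to the original byte values.
collapsed = {ch for ch, b in zip(mojibake, lossy) if b == ord('?')}

print(lossy)
print(len(collapsed), 'distinct characters became "?"')
```

Once different characters share one replacement byte, there is no inverse mapping left to apply.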
[1]: https://hg.python.org/cpython/file/79e60977fc04/Modules/timemodule.c#l501 |
|