Issue 16322: time.tzname on Python 3.3.0 for Windows is decoded by wrong encoding


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/60526

classification

Title:	time.tzname on Python 3.3.0 for Windows is decoded by wrong encoding
Type:	behavior	Stage:	resolved
Components:	Extension Modules, Windows	Versions:	Python 3.6, Python 3.4, Python 3.5

process

Status:	closed	Resolution:	duplicate
Dependencies:	Superseder:	time.tzname returns empty string on Windows if default codepage is a Unicode codepage View: 36779
Assigned To:	Nosy List:	amaury.forgeotdarc, belopolsky, eryksun, jcea, msmhrt, ocean-city, p-ganssle, prikryl, vstinner
Priority:	normal	Keywords:	3.3regression, patch

Created on 2012-10-25 11:56 by msmhrt, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name	Uploaded	Description
tzname_bug.py	prikryl, 2015-09-18 17:25	The example uploaded as is.

Pull Requests
URL	Status	Linked
PR 3740	closed	denis-osipov, 2017-09-25 07:07

Messages (23)
msg173755 - (view)	Author: Masami HIRATA (msmhrt)	Date: 2012-10-25 11:56
OS: Windows 7 Starter Edition SP1 (32-bit) Japanese version
Python: 3.3.0 for Windows x86 (python-3.3.0.msi)

time.tzname on Python 3.3.0 for Windows is decoded with the wrong encoding.

C:\Python33>python.exe
Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:55:48) [MSC v.1600 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import time
>>> time.tzname[0]
'\x93\x8c\x8b\x9e (\x95W\x8f\x80\x8e\x9e)'
>>> time.tzname[0].encode('iso-8859-1').decode('mbcs')
'東京 (標準時)'
>>>

'東京 (標準時)' means 'Tokyo (Standard Time)' in Japanese.

time.tzname on Python 3.2.3 for Windows works correctly.

C:\Python32>python.exe
Python 3.2.3 (default, Apr 11 2012, 07:15:24) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import time
>>> time.tzname[0]
'東京 (標準時)'
>>>
msg173758 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)	Date: 2012-10-25 13:47
I see in 3.3 PyUnicode_DecodeFSDefaultAndSize() was replaced by PyUnicode_DecodeLocale(). What do sys.getdefaultencoding(), sys.getfilesystemencoding(), and locale.getpreferredencoding() show?
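The values asked for here can be collected in one call. This is only a minimal sketch: the helper name encoding_report is made up for illustration, and locale.getpreferredencoding(False) is included because it reports the encoding without temporarily changing the locale.

```python
import locale
import sys


def encoding_report():
    """Return the encodings that are relevant when diagnosing tzname mojibake."""
    return {
        "sys.getdefaultencoding()": sys.getdefaultencoding(),
        "sys.getfilesystemencoding()": sys.getfilesystemencoding(),
        "locale.getpreferredencoding()": locale.getpreferredencoding(),
        # do_setlocale=False: do not touch the current locale while asking
        "locale.getpreferredencoding(False)": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    for call, value in encoding_report().items():
        print(f"{call} -> {value!r}")
```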
msg173772 - (view)	Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer)	Date: 2012-10-25 17:31
Looking at the CRT source code, tznames should be decoded with mbcs. See also http://mail.python.org/pipermail/python-3000/2007-August/009290.html
msg173784 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)	Date: 2012-10-25 18:06
As I understand, OP has UTF-8 locale.
msg173798 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2012-10-25 20:30
> I see in 3.3 PyUnicode_DecodeFSDefaultAndSize() was replaced
> by PyUnicode_DecodeLocale().

Related changes:
- 8620e6901e58 for issue #5905
- 279b0aee0cfb for issue #13560

I wrote 8620e6901e58 for Linux, for the case where the wcsftime() function is missing. The problem is changeset 279b0aee0cfb: it introduces a regression on Windows.

It looks like PyUnicode_DecodeLocale() and PyUnicode_DecodeFSDefault() use different encodings on Windows. I suppose that we need to add an #ifdef MS_WINDOWS to use PyUnicode_DecodeFSDefault() on Windows, and PyUnicode_DecodeLocale() on Linux.

See also issue #10653: time.strftime() uses strftime() (bytes) instead of wcsftime() (unicode) on Windows, because wcsftime() and tzname format the timezone differently.
msg173806 - (view)	Author: Masami HIRATA (msmhrt)	Date: 2012-10-25 22:55
> What do sys.getdefaultencoding(), sys.getfilesystemencoding(), and locale.getpreferredencoding() show?

C:\Python33>python.exe
Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:55:48) [MSC v.1600 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.getdefaultencoding()
'utf-8'
>>> sys.getfilesystemencoding()
'mbcs'
>>> import locale
>>> locale.getpreferredencoding()
'cp932'
>>>

'cp932' is the same as 'mbcs' in the Japanese environment.
msg173824 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2012-10-26 07:07
> >>> sys.getfilesystemencoding()
> 'mbcs'
> >>> import locale
> >>> locale.getpreferredencoding()
> 'cp932'
> >>>
>
> 'cp932' is the same as 'mbcs' in the Japanese environment.

And what is the value of locale.getpreferredencoding(False)?
msg173827 - (view)	Author: Masami HIRATA (msmhrt)	Date: 2012-10-26 09:08
> And what is the value of locale.getpreferredencoding(False)?

>>> import locale
>>> locale.getpreferredencoding(False)
'cp932'
>>>
msg174161 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2012-10-29 23:19
See also the issue #836035.
msg174164 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2012-10-29 23:59
According to the CRT source code:
- tzset() uses WideCharToMultiByte(lc_cp, 0, tzinfo.StandardName, -1, tzname[0], _TZ_STRINGS_SIZE - 1, NULL, &defused) with lc_cp = ___lc_codepage_func().
- wcsftime("%z") and wcsftime("%Z") use _mbstowcs_s_l() to decode the time zone name.

I tried to call ___lc_codepage_func(): it returns 0. I suppose this means that mbstowcs() and wcstombs() use the ANSI code page.

Instead of trying to guess what the correct encoding is, it would be simpler (and safer) to read the Unicode version of the tzname array: StandardName and DaylightName of GetTimeZoneInformation().

If anything is changed, time.strftime(), time.strptime(), datetime.datetime.strftime() and time.tzname must be checked (with the "%Z" format).
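Reading StandardName and DaylightName from GetTimeZoneInformation() can be tried from Python with ctypes. The following is a sketch rather than what CPython does internally: the helper name get_windows_tz_names is hypothetical, the structure layout follows the documented Win32 TIME_ZONE_INFORMATION and SYSTEMTIME definitions, and on non-Windows platforms the helper simply returns None.

```python
import ctypes
import sys


class SYSTEMTIME(ctypes.Structure):
    # Eight 16-bit fields, as in the Win32 SYSTEMTIME structure.
    _fields_ = [(name, ctypes.c_ushort) for name in (
        "wYear", "wMonth", "wDayOfWeek", "wDay",
        "wHour", "wMinute", "wSecond", "wMilliseconds")]


class TIME_ZONE_INFORMATION(ctypes.Structure):
    _fields_ = [
        ("Bias", ctypes.c_long),
        ("StandardName", ctypes.c_wchar * 32),   # UTF-16 text, no ANSI step
        ("StandardDate", SYSTEMTIME),
        ("StandardBias", ctypes.c_long),
        ("DaylightName", ctypes.c_wchar * 32),
        ("DaylightDate", SYSTEMTIME),
        ("DaylightBias", ctypes.c_long),
    ]


TIME_ZONE_ID_INVALID = 0xFFFFFFFF


def get_windows_tz_names():
    """Return (standard_name, daylight_name) on Windows, None elsewhere."""
    if sys.platform != "win32":
        return None
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    func = kernel32.GetTimeZoneInformation
    func.argtypes = [ctypes.POINTER(TIME_ZONE_INFORMATION)]
    func.restype = ctypes.c_ulong
    info = TIME_ZONE_INFORMATION()
    if func(ctypes.byref(info)) == TIME_ZONE_ID_INVALID:
        raise ctypes.WinError(ctypes.get_last_error())
    return info.StandardName, info.DaylightName


if __name__ == "__main__":
    print(get_windows_tz_names())
```

Because the names arrive as wide-character strings, no code page guess is involved. As the following messages note, the result may still differ from what time.strftime('%Z') produces.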
msg174165 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2012-10-30 00:28
"Instead of trying to guess what the correct encoding is, it would be simpler (and safer) to read the Unicode version of the tzname array: StandardName and DaylightName of GetTimeZoneInformation()."

GetTimeZoneInformation() formats timezone names correctly, but it reintroduces issue #10653: time.strftime("%Z") formats the timezone name differently.

See also issue #13029, which is a duplicate of #10653 but contains useful information.

--

Example on Windows 7 with a French setup configured to Tokyo's timezone.

Using GetTimeZoneInformation(), time.tzname is ("Tokyo", "Tokyo (heure d\u2019\xe9t\xe9)"). U+2019 is the "RIGHT SINGLE QUOTATION MARK". This character is usually replaced with U+0027 (APOSTROPHE) in ASCII.

time.strftime("%Z") gives "Tokyo (heure d'\x81\x66ete)" (if it is implemented using strftime() or wcsftime()).

--

If I understood correctly, Python 3.3 has two issues on Windows:
* time.tzname is decoded from the wrong encoding
* time.strftime("%Z") gives an invalid output

The real blocker issue is a bug in strftime() and wcsftime() in the Windows CRT. A solution is to replace "%Z" with the timezone name before calling strftime() or wcsftime(), that is, to work around the Windows CRT bug.
msg176408 - (view)	Author: Masami HIRATA (msmhrt)	Date: 2012-11-26 12:14
Is there any progress on this issue?
msg224325 - (view)	Author: Mark Lawrence (BreamoreBoy) *	Date: 2014-07-30 16:50
Could somebody respond to the originator please.
msg251013 - (view)	Author: Petr Prikryl (prikryl) *	Date: 2015-09-18 17:25
I have just observed the behaviour for the Czech locale. I tried to avoid collisions with the stdout encoding by writing the strings into a file using UTF-8 encoding:

tzname_bug.py
--------------------------------------------------
#!python3
import time
import sys
with open('tzname_bug.txt', 'w', encoding='utf-8') as f:
    f.write(sys.version + '\n')
    f.write('Should be: Střední Evropa (běžný čas) | Střední Evropa (letní čas)\n')
    f.write('but it is: ' + time.tzname[0] + ' | ' + time.tzname[1] + '\n')
    f.write(' types: ' + repr(type(time.tzname[0])) + ' | ' + repr(type(time.tzname[1])) + '\n')
    f.write('Should be as ascii: ' + ascii('Střední Evropa (běžný čas) | Střední Evropa (letní čas)') + '\n')
    f.write('but it is as ascii: ' + ascii(time.tzname[0]) + ' | ' + ascii(time.tzname[1]) + '\n')
-----------------------------------

It creates tzname_bug.txt with the following content (copy/pasted from a Unicode-capable editor, Notepad++; the indicator at the bottom right corner shows UTF-8):

-----------------------------------
3.5.0 (v3.5.0:374f501f4567, Sep 13 2015, 02:27:37) [MSC v.1900 64 bit (AMD64)]
Should be: Střední Evropa (běžný čas) | Střední Evropa (letní čas)
but it is: Støední Evropa (bìný èas) | Støední Evropa (letní èas)
 types: <class 'str'> | <class 'str'>
Should be as ascii: 'St\u0159edn\xed Evropa (b\u011b\u017en\xfd \u010das) | St\u0159edn\xed Evropa (letn\xed \u010das)'
but it is as ascii: 'St\xf8edn\xed Evropa (b\xec\x9en\xfd \xe8as)' | 'St\xf8edn\xed Evropa (letn\xed \xe8as)'
-----------------------------------
msg251068 - (view)	Author: Eryk Sun (eryksun) * (Python triager)	Date: 2015-09-19 09:34
To decode the tzname strings, Python calls mbstowcs, which on Windows uses Latin-1 in the "C" locale. However, in this locale the tzname strings are actually encoded using the system ANSI codepage (e.g. 1250 for Central/Eastern Europe). So it ends up decoding ANSI strings as Latin-1 mojibake. For example:

    >>> s
    'Střední Evropa (běžný čas) | Střední Evropa (letní čas)'
    >>> s.encode('1250').decode('latin-1')
    'Støední Evropa (bì\x9ený èas) | Støední Evropa (letní èas)'

You can work around the inconsistency by calling setlocale(LC_ALL, "") before anything imports the time module. This should set a locale that's not "C", in which case the codepage should be consistent. Of course, this won't help if you can't control when the time module is first imported.

The latter wouldn't be an issue if time.tzset were implemented on Windows. You can at least use ctypes to call the CRT's _tzset function. This solves the problem with time.strftime('%Z'). You can also get the CRT's tzname by calling the exported __tzname function. Here's a Python 3.5 example that sets the current thread to use Russian and creates a new tzname tuple:

    import ctypes
    import locale

    kernel32 = ctypes.WinDLL('kernel32')
    ucrtbase = ctypes.CDLL('ucrtbase')

    MUI_LANGUAGE_NAME = 8
    kernel32.SetThreadPreferredUILanguages(MUI_LANGUAGE_NAME, 'ru-RU\0', None)
    locale.setlocale(locale.LC_ALL, 'ru-RU')

    # reset tzname in current locale
    ucrtbase._tzset()
    ucrtbase.__tzname.restype = ctypes.POINTER(ctypes.c_char_p * 2)
    c_tzname = ucrtbase.__tzname()[0]
    tzname = tuple(tz.decode('1251') for tz in c_tzname)

    # print Cyrillic characters to the console
    kernel32.SetConsoleOutputCP(1251)
    stdout = open(1, 'w', buffering=1, encoding='1251', closefd=0)

    >>> print(tzname, file=stdout)
    ('Время в формате UTC', 'Время в формате UTC')
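The mis-decoding described in this message can be reproduced on any platform with Python's own codecs. This is a rough sketch: the function name simulate_crt_mojibake is made up, and the explicit code page names stand in for whatever the system ANSI code page happens to be.

```python
def simulate_crt_mojibake(name, ansi_codepage):
    """Encode a name as the CRT stores it (ANSI bytes), then decode those
    bytes as Latin-1, as mbstowcs does in the "C" locale on Windows."""
    return name.encode(ansi_codepage).decode("latin-1")


if __name__ == "__main__":
    # Czech (cp1250) and Russian (cp1251) names taken from this thread.
    print(ascii(simulate_crt_mojibake("Střední Evropa (letní čas)", "cp1250")))
    print(ascii(simulate_crt_mojibake("Время в формате UTC", "cp1251")))
```

The printed values match the broken strings reported in this issue, which supports the explanation that the bytes are ANSI while the decoder assumes Latin-1.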
msg251098 - (view)	Author: Petr Prikryl (prikryl) *	Date: 2015-09-19 18:34
I have worked around it a bit differently -- the snippet from the code:

    result = time.tzname[0]  # simplified version of the original code.

    # Because of the bug in Windows libraries, Python 3.3 tried to work around
    # some issues. However, the shit hit the fan, and the bug bubbled here.
    # The `time.tzname` elements are (unicode) strings; however, they were
    # filled with bad content. See https://bugs.python.org/issue16322 for details.
    # Actually, wrong characters were passed instead of the good ones.
    # This code should be skipped later by versions of Python that will fix
    # the issue.
    import locale
    import platform
    if platform.system() == 'Windows':
        # The concrete example for Czech locale:
        # - cp1250 (windows-1250) is used as native encoding
        # - the time.tzname[0] should start with 'Střední Evropa'
        # - the ascii('Střední Evropa') should return "'St\u0159edn\xed Evropa'"
        # - because of the bug it returns "'St\xf8edn\xed Evropa'"
        #
        # The 'ř' character has unicode code point `\u0159` (that is hex)
        # and the `\xF8` code in cp1250. The `\xF8` was wrongly used
        # as a Unicode code point `\u00F8` -- this is for the Unicode
        # character 'ø' that is observed in the string.
        #
        # To fix it, the `result` string must be reinterpreted with a different
        # encoding. When working with Python 3 strings, it can probably be
        # done only through the string representation and `eval()`. Here
        # the `eval()` is not very dangerous because the string was obtained
        # from the OS library, and the values are limited to certain subset.
        #
        # The `ascii()` literal is prefixed by `binary` type prefix character,
        # `eval`uated, and the binary result is decoded to the correct string.
        local_encoding = locale.getdefaultlocale()[1]
        b = eval('b' + ascii(result))
        result = b.decode(local_encoding)
msg251259 - (view)	Author: Eryk Sun (eryksun) * (Python triager)	Date: 2015-09-21 20:49
> local_encoding = locale.getdefaultlocale()[1]

Use locale.getpreferredencoding().

> b = eval('b' + ascii(result))
> result = b.decode(local_encoding)

It's simpler and more reliable to use 'latin-1' and 'mbcs' (ANSI). For example:

    result = result.encode('latin-1').decode('mbcs')

If setlocale(LC_CTYPE, "") is called before importing the time module, then tzname is already correct. In this case, the above is either harmless or raises a UnicodeEncodeError that can be handled. OTOH, your approach silently corrupts the value:

    >>> result = 'Střední Evropa (běžný čas)'
    >>> b = eval('b' + ascii(result))
    >>> b.decode('1251')
    'St\\u0159ednн Evropa (b\\u011b\\u017enэ \\u010das)'

Back to the issue. In review, on initial import of the time module, if the CRT is using the default "C" locale, we have this inconsistency in which the time functions encode/decode tzname as ANSI and mbstowcs decodes tzname as Latin-1. (Plus strftime in the new CRT calls wcsftime, which adds another transcoding layer to compound the mojibake goodness.)

If time.tzset is implemented on Windows, then at startup an application can set the locale (specifically LC_CTYPE for tzname, and LC_TIME for strftime) and then call time.tzset().

Example with Russian system locale:

Initially we're in the "C" locale and the CRT's tzname is in ANSI. time.tzname incorrectly decodes this as Latin-1 since that's what mbstowcs uses in the "C" locale:

    >>> time.tzname[0]
    '\xc2\xf0\xe5\xec\xff \xe2 \xf4\xee\xf0\xec\xe0\xf2\xe5 UTC'

The way the CRT's strftime is implemented compounds the problem:

    >>> time.strftime('%Z')
    'A?aiy a oi?iaoa UTC'

It's implemented by calling the wide-character function, wcsftime. Just like Python, this gets a wide-character string by calling mbstowcs on the ANSI tzname. Then the CRT's strftime encodes the wide-character string back as a best-fit ANSI string, and finally time.strftime decodes the result as Latin-1 via mbstowcs. The result is mutated mojibake:

    >>> time.tzname[0].encode('mbcs', 'replace').decode('latin-1')
    'A?aiy a oi?iaoa UTC'

Ironically, Python stopped calling wcsftime on Windows because of these problems, but changes to the code since then, plus the new CRT, have brought the problem back, and worse. See my comment in issue 10653, msg243660.

Fix this by setting the locale and calling _tzset:

    >>> import ctypes, locale
    >>> locale.setlocale(locale.LC_ALL, '')
    'Russian_Russia.1251'
    >>> ctypes.cdll.ucrtbase._tzset()
    0
    >>> time.strftime('%Z')
    'Время в формате UTC'

If time.tzset were implemented on Windows, calling it would reload the time.tzname tuple.
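The encode('latin-1') / decode(ANSI) repair suggested in this message, including the error handling it mentions, might be wrapped as follows. It is a sketch under assumptions: the helper name repair_tzname is hypothetical, and the ANSI encoding is passed explicitly so the example runs anywhere; on Windows one would presumably pass 'mbcs'.

```python
def repair_tzname(name, ansi_encoding):
    """Undo a Latin-1 mis-decoding of an ANSI-encoded timezone name.

    Returns the name unchanged when it cannot be mojibake of this kind.
    """
    try:
        raw = name.encode("latin-1")
    except UnicodeEncodeError:
        # Characters outside Latin-1: the name was already decoded correctly.
        return name
    try:
        return raw.decode(ansi_encoding)
    except UnicodeDecodeError:
        # The bytes are not valid in the ANSI code page: leave the name alone.
        return name


if __name__ == "__main__":
    broken = "\x93\x8c\x8b\x9e (\x95W\x8f\x80\x8e\x9e)"   # from the original report
    print(repair_tzname(broken, "cp932"))                 # Japanese ANSI code page
    print(repair_tzname("Tokyo Standard Time", "cp932"))  # pure ASCII is unchanged
```

Unlike the eval()-based approach, an already correct name such as '東京 (標準時)' is returned as-is instead of being silently corrupted.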
msg251264 - (view)	Author: Petr Prikryl (prikryl) *	Date: 2015-09-21 21:43
@eryksun: I see. In my case, I can set the locale before importing the time module. However, the code (asciidoc3.py) will be used as a module, and I cannot know whether the user has already imported the time module.

Instead of your suggestion result = result.encode('latin-1').decode('mbcs'), I was thinking of creating a module, say workaround16322.py, like this:

---------------
import locale
locale.setlocale(locale.LC_ALL, '')

import importlib
import time
importlib.reload(time)
---------------

I thought that reloading the time module would be the same as importing it later, after setting the locale. If that worked, the module could simply be imported wherever it was needed. However, it does not work when imported after importing time. What is the reason? Does reload() work only for modules coded as Python sources? Is there any other approach that would implement the workaroundXXX.py module?
msg251289 - (view)	Author: Eryk Sun (eryksun) * (Python triager)	Date: 2015-09-22 06:07
> import locale
> locale.setlocale(locale.LC_ALL, '')
>
> import importlib
> import time
> importlib.reload(time)
>
> it does not work when imported after importing time.
> What is the reason? Does reload() work only for
> modules coded as Python sources?

The import system won't reinitialize a builtin or dynamic extension module. Reloading just returns a reference to the existing module. It won't even reload a PEP 489 multi-phase extension module. (But you can create and exec a new instance of a multi-phase extension module.)

> Is there any other approach that would implement the
> workaroundXXX.py module?

If the user's default locale and the current thread's preferred language are compatible with the system ANSI encoding [1], then you don't actually need to call _tzset nor worry about time.tzname. Call setlocale(LC_CTYPE, ''), and then call time.strftime('%Z') to get the timezone name.

If you use Win32 directly instead of the CRT, then none of this ANSI business is an issue. Just call GetTimeZoneInformation to get the standard and daylight names as wide-character strings. You have that option via ctypes.

[1]: A user can select a default locale (language) that's unrelated to the system ANSI locale (the ANSI setting is per machine, located under Region->Administrative). Also, the preferred language can be selected dynamically by calling SetThreadPreferredUILanguages or SetProcessPreferredUILanguages. All three could be incompatible with each other, in which case you have to explicitly set the locale (e.g. "ru-RU" instead of an empty string) and call _tzset.
msg251308 - (view)	Author: Petr Prikryl (prikryl) *	Date: 2015-09-22 11:30
@eryksun: Thanks for your help. I have finally ended up with your... "Call setlocale(LC_CTYPE, ''), and then call time.strftime('%Z') to get the timezone name."
msg302936 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2017-09-25 09:36
bpo-31549 has been marked as a duplicate of this issue.
msg302937 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2017-09-25 09:37
Formatting the timezone on Windows in the right encoding is an old Python (especially Python 3) issue:

https://bugs.python.org/issue1040
https://bugs.python.org/issue8304
https://bugs.python.org/issue10653
https://bugs.python.org/issue16322#msg174164
msg388209 - (view)	Author: Eryk Sun (eryksun) * (Python triager)	Date: 2021-03-06 16:21
The solution for bpo-36779 changed init_timezone() to get tzname directly from WinAPI GetTimeZoneInformation(). Unfortunately the implementer didn't think to also support time.tzset(), so the value may be stale with no way to refresh it, or possibly different from what time.strftime('%Z') returns, depending on when ucrt looks up the timezone. For example, start Python and import time. Then change the time zone and call time.strftime('%Z'). The value will be different from time.tzname.

History
Date	User	Action	Args
2022-04-11 14:57:37	admin	set	github: 60526
2021-03-06 16:21:56	eryksun	set	status: open -> closed superseder: time.tzname returns empty string on Windows if default codepage is a Unicode codepage messages: + msg388209 resolution: duplicate stage: patch review -> resolved
2018-07-05 15:47:37	p-ganssle	set	nosy: + p-ganssle
2017-09-25 15:53:06	BreamoreBoy	set	nosy: - BreamoreBoy
2017-09-25 09:37:31	vstinner	set	messages: + msg302937
2017-09-25 09:36:47	vstinner	set	messages: + msg302936
2017-09-25 08:01:52	serhiy.storchaka	link	issue31549 superseder
2017-09-25 07:07:34	denis-osipov	set	keywords: + patch stage: patch review pull_requests: + pull_request3726
2015-09-22 11:30:54	prikryl	set	messages: + msg251308
2015-09-22 06:07:23	eryksun	set	messages: + msg251289
2015-09-21 21:43:01	prikryl	set	messages: + msg251264
2015-09-21 20:49:45	eryksun	set	messages: + msg251259 versions: + Python 3.6
2015-09-19 18:34:55	prikryl	set	messages: + msg251098
2015-09-19 09:34:48	eryksun	set	nosy: + eryksun messages: + msg251068
2015-09-18 17:25:39	prikryl	set	files: + tzname_bug.py nosy: + prikryl messages: + msg251013
2014-07-30 16:50:45	BreamoreBoy	set	nosy: + BreamoreBoy messages: + msg224325 versions: + Python 3.5, - Python 3.3
2012-11-26 12:14:57	msmhrt	set	messages: + msg176408
2012-10-30 10:03:40	serhiy.storchaka	set	nosy: - serhiy.storchaka
2012-10-30 00:28:26	vstinner	set	nosy: + ocean-city messages: + msg174165
2012-10-29 23:59:23	vstinner	set	messages: + msg174164
2012-10-29 23:19:11	vstinner	set	messages: + msg174161
2012-10-28 03:55:38	jcea	set	nosy: + jcea
2012-10-26 09:08:31	msmhrt	set	messages: + msg173827
2012-10-26 07:07:48	vstinner	set	messages: + msg173824
2012-10-25 22:55:46	msmhrt	set	messages: + msg173806
2012-10-25 20:30:20	vstinner	set	messages: + msg173798
2012-10-25 19:12:46	r.david.murray	set	nosy: + belopolsky, vstinner
2012-10-25 18:06:07	serhiy.storchaka	set	messages: + msg173784
2012-10-25 17:31:57	amaury.forgeotdarc	set	nosy: + amaury.forgeotdarc messages: + msg173772
2012-10-25 13:47:12	serhiy.storchaka	set	versions: + Python 3.4 nosy: + serhiy.storchaka messages: + msg173758 components: + Extension Modules keywords: + 3.3regression
2012-10-25 11:56:50	msmhrt	create
