Issue 5562: Locale-based date formatting crashes on non-ASCII data

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/49812

classification

Title:	Locale-based date formatting crashes on non-ASCII data
Type:	behavior
Stage:
Components:	Library (Lib)
Versions:	Python 3.0, Python 3.1

process

Status:	closed
Resolution:	duplicate
Dependencies:
Superseder:	time.strftime() always decodes result with UTF-8 (issue 3061)
Assigned To:
Nosy List:	loewis, pitrou
Priority:	high
Keywords:

Created on 2009-03-25 19:46 by pitrou, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (4)
msg84163 - (view)	Author: Antoine Pitrou (pitrou) * (Python committer)	Date: 2009-03-25 19:46
Locale-based date formatting in py3k (using strftime) crashes when asked to format a month name (or day, I assume) containing non-ASCII characters:

>>> import time
>>> import locale
>>> time.strftime("%B", (2009,2,1,0,0,0,0,0,0))
'February'
>>> locale.setlocale(locale.LC_TIME, "fr_FR")
'fr_FR'
>>> time.strftime("%B", (2009,2,1,0,0,0,0,0,0))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 1-3: invalid data

It works if I specify the encoding explicitly in the locale name so as to coincide with the encoding specified in the error message above (but that's assuming the given encoding-specific locale is installed):

>>> locale.setlocale(locale.LC_TIME, "fr_FR.UTF-8")
'fr_FR.UTF-8'
>>> time.strftime("%B", (2009,2,1,0,0,0,0,0,0))
'février'
msg84164 - (view)	Author: Antoine Pitrou (pitrou) * (Python committer)	Date: 2009-03-25 19:48
(if I explicitly set another encoding, it doesn't work however:

>>> locale.setlocale(locale.LC_TIME, "fr_FR.ISO-8859-1")
'fr_FR.ISO-8859-1'
>>> time.strftime("%B", (2009,2,1,0,0,0,0,0,0))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 1-3: invalid data
)
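
The failure above can be illustrated without installing any French locale. The sketch below assumes that the C library returned the month name encoded in ISO-8859-1 (which is what the "fr_FR.ISO-8859-1" case suggests) and that the result was then decoded as UTF-8. The helper name decode_as is hypothetical and exists only for this illustration.

```python
# Assumption: under an ISO-8859-1 French locale, the C strftime() yields
# these bytes for "%B" in February.
raw = "février".encode("iso-8859-1")


def decode_as(raw_bytes, encoding):
    """Return the decoded text, or None if the bytes are invalid in that encoding.

    Hypothetical helper, used only to compare the two decodings side by side.
    """
    try:
        return raw_bytes.decode(encoding)
    except UnicodeDecodeError:
        return None


# Decoding with a hard-coded UTF-8 codec fails: 0xE9 announces a multi-byte
# sequence, but the bytes that follow are plain ASCII letters.
utf8_result = decode_as(raw, "utf-8")

# Decoding with the encoding the bytes were actually written in succeeds.
latin1_result = decode_as(raw, "iso-8859-1")
```

Here utf8_result is None while latin1_result is the expected month name, which matches the pattern in the report: only a locale whose encoding happens to be UTF-8 survives the hard-coded decoding step.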
msg84170 - (view)	Author: Martin v. Löwis (loewis) * (Python committer)	Date: 2009-03-26 00:43
I think the problem is that creation of the Unicode string defaults to UTF-8. It should instead use the locale's encoding.

You are right that it could be an issue that there is no Python codec for the locale's encoding. To be robust against this case, I think the locale's mbcs->wcs routines should be used (i.e. mbstowcs).

Better yet, use wcsftime in the first place. AFAICT, wcsftime is C99, so not all systems might support it. However, it appears that MSVC has it, so we could assume it exists and wait until someone complains. One issue apparently is that some implementations of wcsftime expect the format as char* (and again, I would defer dealing with that until somebody complains).

In either case, you end up with a wchar_t string. In principle, the locale might use a non-Unicode wide charset for wchar_t, but these fell out of use some time ago, and Python has always assumed that wchar_t is Unicode.
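
The first suggestion (decode with the locale's encoding rather than UTF-8) can be sketched at the Python level. This is only an illustration under stated assumptions, not the fix applied in the C code: the function name decode_locale_bytes and its fallback parameter are hypothetical, locale.getpreferredencoding(False) is used as a stand-in for "the locale's encoding", and the surrogateescape error handler is one possible way to avoid raising on undecodable bytes.

```python
import codecs
import locale


def decode_locale_bytes(raw, fallback="utf-8"):
    """Decode bytes produced by a locale-aware C function (hypothetical helper).

    Assumptions: the current locale's preferred encoding describes the bytes,
    and `fallback` is acceptable when Python has no codec for that encoding.
    """
    encoding = locale.getpreferredencoding(False)
    try:
        # Guard against a locale encoding for which Python ships no codec.
        codecs.lookup(encoding)
    except LookupError:
        encoding = fallback
    # surrogateescape keeps undecodable bytes as lone surrogates instead of
    # raising UnicodeDecodeError.
    return raw.decode(encoding, errors="surrogateescape")


month_name = decode_locale_bytes(b"February")
```

The codec lookup with a fallback corresponds to the concern about a locale encoding that Python cannot decode; the mbstowcs and wcsftime approaches avoid that lookup entirely by letting the C library do the conversion.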
msg88516 - (view)	Author: Martin v. Löwis (loewis) * (Python committer)	Date: 2009-05-29 16:37
This is a duplicate of issue 3061.

History
Date	User	Action	Args
2022-04-11 14:56:46	admin	set	github: 49812
2009-05-29 16:37:51	loewis	set	status: open -> closed resolution: duplicate superseder: time.strftime() always decodes result with UTF-8 messages: + msg88516
2009-03-26 00:43:22	loewis	set	nosy: + loewis messages: + msg84170
2009-03-25 19:48:17	pitrou	set	messages: + msg84164
2009-03-25 19:46:17	pitrou	create
