Created on 2009-03-25 19:46 by pitrou, last changed 2022-04-11 14:56 by admin. This issue is now closed.
| Messages (4) | | | |
|---|---|---|---|
| msg84163 | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2009-03-25 19:46 | |
Locale-based date formatting in py3k (using strftime) crashes when asked
to format a month name (or day, I assume) containing non-ASCII characters:
>>> import time
>>> import locale
>>> time.strftime("%B", (2009,2,1,0,0,0,0,0,0))
'February'
>>> locale.setlocale(locale.LC_TIME, "fr_FR")
'fr_FR'
>>> time.strftime("%B", (2009,2,1,0,0,0,0,0,0))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 1-3: invalid data
It works if I specify the encoding explicitly in the locale name so as
to coincide with the encoding specified in the error message above (but
that's assuming the given encoding-specific locale *is* installed):
>>> locale.setlocale(locale.LC_TIME, "fr_FR.UTF-8")
'fr_FR.UTF-8'
>>> time.strftime("%B", (2009,2,1,0,0,0,0,0,0))
'février'
| msg84164 | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2009-03-25 19:48 | |
(However, if I explicitly set a different encoding, it fails again:
>>> locale.setlocale(locale.LC_TIME, "fr_FR.ISO-8859-1")
'fr_FR.ISO-8859-1'
>>> time.strftime("%B", (2009,2,1,0,0,0,0,0,0))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 1-3: invalid data
)
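The failure above can be reproduced at the byte level, without any French locale installed. This is a minimal sketch, assuming the C library returns the month name as ISO-8859-1 bytes under an `fr_FR.ISO-8859-1` locale; the variable names are illustrative only.

```python
# Assumption: under an ISO-8859-1 French locale, the C strftime() hands back
# the month name as Latin-1 bytes. "\u00e9" is the letter e with acute accent.
raw = "f\u00e9vrier".encode("iso-8859-1")   # b'f\xe9vrier'

try:
    # Decoding those bytes as UTF-8 regardless of the locale reproduces
    # the kind of error shown in the traceback.
    raw.decode("utf-8")
    failed = False
except UnicodeDecodeError as exc:
    failed = True
    error_start = exc.start   # index of the first undecodable byte

# Decoding with the encoding the bytes were produced in succeeds.
recovered = raw.decode("iso-8859-1")
```

The byte 0xE9 is a valid Latin-1 character but, in UTF-8, it announces a multi-byte sequence that the following ASCII bytes do not complete, which is why the decode fails at the second byte of the month name.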
| msg84170 | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2009-03-26 00:43 | |
I think the problem is that creation of the Unicode string defaults to UTF-8. It should instead use the locale's encoding.

You are right that it could be an issue that there is no Python codec for the locale's encoding. To be robust against this case, I think the locale's mbcs->wcs routines should be used (i.e. mbstowcs). Better yet, use wcsftime in the first place.

AFAICT, wcsftime is C99, so not all systems might support it. However, it appears that MSVC has it, so we could assume it exists and wait until someone complains. One issue apparently is that some implementations of wcsftime expect the format as char* (and again, I would defer dealing with that until somebody complains).

In either case, you end up with a wchar_t. In principle, the locale might use a non-Unicode wide charset for wchar_t, but these got out of use some time ago, and Python had always assumed that wchar_t is Unicode.
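The idea of decoding with the locale's encoding instead of UTF-8 can be illustrated at the Python level. This is only a sketch, not the change made in the interpreter: the helper name `decode_locale_bytes`, its `encoding` parameter, and the ASCII fallback are all hypothetical, and the actual proposal is to use `mbstowcs` or `wcsftime` in C.

```python
import locale


def decode_locale_bytes(raw, encoding=None):
    """Decode bytes from a C library call using the locale's encoding.

    Hypothetical helper for illustration. `encoding` defaults to the
    locale's preferred encoding; it is a parameter so the behaviour can be
    shown without a particular locale being installed.
    """
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
    try:
        return raw.decode(encoding)
    except LookupError:
        # No Python codec exists for this encoding name. This is the case
        # the C-level mbstowcs/wcsftime approach would avoid entirely; the
        # lossy ASCII fallback here is an assumption made for the sketch.
        return raw.decode("ascii", "replace")


# Latin-1 bytes decode correctly once the matching encoding is used.
month = decode_locale_bytes(b"f\xe9vrier", "iso-8859-1")
```

A Python-level helper like this still depends on a codec being available for the locale's encoding, which is the weakness the message points out and the reason it prefers the C library's own conversion routines.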
| msg88516 | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2009-05-29 16:37 | |
This is a duplicate of issue 3061.
| History | | | |
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:56:46 | admin | set | github: 49812 |
| 2009-05-29 16:37:51 | loewis | set | status: open -> closed; resolution: duplicate; superseder: time.strftime() always decodes result with UTF-8; messages: + msg88516 |
| 2009-03-26 00:43:22 | loewis | set | nosy: + loewis; messages: + msg84170 |
| 2009-03-25 19:48:17 | pitrou | set | messages: + msg84164 |
| 2009-03-25 19:46:17 | pitrou | create | |