

classification
Title: strptime fails in non-UTF locale
Type: behavior
Stage: test needed
Components: Library (Lib), Unicode
Versions: Python 3.2, Python 3.3

process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: BreamoreBoy, ezio.melotti, lemburg, pitrou, python-dev, terry.reedy, vstinner
Priority: critical
Keywords: patch

Created on 2009-05-02 15:17 by pitrou, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name, Uploaded
tzname_encoding.patch, vstinner, 2011-12-08 12:45
Messages (10)
msg86953 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2009-05-02 15:17
time.strptime() fails with non-UTF8 locales, *even when the input is
totally ASCII*.
>>> import locale, time
>>> locale.setlocale(locale.LC_TIME, "fr_FR.ISO8859-15")
'fr_FR.ISO8859-15'
>>> time.strptime("2009-01-01", "%Y-%m-%d")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/antoine/py3k/__svn__/Lib/_strptime.py", line 461, in _strptime_time
    return _strptime(data_string, format)[0]
  File "/home/antoine/py3k/__svn__/Lib/_strptime.py", line 307, in _strptime
    _TimeRE_cache = TimeRE()
  File "/home/antoine/py3k/__svn__/Lib/_strptime.py", line 188, in __init__
    self.locale_time = LocaleTime()
  File "/home/antoine/py3k/__svn__/Lib/_strptime.py", line 72, in __init__
    self.__calc_month()
  File "/home/antoine/py3k/__svn__/Lib/_strptime.py", line 98, in __calc_month
    a_month = [calendar.month_abbr[i].lower() for i in range(13)]
  File "/home/antoine/py3k/__svn__/Lib/_strptime.py", line 98, in <listcomp>
    a_month = [calendar.month_abbr[i].lower() for i in range(13)]
  File "/home/antoine/py3k/__svn__/Lib/calendar.py", line 60, in __getitem__
    return funcs(self.format)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 1-3: invalid data
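For anyone trying to reproduce the report, a minimal script along these lines may help. It is only a sketch: the helper name `try_strptime` is made up for illustration, and the script assumes the "fr_FR.ISO8859-15" locale may not be installed, in which case it reports that instead of failing.

```python
import locale
import time


def try_strptime(locale_name, data="2009-01-01", fmt="%Y-%m-%d"):
    """Hypothetical helper: parse `data` under LC_TIME=locale_name, return a status string."""
    saved = locale.setlocale(locale.LC_TIME)  # query the current LC_TIME setting
    try:
        try:
            locale.setlocale(locale.LC_TIME, locale_name)
        except locale.Error:
            return "locale unavailable"
        try:
            parsed = time.strptime(data, fmt)
        except UnicodeDecodeError as exc:
            # This is the failure mode described in the report.
            return f"UnicodeDecodeError: {exc}"
        return f"ok: {parsed.tm_year}-{parsed.tm_mon:02d}-{parsed.tm_mday:02d}"
    finally:
        # Always restore the previous setting so later code is unaffected.
        locale.setlocale(locale.LC_TIME, saved)


if __name__ == "__main__":
    for name in ("C", "fr_FR.ISO8859-15"):
        print(name, "->", try_strptime(name))
```

On an interpreter that includes the fix, the French locale (when installed) should be expected to parse the date like the "C" locale does.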
msg97808 - Author: Marc-Andre Lemburg (lemburg) (Python committer) Date: 2010-01-15 12:57
The reason for this is that the strftime() C lib API is used to build localized month names. With your setting, you'll get French Latin-1 month names and those cannot be coerced to UTF-8 due to the accented characters in them.
This works in Python 2.x since PyUnicode_FromString() et al. convert Latin-1 strings to Unicode.
Apparently, this was changed in Python 3.x without looking at the header file or looking at the Python 2.x implementation which mandate Latin-1 as input encoding. Even the Python 3.x header still says that PyUnicode_FromString() will convert from Latin-1 to Unicode.
No idea why time.strptime() even bothers with these month names, though, since neither the format string nor the string being parsed contains literal month names.
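The decoding failure itself can be shown without touching the locale at all. The snippet below is an illustration only: the month name and the helper `decode_or_error` are chosen for the demo and are not taken from the standard library code.

```python
# Bytes as a C-level strftime() might return them in an ISO-8859-15 locale
# (the month name is picked for illustration).
month = "février"
raw = month.encode("iso8859-15")


def decode_or_error(data, encoding):
    """Hypothetical helper: decode bytes, or describe why decoding failed."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"UnicodeDecodeError: {exc.reason}"


print(decode_or_error(raw, "utf-8"))       # the accented byte is not valid UTF-8
print(decode_or_error(raw, "iso8859-15"))  # decoding with the locale encoding works
```

The byte for "é" in ISO-8859-15 looks like the start of a multi-byte UTF-8 sequence, but the byte that follows is not a continuation byte, so a UTF-8 decoder rejects it.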
msg98105 - Author: STINNER Victor (vstinner) (Python committer) Date: 2010-01-21 11:30
I'm unable to reproduce the error. I tried locales fr_FR.iso88591 and fr_FR.iso885915@euro (fr_FR@euro), but the example works correctly. Should the terminal use the specified locale? My terminal uses the fr_FR.UTF8 locale. Should setlocale() be called before loading the time and/or calendar module?
msg112997 - Author: Mark Lawrence (BreamoreBoy) Date: 2010-08-05 16:33
I can't reproduce this on Windows Vista with 3.1 or 3.2 despite trying several Western & Eastern European, Chinese & Japanese locales.
msg113736 - Author: STINNER Victor (vstinner) (Python committer) Date: 2010-08-13 01:32
> I can't reproduce this on Windows ...
This issue is (was?) maybe specific to Linux.
msg138210 - Author: Terry J. Reedy (terry.reedy) (Python committer) Date: 2011-06-12 18:33
Still a problem in 3.2.1 or 3.3?
msg138722 - Author: STINNER Victor (vstinner) (Python committer) Date: 2011-06-20 14:19
I am closing the issue because I am unable to reproduce it.
msg149024 - Author: STINNER Victor (vstinner) (Python committer) Date: 2011-12-08 12:45
Oh! I think I understand the problem now: if HAVE_WCSFTIME is not defined, timemodule.c uses strftime() instead of wcsftime(), so it has to encode the input format to bytes and decode the formatted result. It uses UTF-8 to encode/decode, whereas the right encoding is the locale encoding. The attached patch should fix this issue.
@Antoine: Do you have any idea why HAVE_WCSFTIME was not defined?
wcsftime() is defined in <wchar.h> on Ubuntu. In configure, it is tested using AC_CHECK_FUNCS(wcsftime)
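The encode/call/decode round trip described above can be sketched in Python. This is not the patch itself: `format_with_byte_api` and `fake_c_strftime` are hypothetical names, and the stand-in function only imitates a byte-oriented strftime() running in an ISO-8859-15 French locale.

```python
import locale


def fake_c_strftime(fmt_bytes):
    """Stand-in for a byte-oriented C strftime() in an ISO-8859-15 locale (demo only)."""
    return fmt_bytes.replace(b"%B", "février".encode("iso8859-15"))


def format_with_byte_api(fmt, byte_strftime, encoding=None):
    """Encode the format, call the byte API, decode the result with the same encoding."""
    if encoding is None:
        # Assumption: the locale's preferred encoding is the right choice here.
        encoding = locale.getpreferredencoding(False)
    raw = byte_strftime(fmt.encode(encoding))
    return raw.decode(encoding)


# Using the locale's encoding round-trips cleanly.
print(format_with_byte_api("%B 2009", fake_c_strftime, "iso8859-15"))

# Hard-coding UTF-8 fails on the accented month name.
try:
    format_with_byte_api("%B 2009", fake_c_strftime, "utf-8")
except UnicodeDecodeError as exc:
    print("UTF-8 decode failed:", exc.reason)
```

The point of the sketch is only that the same, locale-derived encoding has to be used on both sides of the byte-oriented call.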
msg149038 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2011-12-08 14:42
Well, it seems defined here:
$ grep HAVE_WCSFTIME pyconfig.h
1071:#define HAVE_WCSFTIME 1
> Attached patch should fix this issue.
I'm sorry, I can't test the patch, because my Linux distro (Mageia) doesn't have the "fr_FR.ISO8859-15" locale anymore :-(
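The same macro can usually be queried from a running interpreter instead of grepping pyconfig.h. This is a sketch that assumes the build exposes pyconfig.h values through `sysconfig`, which is typical on POSIX builds; elsewhere the call may simply return None. The wrapper name `have_wcsftime` is made up for the example.

```python
import sysconfig


def have_wcsftime():
    """Return the build-time value of HAVE_WCSFTIME, or None if it is not recorded."""
    return sysconfig.get_config_var("HAVE_WCSFTIME")


print("HAVE_WCSFTIME =", have_wcsftime())
```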
msg149115 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-12-09 19:19
New changeset 8620e6901e58 by Victor Stinner in branch '3.2':
Issue #5905: time.strftime() is now using the locale encoding, instead of
http://hg.python.org/cpython/rev/8620e6901e58
New changeset bee7694988a4 by Victor Stinner in branch 'default':
(Merge 3.2) Issue #5905: time.strftime() is now using the locale encoding,
http://hg.python.org/cpython/rev/bee7694988a4 
History
Date  User  Action  Args
2022-04-11 14:56:48  admin  set  github: 50155
2011-12-09 19:19:41  vstinner  set  status: open -> closed; resolution: fixed
2011-12-09 19:19:18  python-dev  set  nosy: + python-dev; messages: + msg149115
2011-12-08 14:42:16  pitrou  set  messages: + msg149038
2011-12-08 12:45:15  vstinner  set  status: closed -> open; files: + tzname_encoding.patch; messages: + msg149024; components: + Unicode; keywords: + patch; resolution: not a bug -> (no value)
2011-06-20 14:19:28  vstinner  set  status: open -> closed; resolution: not a bug; messages: + msg138722
2011-06-12 18:33:45  terry.reedy  set  nosy: + terry.reedy; messages: + msg138210; versions: + Python 3.2, Python 3.3, - Python 3.1
2010-08-13 01:32:20  vstinner  set  messages: + msg113736
2010-08-05 16:33:11  BreamoreBoy  set  nosy: + BreamoreBoy; messages: + msg112997
2010-01-21 11:30:27  vstinner  set  nosy: + vstinner; messages: + msg98105
2010-01-15 12:57:21  lemburg  set  messages: + msg97808
2010-01-15 12:15:17  ezio.melotti  set  nosy: + lemburg, ezio.melotti; versions: - Python 3.0
2009-05-02 15:17:20  pitrou  create
