This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.
Created on 2009-02-13 01:46 by ezio.melotti, last changed 2022-04-11 14:56 by admin. This issue is now closed.
| Files | | |
|---|---|---|
| File name | Uploaded | Description |
| remove_ascii_flag.patch | ocean-city, 2009-02-13 16:30 | |
| Messages (13) | |||
|---|---|---|---|
| msg81847 - (view) | Author: Ezio Melotti (ezio.melotti) * (Python committer) | Date: 2009-02-13 01:46 |
On Py3, strptime("2009", "%Y") fails:
>>> strptime("2009", "%Y")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.0/_strptime.py", line 454, in _strptime_time
    return _strptime(data_string, format)[0]
  File "/usr/local/lib/python3.0/_strptime.py", line 325, in _strptime
    (data_string, format))
ValueError: time data '2009' does not match format '%Y'
but non-ASCII digits are supported elsewhere:
>>> int("2009")
2009
>>> re.match("^\d{4}$", "2009").group()
'2009'
The problem seems to be at line 265 of _strptime.py:
return re_compile(self.pattern(format), IGNORECASE | ASCII)
The ASCII flag prevents the regex from matching '2009':
>>> re.match("^\d{4}$", "2009", re.ASCII)
>>>
I tried removing the ASCII flag and it worked fine.
On Py2.x the problem is the same:
>>> strptime(u"2009", "%Y")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/_strptime.py", line 330, in strptime
    (data_string, format))
ValueError>>>
>>> int(u"2009")
2009
>>> re.match("^\d{4}$", u"2009")
Here the re.UNICODE flag probably needs to be added at line 265 (untested):
return re_compile(self.pattern(format), IGNORECASE | UNICODE)
in order to make it work:
>>> re.match("^\d{4}$", u"2009", re.U).group()
u'\uff12\uff10\uff10\uff19'
|
|||
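The flag behavior reported above can be reproduced on Python 3 with a minimal sketch like the following. The names `FULLWIDTH_YEAR` and `matches_year` are hypothetical helpers introduced only for illustration; they are not part of `_strptime.py`.

```python
import re

# Fullwidth digits for 2009 (U+FF12 U+FF10 U+FF10 U+FF19), as in the report.
FULLWIDTH_YEAR = "\uff12\uff10\uff10\uff19"


def matches_year(text, flags=0):
    """Return True if text consists of exactly four digits under the given flags."""
    return re.match(r"^\d{4}$", text, flags) is not None


# With a str pattern, \d is Unicode-aware by default on Python 3.
print(matches_year(FULLWIDTH_YEAR))
# re.ASCII restricts \d to [0-9], so the fullwidth digits no longer match.
print(matches_year(FULLWIDTH_YEAR, re.ASCII))
# int() accepts the fullwidth digits regardless of any regex flag.
print(int(FULLWIDTH_YEAR))
```

This only illustrates the regex flag; whether strptime() itself accepts such input depends on the flags used in the installed `_strptime.py`.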
| msg81928 - (view) | Author: Hirokazu Yamamoto (ocean-city) * (Python committer) | Date: 2009-02-13 13:52 |
This patch comes from issue5240. I think a test case is needed. I'll try to write one if I can. |
|||
| msg81932 - (view) | Author: Hirokazu Yamamoto (ocean-city) * (Python committer) | Date: 2009-02-13 14:13 |
Hmm, this fails on python2 too. Maybe re.ASCII was added for backward compatibility? Again, I'm not familiar with unicode, so I won't call remove_ascii_flag.patch a *fix*. |
|||
| msg81934 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2009-02-13 14:24 |
> Hmm, this fails on python2 too. Maybe re.ASCII is added for backward
> compatibility? Again, I'm not familiar with unicode, so I won't call
> remove_ascii_flag.patch as *fix*.

re.ASCII was added to many stdlib modules because I wanted to minimize the potential for breakage when I converted the re library to use unicode matching by default. If it is desirable for strptime() and friends to match unicode digits as well as pure-ASCII digits (which sounds like a reasonable request to me), then re.ASCII can probably be dropped without any regret. (py3k doesn't have to be 100% compatible with python2 :-)) |
|||
| msg81938 - (view) | Author: Ezio Melotti (ezio.melotti) * (Python committer) | Date: 2009-02-13 14:44 |
I think Py3 with re.ASCII is the same as Py2 without re.UNICODE (and Py3 without re.ASCII is the same as Py2 with re.UNICODE). It's probably a good idea to have a coherent behavior between Py2 and Py3, so if we remove re.ASCII from Py3 we should add re.UNICODE to Py2. |
|||
| msg81939 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2009-02-13 14:50 |
On Friday, 13 February 2009 at 14:44 +0000, Ezio Melotti wrote:
> It's probably a good idea to have a coherent behavior between Py2 and
> Py3, so if we remove re.ASCII from Py3 we should add re.UNICODE to Py2.

Removing re.ASCII in py3k is a no-brainer, because unicode is how strings work by default. On the other hand, strings in 2.x are 8-bit, so it would probably be better to keep strptime as is. As I said, py3k doesn't have to be compatible with 2.x, that's even the whole point of it. |
|||
| msg81940 - (view) | Author: Ezio Melotti (ezio.melotti) * (Python committer) | Date: 2009-02-13 15:27 |
> Removing re.ASCII in py3k is a no-brainer, because unicode is how
> strings work by default.

I meant from line 265 of _strptime.py, not from Python :P |
|||
| msg81941 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2009-02-13 15:30 |
> > Removing re.ASCII in py3k is a no-brainer, because unicode is how
> > strings work by default.
>
> I meant from the line 265 of _strptime.py, not from Python :P

That's what I understood. |
|||
| msg81948 - (view) | Author: Ezio Melotti (ezio.melotti) * (Python committer) | Date: 2009-02-13 16:26 |
Sorry, I misunderstood the meaning of "no-brainer". If we add re.UNICODE on Py2, strptime should work fine with unicode strings, but it could fail somehow with normal strings. Is it more important to provide a way to use Unicode chars that works only with unicode strings, or to have a coherent behavior between str and unicode?

I don't think that adding re.UNICODE will break any existing code, but it may cause problems if someone tries to use an encoded str instead of unicode (though that shouldn't work already). Also note that encoded strings should be a problem only if they have to match a strptime directive (e.g. %Y); the other chars should be compared as they are, so it should work with str and unicode as long as they are not mixed (I think that whitespace is treated differently though). I'll try to add re.UNICODE and see what happens. |
|||
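The concern about encoded strings has a rough Python 3 analogue, sketched below under the assumption that `bytes` patterns stand in for 2.x 8-bit strings. The names `YEAR_TEXT` and `YEAR_BYTES` are hypothetical and exist only for this illustration.

```python
import re

# Fullwidth digits for 2009, and the same text encoded as UTF-8 bytes.
FULLWIDTH_YEAR = "\uff12\uff10\uff10\uff19"
encoded = FULLWIDTH_YEAR.encode("utf-8")  # three bytes per fullwidth digit

YEAR_TEXT = re.compile(r"^\d{4}$")    # str pattern: \d is Unicode-aware
YEAR_BYTES = re.compile(rb"^\d{4}$")  # bytes pattern: \d matches only [0-9]

text_ok = YEAR_TEXT.match(FULLWIDTH_YEAR) is not None
bytes_ok = YEAR_BYTES.match(encoded) is not None
ascii_bytes_ok = YEAR_BYTES.match(b"2009") is not None

# Mixing a str pattern with bytes input is rejected outright on Python 3.
try:
    YEAR_TEXT.match(encoded)
    mixed_allowed = True
except TypeError:
    mixed_allowed = False

print(text_ok, bytes_ok, ascii_bytes_ok, mixed_allowed)
```

The encoded form never matches a digit directive, which is the case described above where an encoded str has to match something like %Y.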
| msg81949 - (view) | Author: Hirokazu Yamamoto (ocean-city) * (Python committer) | Date: 2009-02-13 16:30 |
I added a test, but it requires the issue5249 fix in order to pass on Windows. (I used "\u3000" instead of "\xa0" because "\xa0" cannot be decoded with mbcs on Windows.) |
|||
| msg81952 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2009-02-13 16:57 |
> If we add re.UNICODE on Py2, strptime should work fine with unicode
> strings, but it could fail somehow with normal strings. Is it more
> important to provide a way to use Unicode chars that works only with
> unicode strings or to have a coherent behavior between str and unicode?

I'd say the latter, since str and unicode are often interchangeable in 2.x. |
|||
| msg84665 - (view) | Author: Hirokazu Yamamoto (ocean-city) * (Python committer) | Date: 2009-03-30 21:52 |
This issue seems to be fixed on py3k by r70755. (issue5236) |
|||
| msg84669 - (view) | Author: Brett Cannon (brett.cannon) * (Python committer) | Date: 2009-03-30 21:54 |
As Hirokazu pointed out, this was fixed. |
|||
| History | | | |
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:56:45 | admin | set | github: 49489 |
| 2009-03-30 21:54:24 | brett.cannon | set | status: open -> closed resolution: fixed messages: + msg84669 |
| 2009-03-30 21:52:24 | ocean-city | set | messages: + msg84665 |
| 2009-02-13 19:06:30 | brett.cannon | set | nosy: + brett.cannon |
| 2009-02-13 16:57:15 | pitrou | set | messages: + msg81952 |
| 2009-02-13 16:30:50 | ocean-city | set | files: - remove_ascii_flag.patch |
| 2009-02-13 16:30:39 | ocean-city | set | files: + remove_ascii_flag.patch dependencies: + Fix strftime on windows. messages: + msg81949 |
| 2009-02-13 16:26:07 | ezio.melotti | set | messages: + msg81948 |
| 2009-02-13 15:30:08 | pitrou | set | messages: + msg81941 |
| 2009-02-13 15:27:52 | ezio.melotti | set | messages: + msg81940 |
| 2009-02-13 14:50:33 | pitrou | set | messages: + msg81939 |
| 2009-02-13 14:44:18 | ezio.melotti | set | messages: + msg81938 |
| 2009-02-13 14:24:37 | pitrou | set | messages: + msg81934 |
| 2009-02-13 14:13:06 | ocean-city | set | nosy: + pitrou messages: + msg81932 |
| 2009-02-13 14:01:30 | ezio.melotti | set | title: time.strptime("2009", "%Y") raises a value error -> Change time.strptime() to make it work with Unicode chars |
| 2009-02-13 13:52:31 | ocean-city | set | files: + remove_ascii_flag.patch keywords: + patch dependencies: + re.IGNORECASE not Unicode-ready messages: + msg81928 nosy: + ocean-city |
| 2009-02-13 13:43:17 | ocean-city | link | issue5240 superseder |
| 2009-02-13 01:47:01 | ezio.melotti | set | type: behavior components: + Library (Lib), Unicode versions: + Python 2.6, Python 2.5, Python 2.4, Python 3.0, Python 3.1, Python 2.7 |
| 2009-02-13 01:46:34 | ezio.melotti | create | |