This issue tracker has been migrated to GitHub, and is currently read-only. For more information, see the GitHub FAQs in the Python Developer's Guide.
Created on 2012-09-16 11:55 by Aleksey.Sivokon, last changed 2022-04-11 14:57 by admin. This issue is now closed.
Files

| File name | Uploaded | Description |
|---|---|---|
| issue-15951-test-1.patch | chris.jerdonek, 2012-09-16 13:17 | |
| issue-15951-2-branch27.patch | chris.jerdonek, 2012-09-19 00:38 | |
Messages (9)
msg170551 - Author: Aleksey Sivokon (Aleksey.Sivokon) - Date: 2012-09-16 11:55
Expected behavior of string.Formatter() is to return unicode strings for unicode templates, and "byte" strings for str templates, which is exactly what it does, with one frustrating exception: for an empty unicode string it returns a byte str. Test follows:
    import string
    template = u""
    result = string.Formatter().format(template)
    assert isinstance(result, unicode)  # AssertionError
msg170552 - Author: Chris Jerdonek (chris.jerdonek) (Python committer) - Date: 2012-09-16 13:17
Adding failing test. Patch coming next.
msg170555 - Author: Chris Jerdonek (chris.jerdonek) (Python committer) - Date: 2012-09-16 13:55
Here are some related failing cases that I found:
>>> f = string.Formatter()
>>> f.format(u"{0}", "")
''
>>> f.format(u"{0}", 1)
'1'
>>> f.format(u"{0}", "a")
'a'
>>> f.format(u"{0}{1}", "a", "b")
'ab'
>>> f.format("{0}", u"a")
u'a'
Note that PEP 3101 says the following:
"In all cases, the type of the format string dominates - that
is, the result of the conversion will always result in an object
that contains the same representation of characters as the
input format string."
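A sketch of the PEP 3101 rule quoted above, checked against the reporter's empty template and the cases listed in this message. It is written for Python 3 (an assumption: the issue itself concerns 2.7, where these checks fail), and the helper name `result_type_matches` is hypothetical. Python 3 has a single text type, `str`, so every case here satisfies the rule.

```python
import string


def result_type_matches(format_string, *args, **kwargs):
    """Return True if Formatter().format() returns an object of the same
    type as format_string (PEP 3101: the format string's type dominates)."""
    result = string.Formatter().format(format_string, *args, **kwargs)
    return type(result) is type(format_string)


# The reporter's empty template plus the cases from msg170555,
# restated with Python 3's single text type (str).
CASES = [
    ("", ()),
    ("{0}", ("",)),
    ("{0}", (1,)),
    ("{0}", ("a",)),
    ("{0}{1}", ("a", "b")),
]

for fmt, args in CASES:
    print(repr(fmt), args, result_type_matches(fmt, *args))
```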
msg170559 - Author: Chris Jerdonek (chris.jerdonek) (Python committer) - Date: 2012-09-16 14:32
Actually, I'm going to defer on creating a patch because this covers more scenarios than I originally thought and so may require more time.
msg170560 - Author: R. David Murray (r.david.murray) (Python committer) - Date: 2012-09-16 14:55
Format with unicode is a bit of a mess in 2.7. It would be consistent with the rest of python2 for
>>> f.format("{0}", u"a")
u'a'
to be correct.
See also issue 7300 and issue 15276.
msg170562 - Author: Chris Jerdonek (chris.jerdonek) (Python committer) - Date: 2012-09-16 18:27
What about cases like this?
>>> f.format(u'{0}', '\xe9')
'\xe9'
It seems that fixing this issue for non-empty strings would cause cases like this, which currently run without error, to raise UnicodeDecodeError.
>>> unicode('\xe9')
...
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 0: ordinal not in range(128)
Would that be acceptable?
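Python 2's `unicode('\xe9')` decodes the byte string implicitly with the default 'ascii' codec, and that implicit step is what would raise for these previously working calls. A minimal sketch of the same failure, written for Python 3 (an assumption, since Python 3 drops the implicit conversion), makes the decode step explicit with `bytes.decode`.

```python
# Python 3 stand-in for Python 2's implicit unicode('\xe9'), which
# decodes the byte string with the default 'ascii' codec.
raw = b"\xe9"  # a single non-ASCII byte (e-acute in Latin-1)

try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    error_message = str(exc)
else:
    error_message = None

# prints: 'ascii' codec can't decode byte 0xe9 in position 0: ordinal not in range(128)
print(error_message)
```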
msg170571 - Author: R. David Murray (r.david.murray) (Python committer) - Date: 2012-09-16 19:13
Note that I didn't say it was correct, I just said it was consistent :) And no, breaking stuff that currently works is a non-starter for 2.7.
msg170576 - Author: Chris Jerdonek (chris.jerdonek) (Python committer) - Date: 2012-09-16 20:09
I filed issue 15952 for the behavior difference between format(value) and value.__format__() and the related lack of documentation re: unicode format strings. Given that the expected behavior for the current issue doesn't seem to be documented (aside from PEP 3101, which is probably too late to follow), we should probably agree on what the behavior should be (as well as documenting it) before or while addressing this issue.
msg170693 - Author: Chris Jerdonek (chris.jerdonek) (Python committer) - Date: 2012-09-19 00:38
Attached is a proposed patch.
Some explanation behind the patch that stems from the above comments:
The following is an example of Formatter.format() returning str in the current implementation that would break if we made Formatter.format() return unicode whenever format_string is unicode:
>>> f.format(u"{0}", "\xc3\xa9") # UTF-8 encoded "e-acute".
'\xc3\xa9'
(It would break with a UnicodeDecodeError because 'ascii' is the default encoding.)
Since we can't change Formatter.format(format_string) to return unicode whenever format_string is unicode without breaking existing code, I believe the best we can do is to document the departure from PEP 3101. Since the caller has to handle return values of type str anyway, I don't think it helps to ensure that more return values are unicode.
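Because callers already have to cope with str results, one option on the caller's side is to decode byte strings itself with an encoding it actually knows, rather than relying on the implicit 'ascii' default. The sketch below is written for Python 3 (an assumption), and `ensure_text` is a hypothetical helper, not part of `string.Formatter` or of the attached patch; it reuses the UTF-8 bytes from the example above.

```python
def ensure_text(value, encoding="utf-8"):
    """Hypothetical caller-side helper: decode byte strings with an explicit,
    known encoding instead of the implicit 'ascii' default."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


utf8_e_acute = b"\xc3\xa9"  # UTF-8 encoding of "e-acute" (U+00E9)

# Decoding as ASCII fails: this is why returning unicode would break such calls.
try:
    utf8_e_acute.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Decoding with the known encoding succeeds.
text = ensure_text(utf8_e_acute)
print(ascii_failed, repr(text))
```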
History

| Date | User | Action | Args |
|---|---|---|---|
| 2022-04-11 14:57:36 | admin | set | github: 60155 |
| 2020-05-31 12:11:58 | serhiy.storchaka | set | status: open -> closed; resolution: out of date; stage: needs patch -> resolved |
| 2012-09-22 14:07:58 | chris.jerdonek | set | nosy: + ezio.melotti |
| 2012-09-19 00:38:54 | chris.jerdonek | set | files: + issue-15951-2-branch27.patch; messages: + msg170693 |
| 2012-09-16 20:09:41 | chris.jerdonek | set | messages: + msg170576 |
| 2012-09-16 19:13:08 | r.david.murray | set | messages: + msg170571 |
| 2012-09-16 18:27:09 | chris.jerdonek | set | messages: + msg170562 |
| 2012-09-16 14:55:13 | r.david.murray | set | nosy: + eric.smith, r.david.murray; messages: + msg170560 |
| 2012-09-16 14:32:40 | chris.jerdonek | set | messages: + msg170559 |
| 2012-09-16 13:55:30 | chris.jerdonek | set | messages: + msg170555 |
| 2012-09-16 13:17:04 | chris.jerdonek | set | files: + issue-15951-test-1.patch; nosy: + chris.jerdonek; messages: + msg170552; keywords: + patch; stage: needs patch |
| 2012-09-16 11:55:56 | Aleksey.Sivokon | create | |