Created on 2009-11-10 13:57 by doerwalter, last changed 2022-04-11 14:56 by admin. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| issue7300-trunk.patch | vstinner, 2010-03-09 23:44 | review | ||
| Messages (10) | |||
|---|---|---|---|
| msg95114 - (view) | Author: Walter Dörwald (doerwalter) * (Python committer) | Date: 2009-11-10 13:57 | |
str.format() doesn't handle unicode arguments:
Python 2.6.4 (r264:75706, Oct 27 2009, 15:18:04)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> '{0}'.format(u'\u3042')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u3042' in
position 0: ordinal not in range(128)
Unicode arguments should be handled the same way the % operator handles
them, by promoting the format string to unicode:
>>> '%s' % u'\u3042'
u'\u3042'
|
|||
| msg100769 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2010-03-09 22:57 | |
PyString_Format() uses a "goto unicode;" if a '%c' or '%s' argument is unicode. The unicode label converts the partially formatted result (a byte string) to unicode and uses PyUnicode_Format() to finish the formatting.
I don't think the same algorithm (converting the partial result to unicode) can be applied here, because it would require rewriting the format string: arguments can be used twice or more, and in any order.
Example: "{0} {1}".format("bytes", u"unicode") => the switch to unicode occurs at result="bytes ", format=" {1}", arguments=(u"unicode"). Converting "bytes " to unicode is easy, but the format has to be rewritten as " {0}" or something else.
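To make the ordering problem concrete, here is a small sketch (not part of any patch on this issue) that lists which arguments a format string refers to, using the standard `string.Formatter.parse()`. The helper name `field_order` is made up for illustration.

```python
from string import Formatter


def field_order(fmt):
    """Return the field names referenced by fmt, in order of appearance."""
    # parse() yields (literal_text, field_name, format_spec, conversion);
    # field_name is None for a trailing chunk of literal text.
    return [name for _, name, _, _ in Formatter().parse(fmt) if name is not None]


# Arguments can be referenced out of order and more than once, so a
# partially rendered result cannot simply be resumed with "the remaining
# arguments" the way '%' formatting can.
print(field_order("{1} {0} {1}"))  # ['1', '0', '1']
```

With '%' formatting the arguments are consumed strictly left to right, which is why the "goto unicode" trick works there and does not carry over directly.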
Call trace of str.format(): do_string_format() -> build_string() -> output_markup() -> render_field(). The argument type is processed in render_field().
|
|||
| msg100770 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2010-03-09 23:44 | |
*Draft* patch fixing the issue: render_field() raises an error if the argument is a unicode string; string_format() catches this error, converts self to unicode, and calls unicode.format(*args, **kw).
Pseudo-code:
try:
    # self.format() raises an error if any argument is
    # a unicode string
    return self.format(*args, **kw)
except UnicodeError:
    unicode = self.decode(default_encoding)
    return unicode.format(*args, **kw)
The patch changes the result type of '{}'.format(u'ascii'): it was str and it becomes unicode. The new behaviour is consistent with "%s" % u"ascii" => u"ascii" (unicode).
I'm not sure that catching *any* unicode error is a good idea. It would be better to use a new exception type dedicated to this issue, but defining a new exception looks complex. I may do it in the next version of the patch ;-)
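As a rough illustration of the "dedicated exception" idea, here is a sketch written for Python 3, where `bytes` stands in for the Python 2 `str` type and `str` stands in for `unicode`. It is a model of the two-pass approach, not the attached patch: the names `UnicodeArgument`, `_to_text`, `_format_bytes` and `format_with_fallback` are all hypothetical, and the encoding is left as a parameter (defaulting to ASCII here as an assumption).

```python
class UnicodeArgument(Exception):
    """Hypothetical dedicated exception: an argument needs a text result."""


def _to_text(value, encoding):
    # Decode byte strings; leave every other object untouched.
    return value.decode(encoding) if isinstance(value, bytes) else value


def _format_bytes(fmt, args, encoding):
    # Stand-in for the byte-string code path: it refuses text arguments
    # by raising the dedicated exception instead of a generic UnicodeError.
    if any(isinstance(a, str) for a in args):
        raise UnicodeArgument
    text = fmt.decode(encoding).format(*[_to_text(a, encoding) for a in args])
    return text.encode(encoding)


def format_with_fallback(fmt, *args, encoding="ascii"):
    """Format a bytes template, promoting to text only when required."""
    try:
        return _format_bytes(fmt, args, encoding)
    except UnicodeArgument:
        # Second pass: redo the whole formatting on the decoded template.
        return fmt.decode(encoding).format(*[_to_text(a, encoding) for a in args])


print(format_with_fallback(b"{0} {1}", b"bytes", "unicode"))  # text result
print(format_with_fallback(b"{0}", b"ascii"))                 # bytes result
```

Catching only the dedicated exception keeps unrelated encoding errors raised by the arguments themselves from silently triggering the second pass.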
|
|||
| msg100771 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2010-03-09 23:50 | |
My patch converts the format string to unicode using the default encoding. That is inconsistent with str%args: str%args converts str to unicode using the ASCII charset (if at least one argument is a unicode string), not the default encoding.
>>> "\xff%s" % u'\xe9'
...
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 0: ordinal not in range(128)
|
|||
| msg100861 - (view) | Author: Eric V. Smith (eric.smith) * (Python committer) | Date: 2010-03-11 14:52 | |
I'm not sure I'm wild about doing the work twice, once as string and once as unicode if need be. But I'll consider it, especially since this is only a 2.7 issue. There could be side effects of evaluating the replacement strings, but I'm not sure it's worth worrying about. Attribute (or index) access having side effects isn't something I think we need to cater to. |
|||
| msg178596 - (view) | Author: Pedro Algarvio (s0undt3ch) * | Date: 2012-12-30 18:35 | |
This is not only a 2.7 issue:
>>> import sys
>>> sys.version_info
(2, 6, 5, 'final', 0)
>>> 'Foo {0}'.format(u'bár')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 1: ordinal not in range(128)
>>>
|
|||
| msg178599 - (view) | Author: Ezio Melotti (ezio.melotti) * (Python committer) | Date: 2012-12-30 18:45 | |
2.6 only gets security fixes.
> My patch converts the format string to unicode using the default
> encoding. It's inconsistent with str%args: str%args converts str to
> unicode using the ASCII charset (if at least one argument is a unicode
> string), not the default encoding.
I think it's better to be consistent and use ASCII.
|
|||
| msg178617 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2012-12-30 21:49 | |
Another option is to decide that this issue will *not* be fixed in Python 2, and that Python 3 *is* the right solution if you have this issue.
Doing the work twice can cause new problems: formatting an argument twice may return two different values :-( It may also hurt performance and introduce regressions.
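A minimal sketch of the "two different values" concern, assuming an argument whose `__format__` has a side effect. The `Ticket` class is invented for illustration and does not come from the patch or from any real code base.

```python
class Ticket(object):
    """Hypothetical argument whose formatting has a side effect."""

    def __init__(self):
        self.issued = 0

    def __format__(self, spec):
        # Every formatting pass bumps the counter, so a retry is visible.
        self.issued += 1
        return format(self.issued, spec)


ticket = Ticket()
first = "{0}".format(ticket)   # first formatting pass
second = "{0}".format(ticket)  # what a second (retry) pass would see
print(first, second)
```

If the byte-string pass had already formatted this argument before failing on a later unicode argument, the retry would render a different value than the first pass did.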
Oh, by the way, it's trivial to work around this issue in Python 2: just use a Unicode format string. For example, replace '{0}'.format(u'\u3042') with u'{0}'.format(u'\u3042').
I hate the implicit conversion from bytes to Unicode in Python 2; maybe it's better not to add a new special case?
|
|||
| msg178618 - (view) | Author: Eric V. Smith (eric.smith) * (Python committer) | Date: 2012-12-30 21:52 | |
I agree that we should close this as "won't fix" in 2.7. |
|||
| msg185426 - (view) | Author: Georg Brandl (georg.brandl) * (Python committer) | Date: 2013-03-28 10:33 | |
Agreed with Eric. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:56:54 | admin | set | github: 51549 |
| 2013-03-28 10:33:27 | georg.brandl | set | status: open -> closed; nosy: + georg.brandl; messages: + msg185426; resolution: wont fix |
| 2012-12-30 21:52:06 | eric.smith | set | messages: + msg178618 |
| 2012-12-30 21:49:10 | vstinner | set | messages: + msg178617 |
| 2012-12-30 18:45:16 | ezio.melotti | set | messages: + msg178599 |
| 2012-12-30 18:35:48 | s0undt3ch | set | nosy: + s0undt3ch; messages: + msg178596 |
| 2012-09-26 18:50:11 | ezio.melotti | set | nosy: + chris.jerdonek; versions: - Python 2.6 |
| 2012-06-11 11:40:34 | gkcn | set | nosy: + gkcn |
| 2010-09-09 19:04:19 | flox | set | nosy: + flox |
| 2010-03-11 14:52:28 | eric.smith | set | messages: + msg100861 |
| 2010-03-09 23:50:03 | vstinner | set | messages: + msg100771 |
| 2010-03-09 23:44:34 | vstinner | set | files: + issue7300-trunk.patch; keywords: + patch; messages: + msg100770 |
| 2010-03-09 22:57:46 | vstinner | set | messages: + msg100769 |
| 2010-03-08 01:18:41 | pablomouzo | set | nosy: + pablomouzo |
| 2010-03-07 21:32:49 | vstinner | set | nosy: + vstinner |
| 2010-01-14 00:12:14 | ezio.melotti | set | nosy: + ezio.melotti |
| 2009-11-14 02:09:07 | ezio.melotti | set | priority: high; stage: test needed; versions: + Python 2.7 |
| 2009-11-10 13:57:33 | doerwalter | create | |