This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.
Created on 2008-07-25 15:58 by ignas, last changed 2022-04-11 14:56 by admin. This issue is now closed.
| Messages (8) | |||
|---|---|---|---|
| msg70258 - (view) | Author: Ignas Mikalajūnas (ignas) | Date: 2008-07-25 15:58 | |
Not all combinations of unicode/non-unicode parameters work for ljust, center and rjust. Passing a unicode character to them as the fill character when the string is ascii fails with an error.
This doctest fails in 3 places, though I would expect it to pass.
def doctest_strings():
    """
    >>> uni = u"a"
    >>> ascii = "a"
    >>> uni.center(5, ascii)
    u'aaaaa'
    >>> uni.center(5, uni)
    u'aaaaa'
    >>> ascii.center(5, ascii)
    'aaaaa'
    >>> ascii.center(5, uni)
    u'aaaaa'
    >>> uni.ljust(5, ascii)
    u'aaaaa'
    >>> uni.ljust(5, uni)
    u'aaaaa'
    >>> ascii.ljust(5, ascii)
    'aaaaa'
    >>> ascii.ljust(5, uni)
    u'aaaaa'
    >>> uni.rjust(5, ascii)
    u'aaaaa'
    >>> uni.rjust(5, uni)
    u'aaaaa'
    >>> ascii.rjust(5, ascii)
    'aaaaa'
    >>> ascii.rjust(5, uni)
    u'aaaaa'
    """
|
|||
| msg82514 - (view) | Author: Ezio Melotti (ezio.melotti) * (Python committer) | Date: 2009-02-20 06:13 | |
Indeed, this behavior doesn't seem to be documented.
When the string is unicode and the fillchar is non-unicode, Python implicitly tries to decode the fillchar (and possibly raises a TypeError if it's not in range(0, 128)):
>>> u'x'.center(5, 'y')  # unicode string, non-unicode (str) fillchar
u'yyxyy'  # the fillchar is decoded
When the string is non-unicode, it only accepts a non-unicode fillchar (e.g. 'x'.center(5, 'y')) and raises a TypeError if the fillchar is unicode:
>>> 'x'.center(5, u'y')  # non-unicode (str) string, unicode fillchar
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: center() argument 2 must be char, not unicode
If it tries to decode the fillchar when the string is unicode, it could also try to encode the unicode fillchar (and possibly raise a TypeError) when the string is non-unicode.
Py3, instead, seems to have the opposite behavior. It implicitly encodes unicode fillchars into byte strings when the string is a byte string, but it doesn't decode a byte fillchar if the string is unicode:
>>> b'x'.center(5, 'y')  # byte string, unicode fillchar
b'yyxyy'  # the fillchar is encoded
>>> 'x'.center(5, b'y')  # unicode string, byte fillchar
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: The fill character cannot be converted to Unicode
The docs [1] state that "The methods on bytes and bytearray objects don’t accept strings as their arguments, just as the methods on strings don’t accept bytes as their arguments." so b'x'.center(5, 'y') should probably raise an error on Py3 (I could open a new issue for this).
[1]: http://docs.python.org/3.0/library/stdtypes.html#bytes-and-byte-array-methods (in the note)
|
|||
| msg82660 - (view) | Author: Raymond Hettinger (rhettinger) * (Python committer) | Date: 2009-02-24 07:19 | |
In Py2.x, I think the desired behavior should match str.join(). If either input is unicode, the output is unicode. If both are ascii, ascii should come out. For Py3.x, I think the goal was to have str.join() enforce that both inputs are unicode. If either is bytes, then you have to know the encoding. |
|||
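To make the join()-style rule from the message above concrete, here is a minimal Python 3 sketch. It is an illustration only, not the 2.x implementation: `center_like_join` and `_promote` are hypothetical helpers, Python 3 `bytes` stands in for the Python 2 `str` type, and explicit ASCII decoding stands in for Python 2's implicit coercion.

```python
def _promote(value):
    """Decode bytes as ASCII, mimicking Python 2's implicit str -> unicode step.

    Hypothetical helper: non-ASCII bytes raise UnicodeDecodeError.
    """
    return value.decode("ascii") if isinstance(value, bytes) else value


def center_like_join(text, width, fillchar=b" "):
    """Center `text`, returning text (str) if either input is text.

    Assumption: this mirrors the rule "if either input is unicode the
    output is unicode; if both are byte strings, a byte string comes out".
    """
    if isinstance(text, bytes) and isinstance(fillchar, bytes):
        # Both inputs are byte strings: stay in bytes.
        return text.center(width, fillchar)
    # At least one input is text: promote both and return text.
    return _promote(text).center(width, _promote(fillchar))


both_bytes = center_like_join(b"a", 5, b"b")
mixed_fill = center_like_join(b"a", 5, "b")
mixed_text = center_like_join("a", 5, b"b")
```

Under these assumptions, only the all-bytes call returns bytes; both mixed calls return text.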
| msg83669 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2009-03-17 12:10 | |
About Python3, bytes.center accepts unicode as second argument, which is an error for me:
>>> b"x".center(5, b"\xe9")
b'\xe9\xe9x\xe9\xe9'
>>> b"x".center(5, "\xe9")
b'\xe9\xe9x\xe9\xe9'
The second example must fail with a TypeError. str.center has the right behaviour:
>>> "x".center(5, "\xe9")
'ééxéé'
>>> "x".center(5, b"\xe9")
TypeError: The fill character cannot be converted to Unicode
|
|||
| msg87121 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2009-05-04 12:36 | |
haypo> About Python3, bytes.center accepts unicode as second argument,
haypo> which is an error for me
OK, it's fixed by r71013 (issue #5499).
|
|||
| msg87122 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2009-05-04 12:38 | |
This issue only concerns Python 2.x; Python 3.x has the right behaviour: it disallows mixing bytes with characters. |
|||
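The Python 3 behaviour described in the message above can be checked with a short sketch. It assumes a current Python 3 interpreter (one that includes the fix mentioned earlier in the thread); `try_center` is a hypothetical helper introduced only for this illustration.

```python
def try_center(value, width, fill):
    """Return the centered value, or the exception class name if the call fails.

    Hypothetical helper used only to tabulate the four type combinations.
    """
    try:
        return value.center(width, fill)
    except TypeError as exc:
        return type(exc).__name__


results = {
    "str/str": try_center("x", 5, "y"),        # same types: allowed
    "bytes/bytes": try_center(b"x", 5, b"y"),  # same types: allowed
    "bytes/str": try_center(b"x", 5, "y"),     # mixed: rejected
    "str/bytes": try_center("x", 5, b"y"),     # mixed: rejected
}
```

Both mixed combinations are rejected with a TypeError, so any conversion between bytes and text has to be written explicitly by the caller.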
| msg87123 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2009-05-04 12:58 | |
The question is why str.{ljust,rjust,center} don't accept a unicode
argument, whereas unicode.{ljust,rjust,center} accept an ASCII string.
Other string methods accept a unicode argument, like str.count() (which encodes
the unicode string to bytes using the utf8 charset).
To be consistent with other string methods, str.{ljust,rjust,center}
should accept a unicode string and convert it to a byte string using
utf8, like str.count does. But I hate such implicit conversions (I
prefer the Python3 way: disallow mixing bytes and characters), so I will
not contribute to such a patch.
Can you write such a patch?
--
str.{ljust,rjust,center} use PyArg_ParseTuple(args, "n|c:...", ...)
and getarg('c'), which only accepts a string of 1 byte.
unicode.{ljust,rjust,center} use PyArg_ParseTuple(args, "n|O&:...", ..., convert_uc, ...)
where convert_uc looks something like:
def convert_uc(o):
    # Python 2 pseudocode for the C converter
    try:
        u = unicode(o)
    except Exception:
        raise TypeError("The fill character cannot be converted to Unicode")
    if len(u) != 1:
        raise TypeError("The fill character must be exactly one character long")
    return u[0]
convert_uc() accepts a byte string of 1 ASCII character.
string_count() uses PyArg_ParseTuple(args, "O...", ...) and then tests
the substring type.
|
|||
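For comparison, the kind of patch discussed in the message above (converting a unicode fill character to a single byte) might look roughly like the following Python 3 sketch. Everything here is an assumption made for illustration: `convert_fill_to_byte` and `center_bytes` are hypothetical names, Python 3 `bytes` stands in for the 2.x `str` type, and the utf-8 default simply follows the encoding named in the message. It is not CPython code.

```python
def convert_fill_to_byte(fillchar, encoding="utf-8"):
    """Hypothetical byte-string counterpart of convert_uc.

    Encodes a text fill character explicitly and insists on exactly one byte.
    """
    if isinstance(fillchar, str):
        fillchar = fillchar.encode(encoding)
    if not isinstance(fillchar, bytes):
        raise TypeError("The fill character cannot be converted to bytes")
    if len(fillchar) != 1:
        raise TypeError("The fill character must be exactly one byte long")
    return fillchar


def center_bytes(value, width, fillchar=b" "):
    """Center a byte string, accepting a text or byte fill character."""
    return value.center(width, convert_fill_to_byte(fillchar))


ascii_fill = center_bytes(b"x", 5, "y")
```

One consequence of this design is that a non-ASCII fill character such as "\xe9" encodes to two bytes in utf-8 and is therefore rejected, which illustrates why such implicit conversions are awkward.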
| msg123483 - (view) | Author: Alexander Belopolsky (belopolsky) * (Python committer) | Date: 2010-12-06 18:25 | |
As a feature request for 2.x, I think this should be rejected. Any objections? The "behavior" part seems to have been fixed. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:56:36 | admin | set | github: 47696 |
| 2010-12-15 19:40:58 | belopolsky | set | status: pending -> closed nosy: rhettinger, belopolsky, vstinner, ezio.melotti, ignas |
| 2010-12-06 18:25:13 | belopolsky | set | status: open -> pending type: behavior -> enhancement assignee: belopolsky nosy: + belopolsky messages: + msg123483 resolution: rejected |
| 2009-05-04 12:58:26 | vstinner | set | messages: + msg87123 |
| 2009-05-04 12:38:30 | vstinner | set | messages: + msg87122 versions: + Python 2.7, - Python 2.5, Python 2.4, Python 3.0 |
| 2009-05-04 12:36:41 | vstinner | set | messages: + msg87121 |
| 2009-03-17 12:10:59 | vstinner | set | nosy: + vstinner messages: + msg83669 |
| 2009-02-24 07:19:36 | rhettinger | set | nosy: + rhettinger messages: + msg82660 |
| 2009-02-20 06:13:59 | ezio.melotti | set | nosy: + ezio.melotti messages: + msg82514 versions: + Python 2.6, Python 3.0 |
| 2008-07-25 15:58:23 | ignas | create | |