This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: center, ljust and rjust are inconsistent with unicode parameters
Type: enhancement
Stage:
Components: Library (Lib), Unicode
Versions: Python 2.7, Python 2.6

process
Status: closed
Resolution: rejected
Dependencies:
Superseder:
Assigned To: belopolsky
Nosy List: belopolsky, ezio.melotti, ignas, rhettinger, vstinner
Priority: normal
Keywords:

Created on 2008-07-25 15:58 by ignas, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (8)
msg70258 - Author: Ignas Mikalajūnas (ignas) Date: 2008-07-25 15:58
Not all combinations of unicode/non-unicode parameters work for ljust,
center and rjust. Passing a unicode character to them as a parameter
when the string is ascii fails with an error.
This doctest fails in 3 places, though I would expect it to pass.
def doctest_strings():
    """
    >>> uni = u"a"
    >>> ascii = "a"
    >>> uni.center(5, ascii)
    u'aaaaa'
    >>> uni.center(5, uni)
    u'aaaaa'
    >>> ascii.center(5, ascii)
    'aaaaa'
    >>> ascii.center(5, uni)
    u'aaaaa'
    >>> uni.ljust(5, ascii)
    u'aaaaa'
    >>> uni.ljust(5, uni)
    u'aaaaa'
    >>> ascii.ljust(5, ascii)
    'aaaaa'
    >>> ascii.ljust(5, uni)
    u'aaaaa'
    >>> uni.rjust(5, ascii)
    u'aaaaa'
    >>> uni.rjust(5, uni)
    u'aaaaa'
    >>> ascii.rjust(5, ascii)
    'aaaaa'
    >>> ascii.rjust(5, uni)
    u'aaaaa'
    """
msg82514 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2009-02-20 06:13
Indeed this behavior doesn't seem to be documented.
When the string is unicode and the fillchar is non-unicode, Python
implicitly tries to decode the fillchar (and possibly raises a
TypeError if it's not in range(0,128)):
>>> u'x'.center(5, 'y') # unicode string, non-unicode (str) fillchar
u'yyxyy' # the fillchar is decoded
When the string is non-unicode it only accepts a non-unicode fillchar
(e.g. 'x'.center(5, 'y')) and it raises a TypeError if the fillchar is
unicode:
>>> 'x'.center(5, u'y') # non-unicode (str) string, unicode fillchar
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
TypeError: center() argument 2 must be char, not unicode
If it tries to decode the fillchar when the string is unicode, it could
also try to encode the unicode fillchar (and possibly raise a TypeError)
when the string is non-unicode.
Py3, instead, seems to have the opposite behavior. It implicitly encodes
unicode fillchars into byte strings when the string is a byte string but
it doesn't decode a byte fillchar if the string is unicode:
>>> b'x'.center(5, 'y') # byte string, unicode fillchar
b'yyxyy' # the fillchar is encoded
>>> 'x'.center(5, b'y') # unicode string, byte fillchar
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
TypeError: The fill character cannot be converted to Unicode
The doc [1] says (in the note) that "The methods on bytes and bytearray
objects don’t accept strings as their arguments, just as the methods on
strings don’t accept bytes as their arguments." so b'x'.center(5, 'y')
should probably raise an error on Py3 (I could open a new issue for this).
[1]: http://docs.python.org/3.0/library/stdtypes.html#bytes-and-byte-array-methods
msg82660 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2009-02-24 07:19
In Py2.x, I think the desired behavior should match str.join(). If
either input is unicode, the output is unicode. If both are ascii, ascii
should come out.
For Py3.x, I think the goal was to have str.join() enforce that both
inputs are unicode. If either is bytes, then you have to know the
encoding.
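As a rough illustration of the str.join()-style rule described above, here is a sketch in Python 3 terms, where bytes stands in for the 2.x str type and str stands in for unicode. The helper name center_like_join is hypothetical, and decoding with ASCII is an assumption that mirrors the 2.x default encoding; no released Python behaves this way.

```python
def center_like_join(s, width, fill=b" "):
    """Center s in width, promoting to text if either input is text.

    Hypothetical helper: bytes plays the role of the 2.x str type and
    str plays the role of 2.x unicode.
    """
    # Both inputs are byte strings: the result stays a byte string.
    if isinstance(s, bytes) and isinstance(fill, bytes):
        return s.center(width, fill)
    # Otherwise promote any byte string to text. ASCII is an assumption
    # here; non-ASCII bytes raise UnicodeDecodeError.
    if isinstance(s, bytes):
        s = s.decode("ascii")
    if isinstance(fill, bytes):
        fill = fill.decode("ascii")
    return s.center(width, fill)


print(center_like_join(b"a", 5, b"b"))  # bytes in, bytes out
print(center_like_join(b"a", 5, "b"))   # mixed inputs give text
print(center_like_join("a", 5, b"b"))   # mixed inputs give text
```

With this rule the four combinations from the original doctest would all succeed, and only the all-bytes case would return a byte string.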
msg83669 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-03-17 12:10
About Python3, bytes.center accepts unicode as its second argument, which
I consider an error:
>>> b"x".center(5, b"\xe9")
b'\xe9\xe9x\xe9\xe9'
>>> b"x".center(5, "\xe9")
b'\xe9\xe9x\xe9\xe9'
The second example must fail with a TypeError.
str.center has the right behaviour:
>>> "x".center(5, "\xe9")
'ééxéé'
>>> "x".center(5, b"\xe9")
TypeError: The fill character cannot be converted to Unicode
msg87121 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-05-04 12:36
haypo> About Python3, bytes.center accepts unicode as second argument,
haypo> which is an error for me
OK, it's fixed thanks to r71013 (issue #5499).
msg87122 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-05-04 12:38
This issue only concerns Python 2.x. Python 3.x has the right
behaviour: it disallows mixing bytes with characters.
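A short check of that Python 3 behaviour, written as a sketch: the helper center_or_error is hypothetical and only reports the exception class name, since the exact wording of the error message differs between 3.x releases.

```python
def center_or_error(s, width, fill):
    """Return s.center(width, fill), or the exception class name on failure."""
    try:
        return s.center(width, fill)
    except TypeError as exc:
        return type(exc).__name__


# Matching types work; mixing bytes and text is rejected in both directions.
for s, fill in [("x", "y"), (b"x", b"y"), (b"x", "y"), ("x", b"y")]:
    print(repr(s), repr(fill), "->", repr(center_or_error(s, 5, fill)))
```

The same pattern applies to ljust and rjust, which take the fill character in the same position.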
msg87123 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-05-04 12:58
The question is why str.{ljust,rjust,center} don't accept a unicode
argument, whereas unicode.{ljust,rjust,center} accept an ASCII string.
Other string methods accept a unicode argument, like str.count() (which
encodes the unicode string to bytes using the utf8 charset).
To be consistent with other string methods, str.{ljust,rjust,center}
should accept a unicode string and convert it to a byte string using
utf8, like str.count does. But I hate such implicit conversion (I
prefer the Python3 way: disallow mixing bytes and characters), so I will
not contribute to such a patch.
Can you write such a patch?
--
str.{ljust,rjust,center} use PyArg_ParseTuple(args, "n|c:...", ...)
and getarg('c'), which only accepts a string of 1 byte.
unicode.{ljust,rjust,center} use
PyArg_ParseTuple(args, "n|O&:...", ..., convert_uc, ...) where
convert_uc looks something like:
    def convert_uc(o):
        # Python 2 pseudocode for the C converter
        try:
            u = unicode(o)
        except Exception:
            raise TypeError("The fill character cannot be converted to Unicode")
        if len(u) != 1:
            raise TypeError("The fill character must be exactly one character long")
        return u[0]
convert_uc() accepts a byte string of one ASCII character.
string_count() uses PyArg_ParseTuple(args, "O...", ...) and then tests
the substring type.
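For readers without a 2.x interpreter, the checks that convert_uc performs can be approximated in Python 3. This is only a sketch: convert_fill is a hypothetical name, bytes stands in for the 2.x str type, and the explicit ASCII decode is an assumption standing in for the implicit unicode(o) conversion.

```python
def convert_fill(obj):
    """Return obj as a one-character str, or raise TypeError.

    Hypothetical Python 3 approximation of the convert_uc checks.
    """
    if isinstance(obj, bytes):
        try:
            # Assumption: ASCII mirrors the 2.x default encoding.
            text = obj.decode("ascii")
        except UnicodeDecodeError:
            raise TypeError(
                "The fill character cannot be converted to Unicode"
            ) from None
    elif isinstance(obj, str):
        text = obj
    else:
        raise TypeError("The fill character cannot be converted to Unicode")
    if len(text) != 1:
        raise TypeError("The fill character must be exactly one character long")
    return text


print(convert_fill(b"y"))  # a single ASCII byte is accepted
print(convert_fill("é"))   # any single text character is accepted
```

A non-ASCII byte such as b"\xe9" or a multi-character value such as "ab" is rejected with TypeError in this sketch.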
msg123483 - Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-12-06 18:25
As a feature request for 2.x, I think this should be rejected.
Any objections?
The "behavior" part seems to have been fixed.
History
Date                 User          Action  Args
2022-04-11 14:56:36  admin         set     github: 47696
2010-12-15 19:40:58  belopolsky    set     status: pending -> closed; nosy: rhettinger, belopolsky, vstinner, ezio.melotti, ignas
2010-12-06 18:25:13  belopolsky    set     status: open -> pending; type: behavior -> enhancement; assignee: belopolsky; nosy: + belopolsky; messages: + msg123483; resolution: rejected
2009-05-04 12:58:26  vstinner      set     messages: + msg87123
2009-05-04 12:38:30  vstinner      set     messages: + msg87122; versions: + Python 2.7, - Python 2.5, Python 2.4, Python 3.0
2009-05-04 12:36:41  vstinner      set     messages: + msg87121
2009-03-17 12:10:59  vstinner      set     nosy: + vstinner; messages: + msg83669
2009-02-24 07:19:36  rhettinger    set     nosy: + rhettinger; messages: + msg82660
2009-02-20 06:13:59  ezio.melotti  set     nosy: + ezio.melotti; messages: + msg82514; versions: + Python 2.6, Python 3.0
2008-07-25 15:58:23  ignas         create
