
This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

Classification
Title: Allow 1-character ASCII unicode where 1-character str is required
Type: enhancement
Stage: resolved
Components: Interpreter Core
Versions: Python 2.7

Process
Status: closed
Resolution: wont fix
Dependencies:
Superseder:
Assigned To:
Nosy List: alexandre.vassalotti, benjamin.peterson, gvanrossum, pitrou, serhiy.storchaka, terry.reedy, vstinner
Priority: normal
Keywords: patch

Created on 2013-12-18 15:59 by serhiy.storchaka, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name, Uploaded
getargs_c_unicode.patch, serhiy.storchaka, 2013-12-18 15:59
getargs_c_unicode_2.patch, serhiy.storchaka, 2014-03-04 14:27
Messages (15)
msg206529 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-12-18 15:59
In most cases where a str object is required, a unicode object is allowed too. The "s" and "z" codes (with modifiers) in PyArg_Parse*() accept both str and unicode instances, but the "c" code accepts only a 1-character str, not unicode. This makes it harder to write version-agnostic code that imports unicode_literals (2.7 functions require bytes literals, 3.x functions require unicode literals) and breaks pickle compatibility (see issue13566).
This change will affect:
* str.ljust(), str.rjust() and str.center();
* '%c' % char;
* mmap.write_byte();
* array constructor and item setter for 'c' type;
* datetime.isoformat();
* bsddb.set_re_delim() and bsddb.set_re_pad();
* msvcrt.putch() and msvcrt.ungetch();
* swi.block.padstring().
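The proposed rule can be sketched in Python 3 terms, where bytes stands in for the 2.7 str type and str stands in for the 2.7 unicode type. This is only an illustration: the helper names as_single_byte and ljust_bytes are hypothetical, and raising TypeError for non-ASCII input is an assumption, not something taken from the attached patch.

```python
# Python 3 sketch: bytes stands in for the Python 2 str type,
# and str stands in for the Python 2 unicode type.

def as_single_byte(value):
    """Hypothetical model of the proposed "c" rule (not CPython code)."""
    if isinstance(value, bytes):
        if len(value) != 1:
            raise TypeError("must be a 1-character string")
        return value
    if isinstance(value, str):
        # Proposed extension: accept exactly one ASCII character.
        # Rejecting anything else with TypeError is an assumption.
        if len(value) != 1 or ord(value) > 127:
            raise TypeError("must be a 1-character ASCII string")
        return value.encode("ascii")
    raise TypeError("must be char, not %s" % type(value).__name__)


def ljust_bytes(data, width, fillchar=b" "):
    """Pad data on the right, accepting a text or bytes fill character."""
    return data.ljust(width, as_single_byte(fillchar))


print(ljust_bytes(b"ab", 4, "*"))   # text fill character is accepted
print(ljust_bytes(b"ab", 4, b"*"))  # bytes fill character works as before
```

Both calls produce the same byte string, which is the interoperability the request is after: code written with unicode_literals would not need a bytes literal for the fill character.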
msg206533 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-12-18 16:02
I don't like the heuristic of "ASCII only" characters. Accepting that may lead to bugs if later you pass a non-ASCII character.
And is it not too late to change that in Python 2.7? That version was released 3 years ago and is widely used in production.
msg206534 - Author: Stefan Krah (skrah) * (Python committer) Date: 2013-12-18 16:04
I guess it makes porting to Python 3 easier, but can we do this in a
stable release?
msg206545 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-12-18 17:39
> I don't like the heuristic of "ASCII only" characters. Accepting that may
> lead to bugs if later you pass a non-ASCII character.
What behavior do you propose for non-ASCII values?
> And is it not too late to change that in Python 2.7? That version was released 3
> years ago and is widely used in production.
Python 3 was released 5 years ago, but many people still support and write
new software on Python 2. While 2.7 is in use, new 2.7 versions which
help porting and interoperability with Python 3 will be desirable.
See also issue19099.
msg206719 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2013-12-20 23:55
Victor and Stefan are correct. 2.7 is a fixed version of Python. CPython 2.7.z, z >= 1, only gets bug (and build) fixes. A 'new 2.7 version' would be 2.8, which will not happen. The fact that you propose to change the unambiguous doc shows that this is an enhancement, not a bugfix. This change would have had to be done in 2.7.0.
msg206722 - Author: Stefan Krah (skrah) * (Python committer) Date: 2013-12-21 10:35
Well, generally I'd be against adding features, but this particular one
could be rationalized in the same way as PEP 414. So I'm simply unsure
whether the feature should be added, but *if* it's added, it should
be backed by a pronouncement either from the RM or Guido.
msg206725 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2013-12-21 11:07
PEP 414 was about adding a feature to 3.3 well before the first alpha release.
What Guido has recently said about 2.7 is that after 3 1/2 years we should concentrate on build issues such as came up with the new OSX and de-emphasize or even cease fixing bugs. He thinks that by now, people will have worked around the ones that matter.
msg212614 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-03-03 07:32
Re-opened due to Python-Dev discussion http://comments.gmane.org/gmane.comp.python.devel/146057.
msg212643 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-03-03 15:34
This sounds reasonable to me, but the patch lacks tests.
msg212644 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-03-03 15:36
However, do note that the semantics will end up different from other uses of unicode, e.g.:
>>> "aa".strip(u"b")
u'aa'
In str.strip(), passing a unicode parameter returns a unicode string. In str.ljust(), passing a unicode parameter will return a byte string.
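The difference in result types can be modelled in Python 3, with bytes standing in for the 2.7 str type and str for the 2.7 unicode type. Both function names below are hypothetical, and the bodies are a rough model of the behavior described in this message, not the CPython implementation.

```python
# Python 3 model: bytes plays the Python 2 str role, str plays unicode.

def py2_style_strip(data, chars):
    """Model of 2.7 str.strip(): a text argument promotes the result to text."""
    if isinstance(chars, str):
        return data.decode("ascii").strip(chars)
    return data.strip(chars)


def proposed_ljust(data, width, fillchar):
    """Model of the proposed str.ljust(): the fill character is narrowed to a byte."""
    if isinstance(fillchar, str):
        fillchar = fillchar.encode("ascii")
    return data.ljust(width, fillchar)


stripped = py2_style_strip(b"aa", "b")
padded = proposed_ljust(b"aa", 4, "b")
print(type(stripped).__name__, repr(stripped))  # result follows the argument type
print(type(padded).__name__, repr(padded))      # result follows the receiver type
```

In the first call the argument decides the result type; in the second the receiver does. That asymmetry is the inconsistency being pointed out.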
msg212647 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2014-03-03 18:02
The behavior of str.strip is not what I would expect from reading 
'''str.strip([chars])
 Return a copy of the string with the leading and trailing characters removed.'''
but I guess it is consistent with a general rule that when mixing bytes and unicode (which was not always), the bytes are latin-1 decoded to unicode. However, the 'not always' part (str.strip yes, str.ljust no) made Python a bit inconsistent with itself.
Adding the unicode_literals import made Python more inconsistent with itself. "You can change byte literals to unicode literals, but if you do and you use one of the stdlib text apis that are bytes only, your program breaks." This patch moves the inconsistency around a bit but does not remove it. People who stick with 2.7 will have to live with inconsistency one way or another.
The turtle color issue is quite different in that it involves text names ('red') or encodings ('#aabbcc') for color tuples that are quoted as text in order to be passed on to tkinter and tk (which wants unicode anyway).
msg212673 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-03-03 21:28
> However, do note that the semantics will end up different from other uses of
> unicode, e.g.:
> >>> "aa".strip(u"b")
> u'aa'
And this behavior is weird.
>>> print 'À\n'.strip('\n')
À
>>> print 'À\n'.strip(u'\n')
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
The self argument of str.strip is variable, but the chars argument is almost always a literal and is therefore affected by the unicode_literals future import.
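The failure in the transcript above comes from an implicit ASCII decode of the byte string. Below is a minimal Python 3 sketch of that step with the decode made explicit. Here bytes stands in for the 2.7 str type, the function name implicit_decode_error is hypothetical, and the input is assumed to be UTF-8 encoded as in the transcript.

```python
def implicit_decode_error(raw):
    """Return the error from ASCII-decoding raw, or None if it decodes cleanly."""
    try:
        raw.decode("ascii")
    except UnicodeDecodeError as exc:
        return exc
    return None


raw = "\u00c0\n".encode("utf-8")    # UTF-8 bytes of A-grave plus newline
print(raw.strip(b"\n"))             # a bytes-only strip never decodes
print(implicit_decode_error(raw))   # the decode that a unicode argument triggers
```

Whether the call succeeds therefore depends on the data in self, not only on the literal passed as chars.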
msg212721 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-03-04 14:27
Here is a patch with tests. Not all affected methods are tested because not all methods and modules have tests at all.
msg236099 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-02-16 10:10
What is your decision, Guido and Benjamin?
msg236116 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2015-02-16 22:26
Let's not do this. The time to meddle with Python 2.7 details is long gone.
History
Date User Action Args
2022-04-11 14:57:55 admin set github: 64214
2015-02-16 22:35:33 terry.reedy set stage: patch review -> resolved
2015-02-16 22:26:16 gvanrossum set status: open -> closed; resolution: wont fix; messages: + msg236116
2015-02-16 10:10:22 serhiy.storchaka set nosy: + gvanrossum; messages: + msg236099
2014-05-22 22:06:50 skrah set nosy: - skrah
2014-03-04 14:27:22 serhiy.storchaka set files: + getargs_c_unicode_2.patch; messages: + msg212721
2014-03-03 21:28:28 serhiy.storchaka set messages: + msg212673
2014-03-03 18:02:59 terry.reedy set messages: + msg212647
2014-03-03 15:36:46 pitrou set messages: + msg212644
2014-03-03 15:34:49 pitrou set messages: + msg212643
2014-03-03 07:32:24 serhiy.storchaka set status: closed -> open; resolution: not a bug -> (no value); messages: + msg212614; stage: resolved -> patch review
2013-12-21 11:07:55 terry.reedy set messages: + msg206725
2013-12-21 10:35:14 skrah set messages: + msg206722
2013-12-20 23:55:56 terry.reedy set status: open -> closed; type: enhancement; nosy: + terry.reedy; messages: + msg206719; resolution: not a bug; stage: patch review -> resolved
2013-12-18 17:39:56 serhiy.storchaka set messages: + msg206545
2013-12-18 16:04:13 skrah set nosy: + skrah, benjamin.peterson; messages: + msg206534
2013-12-18 16:02:38 vstinner set nosy: + vstinner; messages: + msg206533
2013-12-18 15:59:26 serhiy.storchaka create
