Created on 2006-08-18 14:37 by sgala, last changed 2022-04-11 14:56 by admin. This issue is now closed.
| Messages (11) | |||
|---|---|---|---|
| msg29549 - (view) | Author: Santiago Gala (sgala) | Date: 2006-08-18 14:37 | |
In bug 1528802 (see https://sourceforge.net/tracker/index.php?func=detail&aid=1528802&group_id=5470&atid=105470), I noticed that the IDLE shell's behaviour with non-ASCII characters differs from the Python console's, and is possibly broken. For example, IDLE produces:
>>> print u"á"
á
>>> print len(u"á")
2
>>> print "á"
á
>>> print len("á")
2
-------
A Python shell (gnome-terminal) produces:
>>> print u"á"
á
>>> print len(u"á")
1
>>> print "á"
á
>>> print len("á")
2
Both are using the es_ES.utf-8 system encoding.
IDLE can handle Unicode; it is only input that gives problems:
>>> import unicodedata
>>> print unicodedata.lookup("LATIN SMALL LETTER A WITH ACUTE")
á
>>> print len(unicodedata.lookup("LATIN SMALL LETTER A WITH ACUTE"))
1
Not that I much like the way the Python console violates the principle of least surprise with non-ASCII letters, but at least some internal consistency would be great until Python 3000 gives us true strings.
I'm using Python 2.5 (svn trunk) built --with-unicode=ucs4.
|
|||
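For readers on Python 3, the length difference reported above can be reproduced without IDLE. This is a minimal sketch, not part of the original report: the variable names are illustrative, and Latin-1 is used only as an assumed example of a wrong one-byte-per-character decoding.

```python
import unicodedata

# The single character from the report, looked up by name.
char = unicodedata.lookup("LATIN SMALL LETTER A WITH ACUTE")

# Its UTF-8 encoding is two bytes; this is what a Python 2 byte string
# "á" holds when typed in a UTF-8 terminal.
utf8_bytes = char.encode("utf-8")

text_length = len(char)        # length in code points
byte_length = len(utf8_bytes)  # length in bytes

# Treating each byte as its own character (Latin-1 here, an assumption
# for illustration) yields a two-character string.
mis_decoded = utf8_bytes.decode("latin-1")
```

A length of 2 for a Unicode literal is therefore consistent with the UTF-8 bytes having been interpreted one byte per character somewhere on the input path.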
| msg59954 - (view) | Author: Santiago Gala (sgala) | Date: 2008-01-15 01:56 | |
works in python 3ka2 (svn as of today):
>>> print("á")
á
>>> print(b"á")
SyntaxError: bytes can only contain ASCII literal characters.
(<pyshell#5>, line 1)
as it should, so the problem appears in 2.* only.
|
|||
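The Python 3 behaviour described above can be checked outside a shell with compile(). This is a hedged sketch: `compiles` is a hypothetical helper introduced here, and the exact error message text may differ between Python 3 versions, so only the exception type is relied upon.

```python
def compiles(source):
    """Return True if `source` compiles as a single expression."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True

# A text literal may contain non-ASCII characters.
text_literal_ok = compiles('"á"')

# A bytes literal may not; non-ASCII bytes must be written as escapes.
bytes_literal_ok = compiles('b"á"')
escaped_bytes_ok = compiles(r'b"\xc3\xa1"')
```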
| msg84482 - (view) | Author: Daniel Diniz (ajaksu2) * (Python triager) | Date: 2009-03-30 03:42 | |
This is about a disparity between IDLE and the Python shell. I'm guessing different encodings are to blame here and that this is invalid. The disparity is present in a UCS4 build (IDLE shows UCS2-like behavior[1], maybe because it's using UTF-8?). [1] http://mail.python.org/pipermail/python-dev/2008-July/080886.html |
|||
| msg85881 - (view) | Author: Inada Naoki (methane) * (Python committer) | Date: 2009-04-12 00:45 | |
This issue is caused by the behavior of compile().
The following sample is in code page 932.
>>> 'あ'
'\x82\xa0' # OK - 'あ' is '\x82\xa0' in cp932
>>> u'あ'
u'\u3042' # OK - u'あ' is '\u3042' in UCS-2
Compile as a byte string:
>>> c = compile("'あ'", 'test', 'single')
>>> exec c
'\x82\xa0' # OK
>>> c = compile("u'あ'", 'test', 'single')
>>> exec c
u'\x82\xa0' # NG!!!
Compile as a Unicode string:
>>> c = compile(u"'あ'", 'test', 'single')
>>> exec c
'\xe3\x81\x82' # NG!!!
>>> c = compile(u"u'あ'", 'test', 'single')
>>> exec c
u'\u3042' # OK
Compile as a byte string with a PEP 263 coding declaration:
>>> c = compile("# coding: mbcs\n'あ'", 'test', 'single')
>>> exec c
'\x82\xa0' # OK
>>> c = compile("# coding: mbcs\nu'あ'", 'test', 'single')
>>> exec c
u'\u3042' # OK
|
|||
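The byte values in the session above, and the effect of a PEP 263 declaration, can be illustrated on Python 3. This is only a sketch under stated assumptions: it does not reproduce the Python 2 bug itself (Python 3 has no separate byte-string literals for text), `run` is a hypothetical helper, and Latin-1 stands in for "one byte per character" decoding.

```python
# The two encodings of the character used in the report.
cp932_bytes = "あ".encode("cp932")
utf8_bytes = "あ".encode("utf-8")

# Reading the cp932 bytes one byte per character gives the two-character
# result that the report marks as wrong for u'あ'.
mis_decoded = cp932_bytes.decode("latin-1")

def run(source_bytes):
    """Compile and execute byte source, returning the value bound to `s`."""
    namespace = {}
    exec(compile(source_bytes, "<example>", "exec"), namespace)
    return namespace["s"]

# With a coding declaration, compile() decodes the literal using cp932.
with_cookie = run(b"# coding: cp932\ns = '" + cp932_bytes + b"'\n")

# Without one, Python 3 assumes UTF-8 for byte source.
default_utf8 = run(b"s = '" + utf8_bytes + b"'\n")
```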
| msg85882 - (view) | Author: Inada Naoki (methane) * (Python committer) | Date: 2009-04-12 00:54 | |
This patch is for idlelib/PyShell.py#ModifiedInterpreter.runsource.
         if isinstance(source, types.UnicodeType):
             import IOBinding
             try:
                 source = source.encode(IOBinding.encoding)
+                source = "# coding: %s\n%s" % (IOBinding.encoding, source)
             except UnicodeError:
|
|||
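The idea in the patch above, encoding the source and prefixing a coding declaration before compiling, can be sketched on Python 3 as follows. This is not the IDLE code: `with_coding_cookie` and `run_source` are hypothetical names, and Latin-1 is an arbitrary example encoding.

```python
def with_coding_cookie(source, encoding):
    """Encode `source` and prefix a PEP 263 declaration (hypothetical helper)."""
    cookie = "# coding: %s\n" % encoding
    return cookie.encode("ascii") + source.encode(encoding)

def run_source(source_bytes):
    """Compile and execute byte source, returning the resulting namespace."""
    namespace = {}
    exec(compile(source_bytes, "<shell>", "exec"), namespace)
    return namespace

# The declaration tells compile() how to decode the bytes that follow it.
prefixed = with_coding_cookie("x = 'á'\n", "latin-1")
result = run_source(prefixed)["x"]
```

Note that the declaration occupies the first line, so the user's code starts on line 2 of what the compiler sees.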
| msg85886 - (view) | Author: Santiago Gala (sgala) | Date: 2009-04-12 09:02 | |
Updating the components, as the error surfaces in the compile builtin.
The compile builtin works when given unicode, but fails when given a
UTF-8 (local input encoding) byte string.
Rather than adding a "coding" line to the source passed to compile, my
guess is that compile should be fixed or fed a unicode string. See the
effects in the shell:
>>> print len('à')
2
>>> print len(u'à')
1
>>> exec compile("print len('à')",'test', 'single')
2
>>> exec compile("print len(u'à')",'test', 'single')
2
>>> exec compile("print len('à')".decode("utf8"),'test', 'single')
2
>>> exec compile("print len(u'à')".decode("utf8"),'test', 'single')
1
>>>
So the error disappears when the string fed to exec compile is properly
decoded to unicode.
In idlelib there is an attempt to encode the input to
IOBinding.encoding, but IOBinding.encoding is broken here, as
locale.nl_langinfo(locale.CODESET) gives 'ANSI_X3.4-1968', which looks
up as 'ascii', while locale.getpreferredencoding() gives 'UTF-8' (as it
should).
If I comment out the whole attempt, IDLE works (for this test; not fully
tested):
sgala@marlow ~ $ diff -u /tmp/PyShell.py /usr/lib64/python2.6/idlelib/PyShell.py
--- /tmp/PyShell.py 2009-04-12 11:01:01.000000000 +0200
+++ /usr/lib64/python2.6/idlelib/PyShell.py 2009-04-12 10:59:16.000000000 +0200
@@ -592,14 +592,14 @@
         self.more = 0
         self.save_warnings_filters = warnings.filters[:]
         warnings.filterwarnings(action="error", category=SyntaxWarning)
-        if isinstance(source, types.UnicodeType):
-            import IOBinding
-            try:
-                source = source.encode(IOBinding.encoding)
-            except UnicodeError:
-                self.tkconsole.resetoutput()
-                self.write("Unsupported characters in input\n")
-                return
+        #if isinstance(source, types.UnicodeType):
+        #    import IOBinding
+        #    try:
+        #        source = source.encode(IOBinding.encoding)
+        #    except UnicodeError:
+        #        self.tkconsole.resetoutput()
+        #        self.write("Unsupported characters in input\n")
+        #        return
         try:
             # InteractiveInterpreter.runsource() calls its runcode() method,
             # which is overridden (see below)
>>> print len('á')
2
>>> print len(u'á')
1
>>> print 'á'
á
>>> print u'á'
á
>>>
Now using Python 2.6.1 (r261:67515, Apr 10 2009, 14:34:00) on x86_64
|
|||
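The codec names mentioned above can be inspected directly. This is a small sketch for Python 3, not taken from the thread: `canonical_codec_name` is a hypothetical helper, and the values of `preferred` and `codeset` depend on the platform and locale, so nothing is assumed about them here.

```python
import codecs
import locale

def canonical_codec_name(name):
    """Return Python's canonical name for an encoding label."""
    return codecs.lookup(name).name

# The label reported by nl_langinfo in the message resolves to ASCII.
ascii_alias = canonical_codec_name("ANSI_X3.4-1968")
utf8_alias = canonical_codec_name("UTF-8")

# What this interpreter reports; results vary by system.
preferred = locale.getpreferredencoding()

# nl_langinfo and CODESET are not available on every platform.
if hasattr(locale, "nl_langinfo") and hasattr(locale, "CODESET"):
    codeset = locale.nl_langinfo(locale.CODESET)
else:
    codeset = None
```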
| msg85891 - (view) | Author: Inada Naoki (methane) * (Python committer) | Date: 2009-04-12 10:40 | |
UTF-8 is not necessarily the locale encoding.
>>> f = open('á.txt')
If this line is compiled as UTF-8 and the locale encoding is not UTF-8,
'á.txt' cannot be opened.
IMHO, for Python 2.x, the correct approach is to fix IOBinding.encoding
and to call compile() with a PEP 263 coding declaration.
|
|||
| msg85892 - (view) | Author: Inada Naoki (methane) * (Python committer) | Date: 2009-04-12 11:04 | |
Here is how to use locale.getpreferredencoding() instead of
locale.nl_langinfo(locale.CODESET):
--- IOBinding.py.back Sun Apr 12 19:54:52 2009
+++ IOBinding.py Sun Apr 12 20:02:58 2009
@@ -35,40 +35,16 @@
 # Encoding for file names
 filesystemencoding = sys.getfilesystemencoding()
-encoding = "ascii"
-if sys.platform == 'win32':
-    # On Windows, we could use "mbcs". However, to give the user
-    # a portable encoding name, we need to find the code page
-    try:
-        encoding = locale.getdefaultlocale()[1]
-        codecs.lookup(encoding)
-    except LookupError:
-        pass
-else:
-    try:
-        # Different things can fail here: the locale module may not be
-        # loaded, it may not offer nl_langinfo, or CODESET, or the
-        # resulting codeset may be unknown to Python. We ignore all
-        # these problems, falling back to ASCII
-        encoding = locale.nl_langinfo(locale.CODESET)
-        if encoding is None or encoding is '':
-            # situation occurs on Mac OS X
-            encoding = 'ascii'
-        codecs.lookup(encoding)
-    except (NameError, AttributeError, LookupError):
-        # Try getdefaultlocale well: it parses environment variables,
-        # which may give a clue. Unfortunately, getdefaultlocale has
-        # bugs that can cause ValueError.
-        try:
-            encoding = locale.getdefaultlocale()[1]
-            if encoding is None or encoding is '':
-                # situation occurs on Mac OS X
-                encoding = 'ascii'
-            codecs.lookup(encoding)
-        except (ValueError, LookupError):
-            pass
+encoding = "utf-8"
-encoding = encoding.lower()
+preferredencoding = None
+try:
+    preferredencoding = locale.getpreferredencoding()
+    codecs.lookup(preferredencoding)
+    encoding = preferredencoding.lower()
+except LookupError:
+    pass
+del preferredencoding
 coding_re = re.compile("coding[:=]\s*([-\w_.]+)")
|
|||
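The added lines of the patch above amount to "use the preferred encoding if Python knows it, otherwise fall back to UTF-8". A function-shaped sketch of that logic for Python 3 might look like this; `shell_encoding` is a hypothetical name, not something from idlelib.

```python
import codecs
import locale

def shell_encoding(default="utf-8"):
    """Return the lowercased preferred encoding, or `default` if unknown."""
    try:
        preferred = locale.getpreferredencoding()
        # Raises LookupError if Python has no codec for this label.
        codecs.lookup(preferred)
    except LookupError:
        return default
    return preferred.lower()

encoding = shell_encoding()
```

Wrapping the logic in a function avoids the temporary module-level name that the patch has to delete afterwards.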
| msg119540 - (view) | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2010-10-25 06:29 | |
As indicated in msg59954, it works fine on 3.x, so removing these versions. |
|||
| msg119541 - (view) | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2010-10-25 06:48 | |
For 2.7, I don't think it's possible to really fix this. I see the following options:
A. Current status. Byte strings are compiled correctly, Unicode strings are not.
B. Compile source as a Unicode string, as proposed in msg85886. Unicode strings are compiled properly, byte strings are not (they get compiled as UTF-8, when they should get compiled in the locale encoding).
C. Prefix source with an encoding declaration, as proposed in msg85882. Both Unicode strings and byte strings get compiled correctly, but line numbers in tracebacks are wrong.
Given that it's not possible to fix this without breaking something else, and given that it's fixed in Python 3, I propose to declare this as "won't fix" for Python 2.7.
In any case, the bug is certainly not in compile(), which is behaving exactly as specified, so I revert the title change.
|
|||
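The line-number drawback of option C can be demonstrated with a short Python 3 sketch. The helper name `failing_line` is hypothetical, and the example uses a deliberate ZeroDivisionError purely to obtain a traceback.

```python
import traceback

def failing_line(source_bytes):
    """Run byte source and return the line number where it raised."""
    code = compile(source_bytes, "<shell>", "exec")
    try:
        exec(code, {})
    except ZeroDivisionError as exc:
        # The last traceback entry belongs to the compiled source.
        return traceback.extract_tb(exc.__traceback__)[-1].lineno
    return None

# The user's one-line input fails on line 1 ...
plain = failing_line(b"1/0\n")

# ... but with a prefixed coding declaration the same input is line 2.
prefixed = failing_line(b"# coding: utf-8\n1/0\n")
```

Any fix along these lines would also have to adjust reported line numbers by one, which is the extra breakage the message refers to.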
| msg157592 - (view) | Author: Andrew Svetlov (asvetlov) * (Python committer) | Date: 2012-04-05 14:09 | |
Closing as won't fix. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:56:19 | admin | set | github: 43853 |
| 2012-04-05 14:09:32 | asvetlov | set | status: open -> closed assignee: kbk -> asvetlov nosy: + asvetlov messages: + msg157592 resolution: wont fix stage: test needed -> resolved |
| 2010-10-25 09:38:33 | Trundle | set | nosy: + Trundle |
| 2010-10-25 06:48:56 | loewis | set | messages: + msg119541 title: compile(): IDLE shell gives different len() of unicode strings compared to Python shell -> IDLE shell gives different len() of unicode strings compared to Python shell |
| 2010-10-25 06:29:16 | loewis | set | messages: + msg119540 versions: - Python 3.1, Python 3.2 |
| 2010-10-24 22:52:22 | eric.araujo | set | nosy: + eric.araujo, lemburg, loewis components: + Unicode |
| 2010-08-24 20:16:17 | BreamoreBoy | set | versions: + Python 3.1, Python 2.7, Python 3.2, - Python 2.6 |
| 2009-05-08 19:08:41 | ajaksu2 | set | dependencies: + built-in compile() should take encoding option. |
| 2009-04-26 22:23:47 | ajaksu2 | set | nosy: + vstinner, asmodai title: IDLE shell gives different len() of unicode strings compared to Python shell -> compile(): IDLE shell gives different len() of unicode strings compared to Python shell keywords: + patch type: behavior stage: test needed |
| 2009-04-12 11:04:19 | methane | set | messages: + msg85892 |
| 2009-04-12 10:40:44 | methane | set | messages: + msg85891 |
| 2009-04-12 09:02:31 | sgala | set | messages: + msg85886 components: + Interpreter Core |
| 2009-04-12 00:54:39 | methane | set | messages: + msg85882 |
| 2009-04-12 00:45:54 | methane | set | nosy: + methane messages: + msg85881 |
| 2009-03-30 03:42:03 | ajaksu2 | set | versions: + Python 2.6, - Python 2.5 nosy: + ajaksu2 title: IDLE shell doesn't accept non ascii char input -> IDLE shell gives different len() of unicode strings compared to Python shell messages: + msg84482 |
| 2008-01-15 01:56:43 | sgala | set | messages: + msg59954 |
| 2007-12-03 19:33:47 | facundobatista | set | title: IDLE shell doesn\'t accept non ascii char input -> IDLE shell doesn't accept non ascii char input |
| 2006-08-18 14:37:06 | sgala | create | |