Message8825
| Author |
jeff.allen |
| Recipients |
jeff.allen, kasso, rpan, zyasoft |
| Date |
2014-06-25.21:13:27 |
| Message-id |
<1403730808.26.0.598447160794.issue2123@psf.upfronthosting.co.za> |
| In-reply-to |
| Content |
On Windows we do not have the UTF-8 option. This is what CPython does with code page 936:
>python
Python 2.7.6 (default, Nov 10 2013, 19:24:24) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.stdin.encoding
'cp936'
>>> s = "使用"
>>> s
'\xca\xb9\xd3\xc3'
>>> print s
使用
>>>
Jython is now exactly the same, except that Java likes to call the encoding ms936. (Java actually tests the range of the code page number so it can call some of them cp* and some ms*; I assume there's a good reason.)
A str is a sequence of bytes, not characters. When you just type s at the prompt, Python actually prints repr(s), which gives you a "safe" representation, such as you might have written in ASCII source code. When you execute print s, it pushes the bytes out through sys.stdout, and what you see is the result of the (Windows) console interpreting those bytes, in this case as code page 936. The same bytes would normally come out on my console like this (code page 1252):
>>> s
'\xca\xb9\xd3\xc3'
>>> print s
Ê¹ÓÃ
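To make the point about the console concrete, here is a minimal sketch of the same four bytes decoded under each code page. It is written for Python 3 (the session above is Python 2.7), and the names RAW and as_seen_on_console are made up for illustration; they are not part of CPython or Jython.

```python
# The four bytes from the session above: "使用" encoded in code page 936.
RAW = b"\xca\xb9\xd3\xc3"


def as_seen_on_console(raw, code_page):
    """Decode raw bytes the way a console set to code_page would render them.

    Hypothetical helper: it only imitates the console's interpretation step.
    """
    return raw.decode(code_page)


# The "safe" form that the interactive prompt echoes (ASCII only).
SAFE = repr(RAW)

# The same bytes, interpreted under two different code pages.
ON_CP936 = as_seen_on_console(RAW, "cp936")
ON_CP1252 = as_seen_on_console(RAW, "cp1252")
```

Here ON_CP936 is 使用 and ON_CP1252 is Ê¹ÓÃ, although the bytes in RAW never change. Only the interpretation differs.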
At a time when CPython only dealt with bytes, Jython chose to allow UTF-16 characters in strings, interchangeably with Java. Since then, Python has evolved to support unicode as a distinct type, and later Jython versions conform to that design.
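The difference between the byte view and the character view can be sketched as follows. This is again Python 3, where the text type is called str rather than unicode, and utf16_code_units is a hypothetical helper rather than anything Jython provides.

```python
TEXT = "使用"                    # text: a sequence of characters
ENCODED = TEXT.encode("cp936")   # bytes: what a Python 2 str would hold


def utf16_code_units(text):
    """Count the 16-bit units needed to hold text as UTF-16, as a Java String would."""
    return len(text.encode("utf-16-le")) // 2


CHAR_COUNT = len(TEXT)           # 2 characters
BYTE_COUNT = len(ENCODED)        # 4 bytes in code page 936
UNIT_COUNT = utf16_code_units(TEXT)
ROUND_TRIP = ENCODED.decode("cp936") == TEXT
```

The two characters occupy four bytes in code page 936 but two UTF-16 code units, which is why a string of bytes and a string of characters cannot be treated as the same thing.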
Bottom line: this aspect of Jython is correct now (probably). Thanks for making us think about it. |
|
History
|
|---|
| Date |
User |
Action |
Args |
| 2014-06-25 21:13:28 | jeff.allen | set | messageid: <1403730808.26.0.598447160794.issue2123@psf.upfronthosting.co.za> |
| 2014-06-25 21:13:28 | jeff.allen | set | recipients: + jeff.allen, zyasoft, rpan, kasso |
| 2014-06-25 21:13:28 | jeff.allen | link | issue2123 messages |
| 2014-06-25 21:13:27 | jeff.allen | create |
|