[Python-Dev] Heads up: unicode file I/O in JPython.
Finn Bock
bckfnn@worldonline.dk
20 May 2000 15:19:09 GMT
I have recently released errata-07, which improves JPython's ability
to handle unicode characters as well as binary data read from and
written to Python files.
The conversions can be described as follows:
- I/O to a file opened in binary mode will read/write the low 8 bits
of each char. Writing Unicode chars >0xFF will cause silent
truncation [*].
- I/O to a file opened in text mode will push the character
through the default encoding for the platform (in addition to
handling CR/LF issues).
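The two rules above can be modelled roughly as follows. This is only a sketch in present-day Python 3, not JPython code: the function names are hypothetical, cp1252 is an assumption standing in for "the default encoding for the platform", and CR/LF handling is left out.

```python
# Illustrative model only: Python 3, not JPython code.
# All function names here are hypothetical.

def write_binary(text):
    """Binary mode: keep the low 8 bits of each char, dropping the rest."""
    return bytes(ord(ch) & 0xFF for ch in text)


def read_binary(data):
    """Binary mode: each byte read becomes one char in the range 0..0xFF."""
    return "".join(chr(b) for b in data)


def write_text(text, encoding="cp1252"):
    """Text mode: push the chars through an encoding.

    cp1252 is an assumption standing in for the platform default.
    CR/LF translation is left out of this sketch.
    """
    return text.encode(encoding)


def read_text(data, encoding="cp1252"):
    """Text mode: decode the bytes with the same assumed encoding."""
    return data.decode(encoding)


euro = "\u20ac"
print(write_text(euro).hex())                      # byte written in text mode
print(hex(ord(read_text(write_text(euro)))))       # char read back in text mode
print(write_binary(euro).hex())                    # byte written in binary mode
print(hex(ord(read_binary(write_binary(euro)))))   # char read back in binary mode
```

In this model the text-mode round trip preserves the character, while the binary-mode round trip keeps only its low byte.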
This is a complete break from Python 1.6a2, but I believe that it is close
to the expectations of Java users. (The current JPython-1.1 behavior is
completely useless for both characters and binary data. It only barely
manages to handle 7-bit ASCII.)
In JPython (with the errata) we can do:
f = open("test207.out", "w")
f.write("\x20ac") # On my w2k platform this writes 0x80 to the file.
f.close()
f = open("test207.out", "r")
print hex(ord(f.read()))
f.close()
f = open("test207.out", "wb")
f.write("\x20ac") # On all platforms this writes 0xAC to the file.
f.close()
f = open("test207.out", "rb")
print hex(ord(f.read()))
f.close()
With the output of:
0x20ac
0xac
I do not expect anything like this in CPython. I just hope that all
unicode advice given on c.l.py comes with the caveat that JPython
might do it differently.
regards,
finn
http://sourceforge.net/project/filelist.php?group_id=1842
[*] Silent overflow is bad, but it is at least twice as fast as having
to check each char for overflow.
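For comparison, the checked alternative that the footnote alludes to might look like the sketch below. It is again Python 3 rather than JPython code, and the function name is hypothetical. The extra per-char test is the cost being traded against silent truncation.

```python
# Illustrative model only: Python 3, not JPython code.
# write_binary_checked is a hypothetical name.

def write_binary_checked(text):
    """Binary mode with an overflow check on every char."""
    for index, ch in enumerate(text):
        if ord(ch) > 0xFF:
            raise ValueError(
                "char %r at index %d does not fit in 8 bits" % (ch, index)
            )
    return bytes(ord(ch) for ch in text)
```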