[Python-Dev] Heads up: unicode file I/O in JPython.

20 May 2000 15:19:09 GMT

I have recently released errata-07 which improves on JPython's ability
to handle unicode characters as well as binary data read from and
written to python files.
The conversions can be described as follows:
- I/O to a file opened in binary mode will read/write the low 8 bits
 of each char. Writing Unicode chars >0xFF will cause silent
 truncation [*].
- I/O to a file opened in text mode will push the character 
 through the default encoding for the platform (in addition to 
 handling CR/LF issues).
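As a rough model of the two rules above, here is a small sketch. It is written for a present-day CPython 3 interpreter, not JPython, and it is not the errata code itself. The helper names are hypothetical, "cp1252" is only an assumed stand-in for the platform default encoding, and the CR/LF handling is left out.

```python
# Hypothetical helpers modelling the two conversions; not JPython's code.

def write_binary(text):
    """Binary mode: keep only the low 8 bits of each char (silent truncation)."""
    return bytes(ord(ch) & 0xFF for ch in text)

def read_binary(data):
    """Binary mode: each byte becomes one char in the range 0..0xFF."""
    return "".join(chr(b) for b in data)

def write_text(text, encoding="cp1252"):
    """Text mode: push the chars through the (assumed) platform encoding."""
    return text.encode(encoding)

def read_text(data, encoding="cp1252"):
    """Text mode: decode the bytes with the same (assumed) platform encoding."""
    return data.decode(encoding)

euro = "\u20ac"
print(write_text(euro))                # the encoded byte for cp1252
print(write_binary(euro))              # only the low byte survives
print(read_text(write_text(euro)) == euro)
```

With a different default encoding the text-mode bytes will differ, which is the platform dependence described above. The binary-mode result does not depend on the platform.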
These conversions are a complete break from python1.6a2, but I believe
they are close to the expectations of java users. (The current JPython-1.1
behavior is completely useless for both characters and binary data. It
only barely manages to handle 7-bit ASCII.)
In JPython (with the errata) we can do:
 f = open("test207.out", "w")
 f.write("\x20ac") # On my w2k platform this writes 0x80 to the file.
 f.close()
 f = open("test207.out", "r")
 print hex(ord(f.read()))
 f.close()
 f = open("test207.out", "wb")
 f.write("\x20ac") # On all platforms this writes 0xAC to the file.
 f.close()
 f = open("test207.out", "rb")
 print hex(ord(f.read()))
 f.close()
With the output of:
 0x20ac
 0xac
I do not expect anything like this in CPython. I just hope that all
unicode advice given on c.l.py comes with the modifier, that JPython
might do it differently.
regards,
finn

 http://sourceforge.net/project/filelist.php?group_id=1842

[*] Silent overflow is bad, but it is at least twice as fast as having
to check each char for overflow.
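
For comparison, a checked binary write might look something like the sketch below. It is again written for CPython 3 rather than JPython, the function name is hypothetical, and it says nothing about the actual speed difference. It only shows the extra per-char test that the silent version avoids.

```python
# Hypothetical checked variant; the errata does NOT do this.

def write_binary_checked(text):
    """Refuse chars above 0xFF instead of silently dropping the high bits."""
    for ch in text:
        if ord(ch) > 0xFF:
            raise ValueError("char %#x does not fit in one byte" % ord(ch))
    return bytes(ord(ch) for ch in text)

print(write_binary_checked("\xac"))    # fits in one byte, so it is written
try:
    write_binary_checked("\u20ac")     # too wide for one byte
except ValueError as exc:
    print("rejected:", exc)
```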