[Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

Zooko O'Whielacronx zooko at zooko.com
Sat Apr 25 17:29:54 CEST 2009


Thanks for writing PEP 383, MvL. I recently ran into this problem
in Python 2.x in the Tahoe project [1]. The Tahoe project should be
considered a good use case showing what some people need. For
example, the assumption that a file will later be written back into
the same local filesystem from which it originally came (and thus,
with luck, use the same encoding) doesn't hold for us, because Tahoe
is used for file-sharing as well as for backup-and-restore.

One of my first conclusions in pursuing this issue is that we can
never use the Python 2.x unicode APIs on Linux, just as we can never
use the Python 2.x str APIs on Windows [2]. (You mentioned this
ugliness in your PEP.) My next conclusion was that the Linux way of
encoding filenames really sucks compared to, for example, the
Mac OS X way. I'm heartened to see that David Wheeler is trying to
persuade the maintainers of Linux filesystems to improve some of
this: [3].

My final conclusion was that we needed two kinds of workaround for
the Linux suckage. First, if decoding using the suggested filesystem
encoding fails, then we fall back to mojibake [4] by decoding with
iso-8859-1 (or else with windows-1252 -- I'm not sure if it matters,
and I haven't yet understood whether utf-8b offers another
alternative for this case). Second, if decoding succeeds using the
suggested filesystem encoding on Linux, then we write down the
encoding that we used and include that with the filename. This
expands the size of our filenames significantly, but it is the only
way to allow some future programmer to undo the damage of a falsely
successful decoding. Here's our whole plan: [5].
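
A minimal sketch of the two-part workaround described above, not
Tahoe's actual code. The names decode_filename and recover_bytes are
hypothetical, and the sketch is written in Python 3 syntax for
readability even though the problem was hit in Python 2.x. The
caller passes in the suggested filesystem encoding (for instance the
value of sys.getfilesystemencoding()).

```python
def decode_filename(raw, fs_encoding):
    """Decode a raw filename, recording which encoding was used.

    Hypothetical helper: returns (text, encoding_used, fell_back).
    """
    try:
        # Case 2 in the text: decoding succeeded, so remember the
        # encoding in case the success was a false one.
        return raw.decode(fs_encoding), fs_encoding, False
    except UnicodeDecodeError:
        # Case 1 in the text: fall back to mojibake. iso-8859-1 maps
        # every byte 0x00-0xFF to a code point, so this cannot fail.
        return raw.decode("iso-8859-1"), "iso-8859-1", True


def recover_bytes(text, encoding_used):
    """Undo decode_filename using the recorded encoding."""
    return text.encode(encoding_used)


if __name__ == "__main__":
    for raw in (b"caf\xc3\xa9.txt", b"caf\xe9.txt"):
        text, enc, fell_back = decode_filename(raw, "utf-8")
        # The original bytes are recoverable in both cases.
        assert recover_bytes(text, enc) == raw
        print(repr(text), enc, fell_back)
```

Because the encoding is stored next to the decoded name, the
original bytes can be reconstructed later even if the decoding turns
out to have been wrong. On the iso-8859-1 versus windows-1252
question: Python's cp1252 codec leaves a few bytes (0x81, for
example) undefined and raises on them, so iso-8859-1 looks like the
safer fallback when the goal is a decode that can never fail.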

Regards,
Zooko

[1] http://allmydata.org
[2] http://allmydata.org/pipermail/tahoe-dev/2009-March/001379.html (see the footnote of this message)
[3] http://www.dwheeler.com/essays/fixing-unix-linux-filenames.html
[4] http://en.wikipedia.org/wiki/Mojibake
[5] http://allmydata.org/trac/tahoe/ticket/534#comment:47

