[Python-Dev] Unicode strings as filenames

Skip Montanaro skip@pobox.com (Skip Montanaro)
Thu, 3 Jan 2002 09:11:01 -0600


What's the correct way to deal with filenames in a Unicode environment?
Consider this:
 >>> import site
 >>> site.encoding
 'latin-1'
 >>> a = "abc\xe4\xfc\xdf.txt"
 >>> u = unicode (a, "latin-1")
 >>> uu = u.encode ("utf-8")
 >>> open(a, "w")
 <open file 'abcäüß.txt', mode 'w' at 0x823c2a0>
 >>> open(u, "w")
 <open file 'abcäüß.txt', mode 'w' at 0x823a1e8>
 >>> open(uu, "w")
 <open file 'abcÃ¤Ã¼Ã.txt', mode 'w' at 0x81d6160>
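
A minimal sketch of the byte sequences behind the three names above. It assumes a current Python 3 interpreter rather than the Python 2 session shown, so `str` stands in for the Python 2 unicode object and `bytes` for the plain string; the variable names are illustrative only.

```python
# Python 3 sketch: str plays the role of the Python 2 unicode object,
# bytes the role of the Python 2 plain (8-bit) string.
name = "abc\xe4\xfc\xdf.txt"            # same characters as u in the session

latin1_bytes = name.encode("latin-1")   # corresponds to a
utf8_bytes = name.encode("utf-8")       # corresponds to uu

print(latin1_bytes)                     # one byte per non-ASCII character
print(utf8_bytes)                       # two bytes per non-ASCII character
print(latin1_bytes == utf8_bytes)       # the two names differ on disk
```

Because the two byte strings differ, opening `a` and opening `uu` creates two distinct files, even though both were derived from the same characters.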
If I change my site's default encoding back to ascii, the second open
fails:
 >>> import site
 >>> site.encoding
 'ascii'
 >>> a = "abc\xe4\xfc\xdf.txt"
 >>> u = unicode (a, "latin-1")
 >>> uu = u.encode ("utf-8")
 >>> open(a, "w")
 <open file 'abcäüß.txt', mode 'w' at 0x822b448>
 >>> open(u, "w")
 Traceback (most recent call last):
   File "<stdin>", line 1, in ?
 UnicodeError: ASCII encoding error: ordinal not in range(128)
 >>> open(uu, "w")
 <open file 'abcÃ¤Ã¼Ã.txt', mode 'w' at 0x822d260>
as I expect it should. The third open is a problem as well, even though it
succeeds with either encoding. (Why doesn't it fail when the default
encoding is ascii?) My thought is that before using a plain string or a
unicode string as a filename it should first be coerced to a unicode string
with the default encoding, something like:
 if type(fname) == types.StringType:
     fname = unicode(fname, site.encoding)
 elif type(fname) == types.UnicodeType:
     fname = fname.encode(site.encoding)
 else:
     raise TypeError, ("unrecognized type for filename: %s"%type(fname))
Is that the correct approach? Apparently Python's file object doesn't do
this under the covers. Should it?
Thx,
Skip
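
A rough sketch of the coercion idea from the message, written for a current Python 3 interpreter (an assumption; the original snippet targets Python 2). The helper name `filename_to_bytes` and its `encoding` parameter are hypothetical and not part of any Python API. Unlike the snippet above, it validates byte strings against the chosen encoding as well, which is one way the third open could be made to fail under ascii.

```python
def filename_to_bytes(fname, encoding):
    """Hypothetical helper: return fname as bytes valid in `encoding`."""
    if isinstance(fname, bytes):
        # Round-trip through text so bytes that are not valid in
        # `encoding` are rejected instead of passed through unchecked.
        return fname.decode(encoding).encode(encoding)
    if isinstance(fname, str):
        return fname.encode(encoding)
    raise TypeError("unrecognized type for filename: %s" % type(fname))


text_name = "abc\xe4\xfc\xdf.txt"
utf8_name = text_name.encode("utf-8")

# With latin-1 the text name is accepted and encoded.
print(filename_to_bytes(text_name, "latin-1"))

# With ascii both the text name and the UTF-8 byte string are rejected.
for candidate in (text_name, utf8_name):
    try:
        filename_to_bytes(candidate, "ascii")
    except UnicodeError as exc:
        print("rejected:", type(exc).__name__)
```

Whether a file object should apply this kind of check itself is the open question of the message; the sketch only shows what the check would look like if it were applied to both string types.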
