[Python-Dev] Unicode strings as filenames

Skip Montanaro skip@pobox.com (Skip Montanaro)
Thu, 3 Jan 2002 17:11:10 -0600


>>>>> "Martin" == Martin v Loewis <martin@v.loewis.de> writes:

 >> What's the correct way to deal with filenames in a Unicode
 >> environment? Consider this:
 >>
 >> >>> import site
 >> >>> site.encoding
 >> 'latin-1'
 Martin> Setting site.encoding is certainly the wrong thing to do. How
 Martin> can you know all users of your system use latin-1?
Why is it wrong to set site.encoding to suit your environment at the
time you install Python? I can't know that all users of my system
(whatever the definition of "my system" is) will use latin-1. Somewhere
along the way I have to make some assumptions, however.

 On any given computer I assume the people who install Python will set
 site.encoding appropriate to their environment.

 The example I used was latin-1 simply because the folks I'm working
 with are in Austria and they came up with the example. I assume the
 best default encoding for them is latin-1.

 The application writers themselves will have no problem restricting
 internal filenames to be ascii. I assume if users want to save files
 of their own, they will choose characters from the Unicode character
 set they use most frequently.

So, my example used latin-1. I could just as easily have chosen
something else.
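
To make the choice of encoding concrete, here is a small sketch showing
that one Unicode filename becomes different bytes on disk depending on
which encoding is assumed. It is written in present-day Python 3 syntax,
which is an assumption on my part, and the filename is only an
illustrative one containing a few German characters.

```python
# Illustrative filename with three non-ASCII characters (a-umlaut,
# u-umlaut, sharp s). In Python 3 a plain str is already Unicode.
name = "abc\xe4\xfc\xdf.txt"

# The same name encoded under two different assumptions.
latin1_bytes = name.encode("latin-1")
utf8_bytes = name.encode("utf-8")

print(latin1_bytes)  # b'abc\xe4\xfc\xdf.txt'
print(utf8_bytes)    # b'abc\xc3\xa4\xc3\xbc\xc3\x9f.txt'

# An ASCII-only default cannot represent the name at all.
ascii_error = None
try:
    name.encode("ascii")
except UnicodeEncodeError as exc:
    ascii_error = exc

print(type(ascii_error).__name__)  # UnicodeEncodeError
```

Whichever encoding is picked, it is a guess about what the filesystem
and the user's tools expect, which is why a single installation-wide
default is hard to get right.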
 Martin> On my system, the following works fine
 Martin> >>> import locale ; locale.setlocale(locale.LC_ALL,"")
 Martin> 'LC_CTYPE=de_DE;LC_NUMERIC=de_DE;LC_TIME=de_DE;LC_COLLATE=C;LC_MONETARY=de_DE;LC_MESSAGES=de_DE;LC_PAPER=de_DE;LC_NAME=de_DE;LC_ADDRESS=de_DE;LC_TELEPHONE=de_DE;LC_MEASUREMENT=de_DE;LC_IDENTIFICATION=de_DE'
 Martin> >>> a = "abc\xe4\xfc\xdf.txt"
 Martin> >>> u = unicode(a, "latin-1")
 Martin> >>> open(u, "w")
 Martin> <open file 'abcäüß.txt', mode 'w' at 0x8173e88>
 Martin> On Unix, your best bet for file names is to trust the user's
 Martin> locale settings. If you do that, open will accept Unicode
 Martin> objects.
 Martin> What is your locale?
The above setlocale call prints
 'LC_CTYPE=en_US;LC_NUMERIC=en_US;LC_TIME=en_US;LC_COLLATE=en_US;LC_MONETARY=en_US;LC_MESSAGES=en_US;LC_PAPER=en;LC_NAME=en;LC_ADDRESS=en;LC_TELEPHONE=en;LC_MEASUREMENT=en;LC_IDENTIFICATION=en'
I can't get to the machines in Austria right now to see how their
locales are set, though I suspect they haven't fiddled with their LC_*
environment, because they are having the problems I described.
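
For reference, a minimal sketch of what "trust the user's locale" could
look like when done by hand, assuming present-day Python 3. The helper
name encode_filename is hypothetical, not anything in the standard
library; locale.setlocale and locale.getpreferredencoding are the real
calls it relies on.

```python
import locale


def encode_filename(name, encoding=None):
    """Encode a Unicode filename to bytes (hypothetical helper).

    If no encoding is given, follow the user's locale settings rather
    than a hard-coded, installation-wide default.
    """
    if encoding is None:
        try:
            # Adopt whatever LC_* settings the user's environment provides.
            locale.setlocale(locale.LC_ALL, "")
        except locale.Error:
            # Unsupported locale settings: keep the current locale.
            pass
        # False: do not let this call change the locale again.
        encoding = locale.getpreferredencoding(False)
    return name.encode(encoding)


# An explicit encoding overrides the locale-derived one.
print(encode_filename("abc\xe4\xfc\xdf.txt", "latin-1"))

# A pure-ASCII name comes out the same under any ASCII-compatible locale.
print(encode_filename("abc.txt"))
```

A name the locale's encoding cannot represent will still raise
UnicodeEncodeError here; the sketch makes no attempt to recover from
that. In Python 3, os.fsencode performs a comparable conversion using
the interpreter's filesystem encoding.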
 >> Is that the correct approach? Apparently Python's file object
 >> doesn't do this under the covers. Should it?
 Martin> No. There is no established convention, on Unix, how to do
 Martin> non-ASCII file names. If anything, following the user's locale
 Martin> setting is the most reasonable thing to do; this should be in
 Martin> sync with how the user's terminal displays characters. The
 Martin> Python installation's default encoding is almost useless, and
 Martin> shouldn't be changed.
 Martin> On Windows, things are much better, since there is a notion of
 Martin> Unicode file names in the system.
This suggests to me that the Python docs need some introductory material
on this topic. It appears to me that the two people in the Python
community who live and breathe this stuff are you, Martin, and
Marc-André. For most of the rest of us, especially if we've never
consciously written code for consumption outside an ascii environment,
the whole thing just looks like a quagmire.
Skip
