Message 114352 - Python tracker


In-reply-to
Author	vstinner
Recipients	amaury.forgeotdarc, vstinner
Date	2010-08-19 12:16:46
Message-id	<1282220208.81.0.192506917616.issue9630@psf.upfronthosting.co.za>

Content

> since the modules were successfully imported, surely it means that
> their filenames were correctly computed and encoded? So why is the
> __filename__ attribute wrong?
Python starts with the 'utf-8' encoding. If the new encoding is "smaller" (unable to encode as many characters as utf-8), PyUnicode_EncodeFS() and os.fsencode() will raise UnicodeEncodeError.
E.g. your Python setup is installed in a directory called b'py3k\xc3\xa9' and your locale is C (ascii encoding). At startup, the directory name is decoded to 'py3ké' (using the default encoding, utf-8). initfsencoding() then sets the encoding to ascii: 'py3ké' can no longer be encoded to the filesystem encoding (ascii).
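The failure described above can be reproduced in pure Python as a rough sketch. The filesystem encoding cannot be switched inside a running interpreter, so the helper encode_fs() below is a hypothetical stand-in (not a real API) that only assumes the filesystem codec is applied with the surrogateescape error handler, as os.fsencode() does on POSIX.

```python
# Bytes of the install directory as stored on disk (from the example above).
raw = b'py3k\xc3\xa9'

# Early startup: the name is decoded with the initial encoding, utf-8.
name = raw.decode('utf-8', 'surrogateescape')


def encode_fs(filename, fs_encoding):
    """Hypothetical stand-in for os.fsencode() with an explicit encoding."""
    return filename.encode(fs_encoding, 'surrogateescape')


# While the filesystem encoding is still utf-8, the round trip works.
utf8_bytes = encode_fs(name, 'utf-8')

# After the switch to ascii, the already-decoded name cannot be encoded:
# surrogateescape only handles lone surrogates, not a real character like U+00E9.
try:
    encode_fs(name, 'ascii')
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = exc
```

The point of the sketch is that the string was produced by one codec and is later handed to a narrower one, so the error appears only once the encoding has changed.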
--
If we set the default filesystem encoding to ascii (#8725), it will work, but the filenames will be full of surrogate characters. E.g. your Python setup is installed in b'py3k\xc3\xa9' and your locale encoding is utf-8: b'py3k\xc3\xa9' will be decoded to 'py3k\udcc3\udca9' and left unchanged by initfsencoding(). Surrogate characters are not practical: you have to escape them to display them. Printing a filename with surrogates in a terminal raises a UnicodeEncodeError (even with utf-8 encoding).
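A minimal sketch of the surrogate case, again using plain str/bytes methods rather than the interpreter's real startup code. It assumes the ascii codec is combined with the surrogateescape error handler, and it uses backslashreplace as one possible way to escape the name for display.

```python
raw = b'py3k\xc3\xa9'

# Decoding with ascii + surrogateescape maps each non-ASCII byte
# to a lone surrogate in the range U+DC80..U+DCFF.
name = raw.decode('ascii', 'surrogateescape')

# The original bytes can always be recovered with the same codec and handler.
roundtrip = name.encode('ascii', 'surrogateescape')

# A strict utf-8 encode (what writing to a utf-8 terminal amounts to)
# rejects lone surrogates.
try:
    name.encode('utf-8')
    display_error = None
except UnicodeEncodeError as exc:
    display_error = exc

# To display the name, the surrogates have to be escaped first.
escaped = name.encode('utf-8', 'backslashreplace').decode('utf-8')
```

Here the bytes survive the round trip, but the string itself is only usable for display after escaping, which is the inconvenience described above.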
