This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: Setting the default filesystem-encoding
Type: enhancement Stage:
Components: Documentation Versions: Python 3.3, Python 3.4
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: docs@python Nosy List: deleted250130, docs@python, vstinner
Priority: normal Keywords:

Created on 2013-11-30 22:29 by deleted250130, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (12)
msg204853 - (view) Author: (deleted250130) Date: 2013-11-30 22:29
sys.getfilesystemencoding() says for Unix: On Unix, the encoding is the user’s preference according to the result of nl_langinfo(CODESET), or 'utf-8' if nl_langinfo(CODESET) failed.
In my opinion relying on the locale environment is risky since filesystem-encoding != locale. This is especially the case if working on a filesystem from an external media like an external hard disk drive. Operating on multiple media can also result in different filesystem-encodings.
It would be useful if the user can make his own checks and change the default filesystem-encoding if needed.
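For reference, the two encodings being compared here can be inspected from a running interpreter. This is a minimal sketch using only sys.getfilesystemencoding() and locale.getpreferredencoding(); the helper name describe_encodings is hypothetical and not part of the report.

```python
import locale
import sys


def describe_encodings():
    """Report the encodings Python uses for filenames and for the locale."""
    return {
        # Encoding used to convert between str filenames and OS bytes.
        "filesystem": sys.getfilesystemencoding(),
        # Encoding derived from the user's locale settings.
        "locale": locale.getpreferredencoding(False),
    }


encodings = describe_encodings()
print(encodings)
```

The values depend on the platform and the environment, so no particular output is implied. Neither value describes how a given mounted volume actually stores its names.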
msg204998 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-12-02 11:14
"sys.getfilesystemencoding() says for Unix: On Unix, the encoding is the user’s preference according to the result of nl_langinfo(CODESET), or 'utf-8' if nl_langinfo(CODESET) failed."
Oh, this documentation has been wrong since at least Python 3.2: if nl_langinfo(CODESET) fails, Python exits immediately with a (fatal) error.
There is no such fallback to "utf-8" (any more?).
msg205000 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-12-02 11:18
I fixed the documentation, thanks for your report!
msg205001 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-12-02 11:19
Code in Python 3.4.
initfsencoding():
http://hg.python.org/cpython/file/e3c48bddf621/Python/pythonrun.c#l965
get_locale_encoding():
http://hg.python.org/cpython/file/e3c48bddf621/Python/pythonrun.c#l250 
msg205002 - (view) Author: (deleted250130) Date: 2013-12-02 11:28
It is nice that you could fix the documentation thanks to this report, but that was just a side effect, so closing this report and moving it to "Documentation" was maybe wrong.
msg205006 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-12-02 12:24
(Oops, I specified the wrong issue number in my commits.)
New changeset b231e0c3fd26 by Victor Stinner in branch '3.3':
Issue #19728: Fix sys.getfilesystemencoding() documentation
http://hg.python.org/cpython/rev/b231e0c3fd26
New changeset e3c48bddf621 by Victor Stinner in branch 'default':
(Merge 3.3) Issue #19728: Fix sys.getfilesystemencoding() documentation
http://hg.python.org/cpython/rev/e3c48bddf621 
msg205008 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-12-02 12:34
"It is nice that you could fix the documentation thanks to this report, but that was just a side effect, so closing this report and moving it to "Documentation" was maybe wrong."
Oh sorry, I read the issue too quickly and stopped at the first sentence. I'm reopening the issue to reply to the other points.
"In my opinion relying on the locale environment is risky since filesystem-encoding != locale. This is especially the case if working on a filesystem from an external media like an external hard disk drive. Operating on multiple media can also result in different filesystem-encodings."
This issue is not specific to Python. If you mount a USB key formatted as VFAT with the wrong encoding on Linux, you will get mojibake in your file explorer. The same issue occurs if you connect to a network share (e.g. NFS) using a different encoding than the server. You can find many other examples (hint: Mac OS X and Unicode normalization).
There is no good compromise here. The only two safe options are:
(A) convert the filenames on your filesystem to the same encoding as your computer (there are tools for that, like convmv)
(B) use raw bytes instead of Unicode; Python 3 should accept bytes anywhere that OS data is expected (filenames, command line arguments, environment variables)
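Option (B) can be sketched as follows. This is an illustrative example, not code from the issue: the helper list_names_as_bytes and the file name report.txt are made up for the demonstration. It relies on os.listdir() and open() accepting bytes paths, and on os.fsencode() to turn the temporary directory path into bytes.

```python
import os
import tempfile


def list_names_as_bytes(directory):
    """Return directory entries as raw bytes, with no decoding step."""
    # Passing a bytes path makes os.listdir() return bytes names.
    return sorted(os.listdir(os.fsencode(directory)))


with tempfile.TemporaryDirectory() as tmp:
    # Build the path entirely from bytes so no filename is ever decoded.
    raw_path = os.path.join(os.fsencode(tmp), b"report.txt")
    with open(raw_path, "wb") as handle:
        handle.write(b"data")
    names = list_names_as_bytes(tmp)

print(names)
```

Because the names never pass through a decoder, the filesystem encoding setting does not affect them; the cost is that the program has to decide for itself how to display them.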
All operating systems (except Windows) now use UTF-8 by default for the locale encoding, so mojibake issues in filenames should slowly become very rare.
"It would be useful if the user can make his own checks and change the default filesystem-encoding if needed."
This idea was already proposed in issue #8622, but it was a big fail. Please read my following email for more information:
https://mail.python.org/pipermail/python-dev/2010-October/104509.html 
msg205058 - (view) Author: (deleted250130) Date: 2013-12-02 21:49
> This idea was already proposed in issue #8622, but it was a big fail.
Not completely: if your locale is UTF-8 and you want to operate on a UTF-8 filesystem, all is fine. But what if you then want to operate on an NTFS (non-UTF-8) partition? As far as I know there is no way to apply Python environment variables on the fly so that they affect the running interpreter. In my opinion this is the reason why a setter is needed here.
Otherwise the user has to make sure to use .encode() on all filesystem operations. He must also ensure that .encode() doesn't throw any exception if the code must be robust. And with issue http://bugs.python.org/issue19846 this most likely has to be done with the content too. This will be a real pain, increasing the number of lines due to exception handling.
Is there a particular reason against such a setter? The advantage would be a huge increase in maintainability for Python scripts that rely on high stability.
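For context on the exception handling described above: on Unix, Python 3 decodes filenames with the surrogateescape error handler (PEP 383), which is also what os.fsdecode() and os.fsencode() use there. The sketch below only illustrates that handler with an explicit UTF-8 codec; the sample name raw_name is a made-up Latin-1 byte string, not data from the issue.

```python
# Latin-1 bytes for "café.txt"; 0xE9 is not valid UTF-8 in this position.
raw_name = b"caf\xe9.txt"

# Strict decoding fails, which is the case that needs exception handling.
try:
    raw_name.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# surrogateescape maps each undecodable byte to a lone surrogate instead.
text_name = raw_name.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_tripped = text_name.encode("utf-8", "surrogateescape")

print(strict_failed, ascii(text_name), round_tripped)
```

The round trip is lossless, so such names can be passed back to OS functions unchanged, although they cannot be encoded for display with a strict codec.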
msg205982 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-12-12 21:16
See also issue #19846.
msg206049 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-12-13 10:24
I'm closing this issue as invalid for the same reason that I closed issue #19846:
http://bugs.python.org/issue19846#msg205675 
msg206113 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-12-13 16:41
I created issue #19977 as a follow-up to this one: "Use surrogateescape error handler for sys.stdout on UNIX for the C locale".
msg308563 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-12-18 14:32
Follow-up: PEP 538 (bpo-28180) and PEP 540 (bpo-29240) have been accepted and implemented in Python 3.7. Python 3.7 now uses UTF-8 by default for the POSIX locale, and the encoding can be forced to UTF-8 using the -X utf8 option.
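The effect of the -X utf8 option can be checked with a short script. This is a sketch assuming Python 3.7 or later; the helper name filesystem_encoding_with_utf8_mode is hypothetical. It starts a child interpreter with -X utf8 and reports that interpreter's filesystem encoding together with sys.flags.utf8_mode.

```python
import subprocess
import sys


def filesystem_encoding_with_utf8_mode():
    """Run a child interpreter with -X utf8 and return what it reports."""
    child_code = (
        "import sys; "
        "print(sys.getfilesystemencoding(), sys.flags.utf8_mode)"
    )
    result = subprocess.run(
        [sys.executable, "-X", "utf8", "-c", child_code],
        capture_output=True,
        text=True,
        check=True,
    )
    # Output has the form "<encoding> <flag>".
    return result.stdout.split()


reported = filesystem_encoding_with_utf8_mode()
print(reported)
```

The option has to be given at interpreter startup, which is why the check runs in a child process rather than changing the setting of the current one.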
History
Date                 User           Action  Args
2022-04-11 14:57:54  admin          set     github: 64046
2017-12-18 14:32:57  vstinner       set     messages: + msg308563
2013-12-13 16:41:09  vstinner       set     messages: + msg206113
2013-12-13 10:24:56  vstinner       set     status: open -> closed; resolution: not a bug; messages: + msg206049
2013-12-12 21:16:13  vstinner       set     messages: + msg205982
2013-12-02 21:49:42  deleted250130  set     messages: + msg205058
2013-12-02 12:34:18  vstinner       set     status: closed -> open; resolution: fixed -> (no value); messages: + msg205008
2013-12-02 12:24:26  vstinner       set     messages: + msg205006
2013-12-02 11:28:38  deleted250130  set     messages: + msg205002
2013-12-02 11:19:53  vstinner       set     messages: + msg205001
2013-12-02 11:18:31  vstinner       set     status: open -> closed; assignee: docs@python; components: + Documentation, - IO; versions: + Python 3.4; nosy: + docs@python; messages: + msg205000; resolution: fixed
2013-12-02 11:14:54  vstinner       set     nosy: + vstinner; messages: + msg204998
2013-11-30 22:29:52  deleted250130  create