This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: Change sys.getfilesystemencoding() on Windows to UTF-8
Type: behavior
Stage: resolved
Components: Windows
Versions: Python 3.7, Python 3.6
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: steve.dower
Nosy List: Decorater, brett.cannon, jkloth, mark.dickinson, ncoghlan, ned.deily, paul.moore, python-dev, steve.dower, tim.golden, vstinner, yan12125, zach.ware
Priority: release blocker
Keywords: patch

Created on 2016-08-17 03:49 by steve.dower, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name  Uploaded  Description
fsencoding.diff  steve.dower, 2016-08-17 03:49
fsencoding.diff  vstinner, 2016-08-17 10:52
test_cmd_line_unicode.py  ncoghlan, 2016-09-05 07:05  Possible new test case for command line Unicode handling
27781_1.patch  steve.dower, 2016-09-07 00:26
osx_failed_compile.txt  mark.dickinson, 2016-09-09 08:38
Messages (24)
msg272899 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-08-17 03:49
I've attached my first pass at a patch to change the file system encoding on Windows to UTF-8 and remove use of the *A APIs.
It would be trivial to change the encoding from UTF-8 back to CP_ACP and change the error mode if that's what we decide is better, but my vote is strongly for an encoding that never drops characters when converted from UTF-16.
Discussion is still ongoing on python-ideas, so let's argue about yes/no and utf-8/mbcs there and just discuss the patch here.
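To illustrate the "never drops characters" point, here is a minimal sketch, not taken from the patch. It assumes cp1252 as a stand-in for a Windows ANSI code page (the 'mbcs' codec exists only on Windows) and uses a made-up file name.

```python
# Illustration only: cp1252 stands in for a Windows ANSI code page (CP_ACP),
# because the 'mbcs' codec is available only on Windows.
name = "r\u00e9sum\u00e9_\u65e5\u672c\u8a9e.txt"  # hypothetical name, Latin + CJK

# Windows wide-character APIs carry UTF-16; this round trip is lossless.
utf16 = name.encode("utf-16-le")
assert utf16.decode("utf-16-le") == name

# UTF-8 can represent every character, so nothing is dropped.
as_utf8 = name.encode("utf-8")
assert as_utf8.decode("utf-8") == name

# A single-byte code page cannot represent the CJK characters; with
# errors="replace" each one silently becomes "?".
as_ansi = name.encode("cp1252", errors="replace")
lossy = as_ansi.decode("cp1252")
print(ascii(lossy))
```

The lossy result no longer identifies the original file, which is the failure mode the message argues against.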
msg272900 - Author: Decorater (Decorater) * Date: 2016-08-17 04:48
I personally hate ANSI myself, so +1 to UTF-8/UTF-16.
msg272916 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-08-17 10:32
Would it be acceptable for you to add a new option to switch to UTF-8 in Python 3.6, and discuss later if it's ok to enable it by default?
In the python-ideas thread, you wrote that Windows allows surrogate characters in filenames, but the UTF-8/strict Python codec does not. Would it make sense to use the UTF-8/surrogatepass codec to avoid any Unicode error?
msg272917 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-08-17 10:52
Steve Dower: Please don't use git format for diff, or the bug tracker is unable to create reviews. I regenerated the patch.
By the way, you introduced a bug in posix_do_stat(): you added a new "else" in the !MS_WINDOWS path which leads to a compilation error. I fixed it.
msg272935 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-08-17 13:19
Thanks for the regen. I don't think git format is the problem, as most of my patches are fine. It's probably because it was in a patch queue, so the parent isn't actually a known commit. I haven't tested whether this works without my other console patches, but I think it should.
Is there a surrogatepass option? If so, I'll definitely use that, as that'll fix the one remaining edge case.
I suspect we'll have to go to Guido to get a ruling on the default, but I'll add an environment variable to switch.
msg272949 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-08-17 14:20
> Is there a surrogatepass option?
I'm talking about error handlers of Python codecs: text.encode('utf8', 'surrogatepass')
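As a small sketch of the difference between the two error handlers (the file name here is made up for illustration): the default strict handler rejects a lone surrogate, while 'surrogatepass' encodes it and decodes it back unchanged.

```python
# A string containing an unpaired high surrogate, which Windows permits
# in file names but strict UTF-8 refuses to encode.
lone = "file_\ud800.txt"

try:
    lone.encode("utf-8")  # default errors="strict"
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# 'surrogatepass' writes the surrogate as a three-byte sequence and
# accepts the same sequence when decoding.
encoded = lone.encode("utf-8", "surrogatepass")
roundtrip = encoded.decode("utf-8", "surrogatepass")
```

The round trip is exact, which is why this handler closes the edge case for names that are not valid UTF-16.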
msg272950 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-08-17 14:22
> I suspect we'll have to go to Guido to get a ruling on the default, but I'll add an environment variable to switch.
If you go in this direction, I would like to follow you on the UNIX/BSD side to make the switch portable. I was thinking about "-X utf8", which avoids changing the command line parser.
If we agree on a plan, I would like to write it down as a PEP, since I expect a lot of complaints and questions which I would prefer to answer only once (see for example the length of your thread on python-ideas, where people repeated the same things multiple times ;-))
msg272961 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-08-17 15:40
By portable, do you mean not using an environment variable?
Command line parsing is potentially affected by this on Windows - I'd have to look deeper - as command lines are provided as UTF-16. But we may not ever expose them as bytes.
I don't even know that this matters on the UNIX/BSD side as the file system encoding provided there is correct, no? It's just Windows where the file system encoding used for bytes doesn't match what the file system actually uses.
I was afraid a PEP would be necessary out of this, but I want to see how the python-dev discussion goes first.
msg272962 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-08-17 15:44
Steve Dower added the comment:
> By portable, do you mean not using an environment variable?
I mean that "python3 -X utf8" should force sys.getfilesystemencoding() to UTF-8 on UNIX/BSD, ignoring the current locale setting.
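In this thread "-X utf8" is only a proposal; a flag with that spelling later shipped in Python 3.7 (PEP 540). The sketch below assumes an interpreter of that version or newer and simply asks a child process what it reports. The helper name is hypothetical.

```python
import subprocess
import sys


def fs_encoding_with_flag():
    """Return sys.getfilesystemencoding() as seen by a child run with -X utf8."""
    result = subprocess.run(
        [sys.executable, "-X", "utf8", "-c",
         "import sys; print(sys.getfilesystemencoding())"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(fs_encoding_with_flag())
```

On interpreters older than 3.7 the option is not implemented, so the child would report whatever the locale dictates.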
msg272963 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-08-17 15:49
Ah, I see: that applies if we end up sticking with MBCS and offering a switch to enable UTF-8. In that case, we'll definitely ensure the flag is the same (but I'm hopeful we will just make the reliable behavior on Windows the default, so it won't matter).
msg274392 - Author: Alyssa Coghlan (ncoghlan) * (Python committer) Date: 2016-09-05 07:05
I belatedly remembered I've had this new test case hanging around for a while, and never got around to getting it into shape for inclusion in the standard library.
With the prospect of reasonable cross-platform consistency in this area, it could be a good thing to add as part of this PEP.
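The attached test_cmd_line_unicode.py is not reproduced in this thread. As a rough idea of what a command-line Unicode check can look like, here is an independent sketch with a hypothetical helper name; it assumes the parent's file system encoding can represent the argument.

```python
import ast
import subprocess
import sys


def roundtrip_argument(arg):
    """Pass arg to a child interpreter and return what it sees in sys.argv[1]."""
    # The child prints an ASCII-only repr so the result does not depend on
    # the encoding of the pipe between the two processes.
    child_code = "import sys; print(ascii(sys.argv[1]))"
    result = subprocess.run(
        [sys.executable, "-c", child_code, arg],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    # ascii() output is a valid string literal, so literal_eval restores it.
    return ast.literal_eval(result.stdout.strip())


if __name__ == "__main__":
    print(ascii(roundtrip_argument("caf\u00e9")))
```

A real test would cover more scripts and platforms; this only shows the round-trip idea.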
msg274691 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-09-07 00:26
Also see PEP 529 for the latest updates there.
This is likely to be accepted as experimental for 3.6.0b1-3, and we'll commit to either the new default or a compatible default for b4.
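For readers who want to see what their interpreter uses, a minimal sketch follows. It assumes Python 3.6 or newer, where sys.getfilesystemencodeerrors() exists. Under PEP 529 Windows is expected to report utf-8 with surrogatepass, unless the legacy mode is selected through the PYTHONLEGACYWINDOWSFSENCODING environment variable or sys._enablelegacywindowsfsencoding(). POSIX systems normally report a locale-derived encoding with surrogateescape.

```python
import os
import sys

# The encoding and error handler used to convert between str and bytes paths.
encoding = sys.getfilesystemencoding()
errors = sys.getfilesystemencodeerrors()  # added in Python 3.6
print(encoding, errors)

# os.fsdecode() and os.fsencode() apply that pair. Starting from bytes keeps
# the example independent of the locale: the round trip restores the input.
raw = b"caf\xc3\xa9.txt"  # UTF-8 bytes for a made-up file name
text = os.fsdecode(raw)
assert os.fsencode(text) == raw
```

The values printed vary by platform and configuration, so none are shown here.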
msg274887 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-09-07 20:51
PEP 529 has been accepted, so this really needs a review now. But since it's experimental and all the tests pass, I'll be committing it shortly anyway and will be tidying up issues during beta.
msg274910 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-09-07 23:28
One minor change - I removed the unused definition of Py_FileSystemDefaultDecodeErrors.
msg275063 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-09-08 16:43
Thanks for that review, Eryk, but I'm going to defer those to other issues (specifically issue27998 for scandir and we should file a new issue for the symlink concerns).
I've got some more doc updates to do though, and then I'll check in if there are no other concerns.
msg275075 - Author: Roundup Robot (python-dev) (Python triager) Date: 2016-09-08 17:35
New changeset e20c7d8a8187 by Steve Dower in branch 'default':
Issue #27781: Change file system encoding on Windows to UTF-8 (PEP 529)
https://hg.python.org/cpython/rev/e20c7d8a8187 
msg275078 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-09-08 17:37
This is pushed now - let the bug fixing begin :)
msg275095 - Author: Roundup Robot (python-dev) (Python triager) Date: 2016-09-08 18:11
New changeset faca0730270b by Steve Dower in branch 'default':
Fixes tests broken by issue #27781.
https://hg.python.org/cpython/rev/faca0730270b 
msg275289 - Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2016-09-09 08:38
It looks as though this change might have broken the compile on OS X. On my OS X 10.9 machine, building from a clean Git checkout of the master branch fails; the tail of the failed build looks like this:
./python.exe -E -S -m sysconfig --generate-posix-vars ;\
	if test $? -ne 0 ; then \
		echo "generate-posix-vars failed" ; \
		rm -f ./pybuilddir.txt ; \
		exit 1 ; \
	fi
Fatal Python error: Py_Initialize: unable to load the file system codec
Traceback (most recent call last):
 File "<frozen importlib._bootstrap>", line 962, in _find_and_load
 File "<frozen importlib._bootstrap>", line 951, in _find_and_load_unlocked
 File "<frozen importlib._bootstrap>", line 656, in _load_unlocked
 File "<frozen importlib._bootstrap_external>", line 668, in exec_module
 File "<frozen importlib._bootstrap_external>", line 782, in get_code
 File "<frozen importlib._bootstrap_external>", line 842, in _cache_bytecode
 File "<frozen importlib._bootstrap_external>", line 867, in set_data
 File "<frozen importlib._bootstrap_external>", line 117, in _write_atomic
ValueError: negative file descriptor
/bin/sh: line 1: 35829 Abort trap: 6 ./python.exe -E -S -m sysconfig --generate-posix-vars
generate-posix-vars failed
make: *** [pybuilddir.txt] Error 1
Full build output attached.
msg275290 - Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2016-09-09 08:55
It looks as though this change in posixmodule.c is the cause:
 #ifdef MS_WINDOWS
-        if (path->wide)
-            fd = _wopen(path->wide, flags, mode);
-        else
+        fd = _wopen(path->wide, flags, mode);
 #endif
 #ifdef HAVE_OPENAT
         if (dir_fd != DEFAULT_DIR_FD)
             fd = openat(dir_fd, path->narrow, flags, mode);
         else
-#endif
             fd = open(path->narrow, flags, mode);
+#endif
The move of the final #endif means that `fd` is not defined on OS X. If I move the #endif back again, the compile succeeds.
msg275326 - Author: Roundup Robot (python-dev) (Python triager) Date: 2016-09-09 16:03
New changeset 801634d3c105 by Steve Dower in branch 'default':
Issue #27781: Fixes uninitialized fd when !MS_WINDOWS and !HAVE_OPENAT
https://hg.python.org/cpython/rev/801634d3c105 
msg275331 - Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2016-09-09 16:20
That seems to have done the trick. Thanks!
msg279847 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-11-01 03:59
Before 3.6.0 beta 4 I need to make this change permanent. From memory, it's just an exception message that needs changing (and PEP 529 becomes final), but I'll review the changeset to be sure.
msg280186 - Author: Roundup Robot (python-dev) (Python triager) Date: 2016-11-07 03:36
New changeset b26c8104e54f by Steve Dower in branch '3.6':
Closes #27781: Removes special cases for the experimental aspect of PEP 529
https://hg.python.org/cpython/rev/b26c8104e54f
New changeset b8233c779ff7 by Steve Dower in branch 'default':
Closes #27781: Removes special cases for the experimental aspect of PEP 529
https://hg.python.org/cpython/rev/b8233c779ff7 
History
Date  User  Action  Args
2022-04-11 14:58:34  admin  set  github: 71968
2016-11-07 03:36:05  python-dev  set  status: open -> closed; resolution: fixed; messages: + msg280186; stage: needs patch -> resolved
2016-11-01 03:59:07  steve.dower  set  priority: normal -> release blocker; versions: + Python 3.7; nosy: + ned.deily; messages: + msg279847; stage: commit review -> needs patch
2016-09-16 13:02:20  vstinner  unlink  issue28180 superseder
2016-09-16 11:22:16  abarry  link  issue28180 superseder
2016-09-09 16:20:19  mark.dickinson  set  messages: + msg275331
2016-09-09 16:03:28  python-dev  set  messages: + msg275326
2016-09-09 08:55:59  mark.dickinson  set  messages: + msg275290
2016-09-09 08:38:37  mark.dickinson  set  files: + osx_failed_compile.txt; nosy: + mark.dickinson; messages: + msg275289
2016-09-08 18:11:37  python-dev  set  messages: + msg275095
2016-09-08 17:37:20  steve.dower  set  messages: + msg275078; stage: patch review -> commit review
2016-09-08 17:35:34  python-dev  set  nosy: + python-dev; messages: + msg275075
2016-09-08 16:43:48  steve.dower  set  messages: + msg275063
2016-09-08 02:08:48  ncoghlan  link  issue22555 dependencies
2016-09-07 23:28:36  steve.dower  set  messages: + msg274910
2016-09-07 20:51:36  steve.dower  set  messages: + msg274887
2016-09-07 17:31:03  steve.dower  link  issue27998 dependencies
2016-09-07 17:31:03  steve.dower  unlink  issue27998 superseder
2016-09-07 09:07:41  eryksun  link  issue27998 superseder
2016-09-07 00:27:04  steve.dower  set  files: + 27781_1.patch; messages: + msg274691
2016-09-05 07:05:44  ncoghlan  set  files: + test_cmd_line_unicode.py; nosy: + ncoghlan; messages: + msg274392
2016-08-17 16:15:03  brett.cannon  set  nosy: + brett.cannon
2016-08-17 15:49:28  steve.dower  set  messages: + msg272963
2016-08-17 15:44:11  vstinner  set  messages: + msg272962
2016-08-17 15:40:50  steve.dower  set  messages: + msg272961
2016-08-17 14:22:54  vstinner  set  messages: + msg272950
2016-08-17 14:20:52  vstinner  set  messages: + msg272949
2016-08-17 13:19:28  steve.dower  set  messages: + msg272935
2016-08-17 13:10:06  yan12125  set  nosy: + yan12125
2016-08-17 10:52:39  vstinner  set  files: + fsencoding.diff; messages: + msg272917
2016-08-17 10:32:19  vstinner  set  nosy: + vstinner; messages: + msg272916
2016-08-17 08:58:46  jkloth  set  nosy: + jkloth
2016-08-17 04:48:10  Decorater  set  nosy: + Decorater; messages: + msg272900
2016-08-17 03:49:40  steve.dower  create
