
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Calling Py_DecodeLocale() before _PyPreConfig_Write() can produce mojibake
Type:
Stage: resolved
Components: Interpreter Core
Versions: Python 3.8

process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: methane, ncoghlan, vstinner
Priority: normal
Keywords: patch

Created on 2019-03-06 00:53 by vstinner, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
PR 12589 (merged), linked by vstinner on 2019-03-27 15:04
Messages (7)
msg337252 - Author: STINNER Victor (vstinner) (Python committer) Date: 2019-03-06 00:53
Calling Py_DecodeLocale() before Py_Initialize() or _Py_InitializeCore() has been broken since Python 3.7 whenever C locale coercion (PEP 538) or UTF-8 Mode (PEP 540) changes the encoding in the middle of _Py_InitializeCore().
In bpo-36142, I added a new phase to Python initialization, with a new _PyPreConfig structure, which can be used to fix this mojibake issue.
The code for embedding Python should look like:
---
_Py_PreInitialize();
_PyCoreConfig config;
config.home = Py_DecodeLocale("/path/to/home");
_PyInitError err = _Py_InitializeFromConfig(&config);
if (_Py_INIT_FAILED(err)) {
 _PyCoreConfig_Clear(&config);
 _Py_ExitInitError(err);
}
/* use Python here */
Py_Finalize();
_PyCoreConfig_Clear(&config);
---
Except that there is no _Py_PreInitialize() function yet :-)
msg337260 - Author: STINNER Victor (vstinner) (Python committer) Date: 2019-03-06 01:22
The "vim" editor embeds Python. It sets the Python home by calling Py_SetPythonHome() with the following code:
---
	 size_t len = mbstowcs(NULL, (char *)p_py3home, 0) + 1;
	 /* The string must not change later, make a copy in static memory. */
	 py_home_buf = (wchar_t *)alloc(len * sizeof(wchar_t));
	 if (py_home_buf != NULL && mbstowcs(
			 py_home_buf, (char *)p_py3home, len) != (size_t)-1)
		Py_SetPythonHome(py_home_buf);
---
ref: https://github.com/vim/vim/blob/14816ad6e58336773443f5ee2e4aa9e384af65d2/src/if_python3.c#L874-L887
mbstowcs() uses the current LC_CTYPE locale, but Python can select a filesystem encoding different from the LC_CTYPE encoding because of PEP 538 and PEP 540. Encoding the Python home back to bytes to access files on the filesystem can then produce mojibake, and file access fails.
The code should be written like this (pseudo-code):
---
_Py_PreInitialize();
_PyCoreConfig config;
config.home = Py_DecodeLocale(p_py3home);
if (config.home == NULL) { /* ERROR */ }
_PyInitError err = _Py_InitializeFromConfig(&config);
if (_Py_INIT_FAILED(err)) {
 _PyCoreConfig_Clear(&config);
 _Py_ExitInitError(err);
}
---
The vim case has been discussed at:
https://discuss.python.org/t/adding-char-based-apis-for-unix/916/8 
msg337261 - Author: STINNER Victor (vstinner) (Python committer) Date: 2019-03-06 01:22
By the way, according to Nick Coghlan (author of PEP 538), Py_Initialize() and Py_Main() called from an application embedding Python should not coerce the C locale.
msg337281 - Author: Alyssa Coghlan (ncoghlan) (Python committer) Date: 2019-03-06 05:40
They weren't *intended* to change it, and didn't in the original implementation of the PEP, but they do in the as-shipped Python 3.7 implementation, and I abandoned my attempts to revert to the as-designed behaviour as impractical given the other changes made for PEP 540. So that's a behavior we're stuck with now: they both have global side effects on the locale of the calling process.
msg337915 - Author: Alyssa Coghlan (ncoghlan) (Python committer) Date: 2019-03-14 14:03
Victor and I were discussing the appropriate behaviour for the "What do we do if _Py_PreInitialize() hasn't been called?" case, and Victor pointed out that the potential for mojibake provides a solid rationale for going back to the Python 3.6 behaviour: disable both C locale coercion *and* UTF-8 mode, and instead leave locale management to the embedding application.
That would also get us back to the originally intended behaviour of PEP 538, where it only happens by default in the CLI app, not the runtime support library. Back when I decided that doing that would be too complicated, the _Py_PreInitialize API didn't exist yet, and it didn't occur to me that the mojibake problem would affect UTF-8 mode as well.
msg338977 - Author: STINNER Victor (vstinner) (Python committer) Date: 2019-03-27 17:29
New changeset d929f1838a8fba881ff0148b7fc31f6265703e3d by Victor Stinner in branch 'master':
bpo-36443: Disable C locale coercion and UTF-8 Mode by default (GH-12589)
https://github.com/python/cpython/commit/d929f1838a8fba881ff0148b7fc31f6265703e3d
msg338979 - Author: STINNER Victor (vstinner) (Python committer) Date: 2019-03-27 17:29
My commit disabled C locale coercion and UTF-8 Mode by default when Python is embedded, which fixes this issue in Python 3.8. I'm closing the issue.
History
Date                 User      Action  Args
2022-04-11 14:59:12  admin     set     github: 80383
2019-03-27 17:29:58  vstinner  set     status: open -> closed
                                       resolution: fixed
                                       messages: + msg338979
                                       stage: patch review -> resolved
2019-03-27 17:29:06  vstinner  set     messages: + msg338977
2019-03-27 15:04:47  vstinner  set     keywords: + patch
                                       stage: patch review
                                       pull_requests: + pull_request12532
2019-03-14 14:03:42  ncoghlan  set     messages: + msg337915
2019-03-06 05:40:38  ncoghlan  set     messages: + msg337281
2019-03-06 01:22:48  vstinner  set     messages: + msg337261
2019-03-06 01:22:29  vstinner  set     nosy: + methane
                                       messages: + msg337260
2019-03-06 00:53:52  vstinner  create
