Issue 39465: [subinterpreters] Design a subinterpreter friendly alternative to _Py_IDENTIFIER

➜

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/83646

classification

Title:	[subinterpreters] Design a subinterpreter friendly alternative to _Py_IDENTIFIER
Type:	enhancement	Stage:	resolved
Components:	Subinterpreters	Versions:	Python 3.10

process

Dependencies:	Superseder:
Status:	closed	Resolution:	fixed
Assigned To:	Nosy List:	eric.snow, ncoghlan, petr.viktorin, shihai1991, vstinner
Priority:	normal	Keywords:	patch

Created on 2020年01月27日 14:28 by ncoghlan, last changed 2022年04月11日 14:59 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
bench.py	vstinner, 2020年05月12日 01:11
bench.patch	vstinner, 2020年05月12日 01:12

Pull Requests
URL	Status	Linked	Edit
PR 20043	merged	vstinner, 2020年05月11日 23:18
PR 20048	closed	vstinner, 2020年05月12日 00:56
PR 20058	merged	vstinner, 2020年05月12日 17:38
PR 20078	merged	vstinner, 2020年05月13日 22:36
PR 20390	closed	vstinner, 2020年05月25日 16:43
PR 20595	merged	vstinner, 2020年06月02日 12:14
PR 20766	merged	vstinner, 2020年06月09日 18:26
PR 20767	closed	vstinner, 2020年06月09日 21:58
PR 20788	merged	vstinner, 2020年06月10日 17:33

Messages (17)
msg360766 - (view)	Author: Alyssa Coghlan (ncoghlan) * (Python committer)	Date: 2020年01月27日 14:28
Both https://github.com/python/cpython/pull/18066 (collections module) and https://github.com/python/cpython/pull/18032 (asyncio module) ran into the problem where porting them to multi-phase initialisation involves replacing their usage of the `_Py_IDENTIFIER` macro with some other mechanism. When _posixsubprocess was ported, the replacement was a relatively ad hoc combination of string interning and the interpreter-managed module-specific state: https://github.com/python/cpython/commit/5a7d2e11aaea2dd32878dc5c6b1aae8caf56cb44 I'm wondering if we may able to devise a comparable struct-field based system that replaces the `_Py_IDENTIFIER` local static variable declaration macro and the `Py_Id_<name>` lookup convention with a combination like (using the posix subprocess module conversion as an example): // Identifier usage declaration (replaces _Py_IDENTIFIER) _Py_USE_CACHED_IDENTIFIER(_posixsubprocessstate(m), disable); // Identifier usage remains unchanged, but uses a regular local variable // rather than the static variable declared by _Py_IDENTIFIER result = _PyObject_CallMethodIdNoArgs(gc_module, &PyId_disable); And then the following additional state management macros would be needed to handle the string interning and reference counting: // Module state struct declaration typedef struct { // This would declare an initialised array of _Py_Identifier structs // under a name like __cached_identifiers__. The end of the array // would be indicated by a strict with "value" set to NULL. _Py_START_CACHED_IDENTIFIERS; _Py_CACHED_IDENTIFIER(disable); _Py_CACHED_IDENTIFIER(enable); _Py_CACHED_IDENTIFIER(isenabled); _Py_END_CACHED_IDENTIFIERS; ); } _posixsubprocessstate; // Module tp_traverse implementation _Py_VISIT_CACHED_IDENTIFIERS(_posixsubprocessstate(m)); // Module tp_clear implementation (also called by tp_free) _Py_CLEAR_CACHED_IDENTIFIERS(_posixsubprocessstate(m)); With the requirement to declare usage of the cached identifiers, they could be lazily initialized the same way the existing static variables are (even re-using the same struct declaration). Note: this is just a draft of one possible design, the intent of this issue is to highlight the fact that this issue has now come up multiple times, and it would be good to have a standard answer available.
msg361041 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2020年01月30日 11:35
Once I discussed with Eric Snow during a core developer sprint: _Py_IDENTIFIER() should use an "interpreter local storage" for identifiers values. _Py_IDENTIFIER() would only be a "key" and _PyUnicode_FromId() would store the value somewhere in a hash table stored in PyInterpreterState. Something similar to the TSS API: * PyThread_create_key() * PyThread_delete_key_value() * PyThread_set_key_value() * PyThread_get_key_value() But per interpreter, rather than being per thread. The key can be simply the variable address in memory. It only has to be unique in the process.
msg361042 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2020年01月30日 11:41
> Both https://github.com/python/cpython/pull/18066 (collections module) and https://github.com/python/cpython/pull/18032 (asyncio module) ran into the problem where porting them to multi-phase initialisation involves replacing their usage of the `_Py_IDENTIFIER` macro with some other mechanism. What is the problem between _Py_IDENTIFIER and multi-phase initialisation modules? If both are incompatible, we may need a different but similar API: values would be stored in a hash table per module object. The hash table can be stored in the module object directly, or it can be store in a second hash table (module => hash table). If we want a unified API, maybe we can use module=NULL (or any other marker) for "global" identifiers (not specific to a module).
msg361044 - (view)	Author: Petr Viktorin (petr.viktorin) * (Python committer)	Date: 2020年01月30日 11:53
> What is the problem between _Py_IDENTIFIER and multi-phase initialisation modules? AFAIK there is no problem now, except possibly a race condition when initializing the identifiers. It seems it's too easy to conflate porting to multi-phase initialization and getting rid of static state. The problem will come with per-interpreter reference counting, or when the `str` class is no longer shared across all interpreters. For that, we'll need either per-interpreter identifiers, or solve the issue in another way.
msg361045 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2020年01月30日 11:54
> AFAIK there is no problem now, except possibly a race condition when initializing the identifiers. The GIL avoids any risk of race condition, no?
msg361533 - (view)	Author: Hai Shi (shihai1991) * (Python triager)	Date: 2020年02月07日 02:59
> The GIL avoids any risk of race condition, no? Looks like the GIL would affect performance more or less? >_Py_IDENTIFIER() would only be a "key" and _PyUnicode_FromId() would >store the value somewhere in a hash table stored in PyInterpreterState. +1. IMHO, for those two cases, the simplest idea is move IDENTIFIER to moduleState which would increase more memory usage than InterpreterState.
msg361627 - (view)	Author: Alyssa Coghlan (ncoghlan) * (Python committer)	Date: 2020年02月08日 12:52
As Petr notes, as long as all subinterpreters share the GIL, and share str instances, then the existing _Py_IDENTIFIER mechanism will work fine for both single phase and multi-phase initialisation. However, that constraint also goes the other way: as long as we have modules that use the existing _Py_IDENTIFIER mechanism, then subinterpreters must share str instances, and hence must share the GIL. Hence the "enhancement" classification: there's nothing broken right now, but if we're ever going to achieve the design goal of using subinterpreters to exploit multiple CPU cores without the overhead of running multiple full interpreter processes, we're going to need to design a different way of handling this. Something to keep in mind with `_Py_IDENTIFIER` and any replacement API: the baseline for performance comparisons is https://docs.python.org/3/c-api/unicode.html#c.PyUnicode_InternFromString The reason multi-phase initialisation makes this more complicated is that it means we can't use the memory addresses of C process globals as unique identifiers any more, since more than one module object may be created from the same C shared library. However, if we assume we've moved to per-module state storage (to get unique memory addresses back), then we can largely re-use the existing `_Py_IDENTIFIER` machinery to make the lookup as fast as possible, while still avoiding conflicts between subinterpreters.
msg368682 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2020年05月11日 23:43
New changeset 4804b5b3df82e7892ca0550b02f902bcfc16bb48 by Victor Stinner in branch 'master': bpo-39465: Don't access directly _Py_Identifier members (GH-20043) https://github.com/python/cpython/commit/4804b5b3df82e7892ca0550b02f902bcfc16bb48
msg368688 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2020年05月12日 00:27
I created bpo-40602: "Move Modules/hashtable.h to Include/internal/pycore_hashtable.h".
msg368691 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2020年05月12日 01:12
Attached bench.py: Micro-benchmark on _PyUnicode_FromId(). It requires attached bench.patch being applied.
msg368805 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2020年05月13日 23:11
New changeset d6fb53fe42d83a10f1372dd92ffaa6a01d2feffb by Victor Stinner in branch 'master': bpo-39465: Remove _PyUnicode_ClearStaticStrings() from C API (GH-20078) https://github.com/python/cpython/commit/d6fb53fe42d83a10f1372dd92ffaa6a01d2feffb
msg370607 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2020年06月02日 12:40
New changeset 297257f7bc198e2dc8e0866b539c73ff1a5cc588 by Victor Stinner in branch 'master': bpo-39465: Cleanup _PyUnicode_FromId() code (GH-20595) https://github.com/python/cpython/commit/297257f7bc198e2dc8e0866b539c73ff1a5cc588
msg371229 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2020年06月10日 18:08
New changeset 1bcc32f0620d2e99649a6d423284d9496b7b3548 by Victor Stinner in branch 'master': bpo-39465: Use _PyInterpreterState_GET() (GH-20788) https://github.com/python/cpython/commit/1bcc32f0620d2e99649a6d423284d9496b7b3548
msg383629 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2020年12月23日 02:41
New changeset 52a327c1cbb86c7f2f5c460645889b23615261bf by Victor Stinner in branch 'master': bpo-39465: Add pycore_atomic_funcs.h header (GH-20766) https://github.com/python/cpython/commit/52a327c1cbb86c7f2f5c460645889b23615261bf
msg383781 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2020年12月25日 23:41
New changeset ba3d67c2fb04a7842741b1b6da5d67f22c579f33 by Victor Stinner in branch 'master': bpo-39465: Fix _PyUnicode_FromId() for subinterpreters (GH-20058) https://github.com/python/cpython/commit/ba3d67c2fb04a7842741b1b6da5d67f22c579f33
msg383783 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2020年12月25日 23:49
Ok, it should now be fixed. I close the issue. See PR 20085 "Per-interpreter interned strings" of bpo-40521 for the follow-up.
msg407954 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2021年12月07日 17:28
This change introduced a subtle regression: bpo-46006 "[subinterpreter] _PyUnicode_EqualToASCIIId() issue with subinterpreters".

History
Date	User	Action	Args
2022年04月11日 14:59:25	admin	set	github: 83646
2021年12月07日 17:28:23	vstinner	set	messages: + msg407954
2020年12月25日 23:49:32	vstinner	set	status: open -> closed stage: patch review -> resolved resolution: fixed versions: + Python 3.10
2020年12月25日 23:49:25	vstinner	set	messages: + msg383783
2020年12月25日 23:41:56	vstinner	set	messages: + msg383781
2020年12月23日 02:41:11	vstinner	set	messages: + msg383629
2020年06月10日 18:08:29	vstinner	set	messages: + msg371229
2020年06月10日 17:33:36	vstinner	set	pull_requests: + pull_request19983
2020年06月09日 21:58:27	vstinner	set	pull_requests: + pull_request19966
2020年06月09日 18:26:36	vstinner	set	pull_requests: + pull_request19965
2020年06月02日 12:40:00	vstinner	set	messages: + msg370607
2020年06月02日 12:14:45	vstinner	set	pull_requests: + pull_request19825
2020年05月25日 16:43:00	vstinner	set	pull_requests: + pull_request19653
2020年05月15日 00:38:26	vstinner	set	components: + Subinterpreters title: Design a subinterpreter friendly alternative to _Py_IDENTIFIER -> [subinterpreters] Design a subinterpreter friendly alternative to _Py_IDENTIFIER
2020年05月13日 23:11:58	vstinner	set	messages: + msg368805
2020年05月13日 22:36:41	vstinner	set	pull_requests: + pull_request19384
2020年05月12日 17:38:54	vstinner	set	pull_requests: + pull_request19366
2020年05月12日 01:12:29	vstinner	set	files: + bench.patch messages: + msg368691
2020年05月12日 01:11:46	vstinner	set	files: + bench.py
2020年05月12日 00:56:07	vstinner	set	pull_requests: + pull_request19357
2020年05月12日 00:27:04	vstinner	set	messages: + msg368688
2020年05月11日 23:43:57	vstinner	set	messages: + msg368682
2020年05月11日 23:18:40	vstinner	set	keywords: + patch stage: patch review pull_requests: + pull_request19352
2020年02月08日 12:52:39	ncoghlan	set	messages: + msg361627
2020年02月07日 02:59:23	shihai1991	set	messages: + msg361533
2020年01月30日 11:54:57	vstinner	set	messages: + msg361045
2020年01月30日 11:53:21	petr.viktorin	set	messages: + msg361044
2020年01月30日 11:41:22	vstinner	set	messages: + msg361042
2020年01月30日 11:35:23	vstinner	set	nosy: + vstinner messages: + msg361041
2020年01月27日 14:28:51	ncoghlan	set	type: enhancement
2020年01月27日 14:28:30	ncoghlan	create

homepage