This issue tracker has been migrated to GitHub ,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2021年12月07日 17:12 by vstinner, last changed 2022年04月11日 14:59 by admin. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| reproducer.c | vstinner, 2021年12月07日 17:12 | |||
| cmp_interned_strings.patch | vstinner, 2021年12月10日 12:35 | |||
| Pull Requests | |||
|---|---|---|---|
| URL | Status | Linked | Edit |
| PR 30123 | closed | vstinner, 2021年12月15日 14:44 | |
| PR 30131 | closed | eric.snow, 2021年12月16日 01:10 | |
| PR 30422 | merged | vstinner, 2022年01月05日 16:27 | |
| PR 30425 | merged | vstinner, 2022年01月06日 07:59 | |
| PR 30433 | closed | vstinner, 2022年01月06日 14:30 | |
| Messages (29) | |||
|---|---|---|---|
| msg407950 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2021年12月07日 17:12 | |
_PyUnicode_EqualToASCIIId() seems to be incompatible with subinterpreter: it makes the assumption that if direct pointer comparison fails and the string is interned, the two strings are not equal. -- super_init_without_args() of Objects/typeobject.c calls _PyUnicode_EqualToASCIIId(name, &PyId___class__) to test if the Unicode string 'name' is equal to "__class__". int _PyUnicode_EqualToASCIIId(PyObject *left, _Py_Identifier *right) { right_uni = _PyUnicode_FromId(right); ... if (left == right_uni) return 1; if (PyUnicode_CHECK_INTERNED(left)) return 0; ... return unicode_compare_eq(left, right_uni); } _PyUnicode_EqualToASCIIId() makes the assumption that left and right are not equal if left and _PyUnicode_FromId(right) pointers are not equal and left is an interned string. In the reproducer, left object is abc.ABCMeta.__new__.__code__.co_freevars[0]. Depending on how the stdlib abc.py file was loaded (in the main interpreter and in the subinterpreter), __code__.co_freevars[0] may or may not be an interned string. If __code__.co_freevars[0] is an interned string, _PyUnicode_EqualToASCIIId() fails in a subinterpreter if the direct pointer comparison fails (if left and right_uni pointers are not equal). -- Reproducer from: https://github.com/ninia/jep/issues/358#issuecomment-988090696 * Build Python 3.10 with "./configure --enable-shared --prefix /opt/py310" and install it. * Download attached reproducer.c. * Build the reproducer with: gcc -o reproducer reproducer.c $(/opt/py310/bin/python3.10-config --embed --cflags --ldflags) * Remove all stdlib .pyc files: find /opt/py310 -type d -name __pycache__|xargs rm -rf * Run the reproducer with: LD_LIBRARY_PATH=/opt/py310/lib ./reproducer Output: --- Before creating sub interpreter Traceback (most recent call last): File "/opt/py310/lib/python3.10/io.py", line 52, in <module> File "/opt/py310/lib/python3.10/abc.py", line 184, in <module> File "/opt/py310/lib/python3.10/abc.py", line 106, in __new__ RuntimeError: super(): __class__ cell not found Fatal Python error: _PyThreadState_Delete: tstate 0x7f9f2001c710 is still current Python runtime state: initialized Current thread 0x00007f9f27c99640 (most recent call first): <no Python frame> Abandon (core dumped) --- py-bt command in gdb: --- (gdb) py-bt Traceback (most recent call first): File "/opt/py310/lib/python3.10/abc.py", line 106, in __new__ cls = super().__new__(mcls, name, bases, namespace, **kwargs) <built-in method __build_class__ of module object at remote 0x7fffea0b4cc0> File "/opt/py310/lib/python3.10/abc.py", line 184, in <module> class ABC(metaclass=ABCMeta): <built-in method exec of module object at remote 0x7fffea0b4cc0> File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed File "<frozen importlib._bootstrap_external>", line 883, in exec_module File "<frozen importlib._bootstrap>", line 688, in _load_unlocked File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked File "<frozen importlib._bootstrap>", line 1027, in _find_and_load File "/opt/py310/lib/python3.10/io.py", line 52, in <module> import abc <built-in method exec of module object at remote 0x7fffea0b4cc0> File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed File "<frozen importlib._bootstrap_external>", line 883, in exec_module File "<frozen importlib._bootstrap>", line 688, in _load_unlocked File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked File "<frozen importlib._bootstrap>", line 1027, in _find_and_load <built-in method __import__ of module object at remote 0x7fffea0b4cc0> --- |
|||
| msg407951 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2021年12月07日 17:19 | |
In Python 3.9, the code works because the _Py_IDENTIFIER() API shares Python Unicode objects between all interpreters. _PyUnicode_FromId() was modified to be per-interpreter in bpo-39465 by: New changeset ba3d67c2fb04a7842741b1b6da5d67f22c579f33 by Victor Stinner in branch 'master': bpo-39465: Fix _PyUnicode_FromId() for subinterpreters (GH-20058) https://github.com/python/cpython/commit/ba3d67c2fb04a7842741b1b6da5d67f22c579f33 |
|||
| msg407952 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2021年12月07日 17:21 | |
Serhiy: Do you recall the idea of the PyUnicode_CHECK_INTERNED() optimization? The PyUnicode_CHECK_INTERNED() test is as old as the _PyUnicode_EqualToASCIIId() function. commit f5894dd646f5e39918377b37b8c8694cebdca103 Author: Serhiy Storchaka <storchaka@gmail.com> Date: Wed Nov 16 15:40:39 2016 +0200 Issue #28701: Replace _PyUnicode_CompareWithId with _PyUnicode_EqualToASCIIId. The latter function is more readable, faster and doesn't raise exceptions. Based on patch by Xiang Zhang. |
|||
| msg407953 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2021年12月07日 17:27 | |
> Depending on how the stdlib abc.py file was loaded (in the main interpreter and in the subinterpreter), __code__.co_freevars[0] may or may not be an interned string. When the bug occurs, I see that the Python stdlib abc.py file is loaded twice: the main interpreter builds a code object, and then subinterpreter builds its own code object: same content, but different Python object (at different memory addresses so inequal pointers!). I modified reproducer.c to add "Py_VerboseFlag = 1;" before the Py_Initialize() call. Truncated output: --- ... # code object from /opt/py310/lib/python3.10/abc.py ... # code object from /opt/py310/lib/python3.10/abc.py Traceback (most recent call last): File "<frozen importlib._bootstrap>", line 1027, in _find_and_load ... RuntimeError: super(): __class__ cell not found ... --- |
|||
| msg407956 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2021年12月07日 17:35 | |
There are around 27 _PyUnicode_EqualToASCIIId() calls in the Python code base. I don't think that avoiding _PyUnicode_EqualToASCIIId() is a good solution :-) Fixing _PyUnicode_EqualToASCIIId() to make it compatible with subinterpreters sound more reasonable: remove the PyUnicode_CHECK_INTERNED() test optimization. |
|||
| msg408002 - (view) | Author: Inada Naoki (methane) * (Python committer) | Date: 2021年12月08日 09:02 | |
Should `_PyUnicode_EqualToASCIIId()` support comparing two unicode from different interpreter?? |
|||
| msg408012 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2021年12月08日 11:36 | |
> Should `_PyUnicode_EqualToASCIIId()` support comparing two unicode from different interpreter? Right now, there still many cases where objects are still shared between two interpreters: * None, True, False singletons * strings from code objects (according to what I saw when I reproduced the issue) * objects from static types: type name (str), subtypes (tuple), MRO (tuple), etc. * etc. More details in the following issues: * bpo-40533: [subinterpreters] Don't share Python objects between interpreters * bpo-40512: [subinterpreters] Meta issue: per-interpreter GIL |
|||
| msg408069 - (view) | Author: Inada Naoki (methane) * (Python committer) | Date: 2021年12月09日 02:59 | |
That's too bad. We can not compare two Unicode by pointer even if both are interned anymore... It was a nice optimization. |
|||
| msg408085 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2021年12月09日 08:57 | |
> We can not compare two Unicode by pointer even if both are interned anymore... It was a nice optimization. If two strings are interned and part of the same interpreter, the "ptr1 == ptr2" comparison continues to work. What is not compatible with subinterpreters, which have other interned string objects, is the assumption that two interned strings are not equal if both strings are interned and pointeres are not equal. _PyUnicode_EqualToASCIIId() is the only function making this assumption. * dictkeys_stringlookup(): if pointers are not equal, compare the hash: if the two hashes are equal, compare the strings content with unicode_eq() * PyUnicode_Compare(): if pointers are not equal, compare the strings content with unicode_compare() * _PyUnicode_EQ(): always compare the strings content * PyUnicode_RichCompare(): if pointers are not equal, compare the strings content None of these functions have a fast path for interned strings. Well, _PyUnicode_EqualToASCIIId() is different since right is always an interned string. Note: unicode_compare() is strange, it requires both strings to be equal, but it does not consider that two strings are not equal if their kinds are not equal. If both strings are ready and their kinds are not equal, the two strings cannot be equal... unicode_compare_eq() returns 0 if the two kinds are not equal. |
|||
| msg408125 - (view) | Author: Dong-hee Na (corona10) * (Python committer) | Date: 2021年12月09日 14:04 | |
> If two strings are interned and part of the same interpreter, the "ptr1 == ptr2" comparison continues to work. Yeah, AFAIK Comparing two interned strings from different interpreters are not the available use-case. |
|||
| msg408201 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2021年12月10日 12:32 | |
I marked bpo-46034 as a duplicate: setting a class "__init__" attribute doesn't update properly its tp_init slot. update_slot() of Objects/typeobject.c only uses a fast pointer comparison since both strings are interned, but the test fails if the two strings are part of two different Python interpreters. In bpo-46034 case, the problem is that the "slotdefs" array uses interned strings created in the main interpreter, whereas the update_slot() name parameter (Python string) was created in a sub-interpreter. Reproducer: ========================================= import _testcapi code = r"""class Greeting(): pass def new_init(inst, name): inst.text = f"Hello, {name}\n" Greeting.__init__ = new_init Greeting("Miro")""" _testcapi.run_in_subinterp(code) ========================================= |
|||
| msg408202 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2021年12月10日 12:35 | |
Attachd cmp_interned_strings.patch fix _PyUnicode_EqualToASCIIId() and update_slot() for subinterpreters. I will create a PR once we agree if it's required to support subinterpreters there, and if there is no better (faster) option. For update_slot(), one option to avoid adding a "slow" _PyUnicode_EQ() call would be to have per-interpreter interned strings for the slotdefs array. |
|||
| msg408605 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2021年12月15日 14:45 | |
I created PR 30123 to fix _PyUnicode_EqualToASCIIId() and type update_slot() functions. I added comments explaining why we can no longer optimize the comparison of two interned string objects. |
|||
| msg408610 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2021年12月15日 15:11 | |
These bug prevent the Fedora infra team from upgrading the Koji builders to Fedora 35. Koji runs on mod_wsgi which is affected by the bug: https://bugzilla.redhat.com/show_bug.cgi?id=2030621#c1 |
|||
| msg408611 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2021年12月15日 15:16 | |
See also bpo-46070: I don't know if it's related. |
|||
| msg408614 - (view) | Author: Mark Shannon (Mark.Shannon) * (Python committer) | Date: 2021年12月15日 15:44 | |
The problem here is that different sub-interpreters have different strings for the same Python string. Unless sub-interpreters are fully independent, and they cannot be due to limitations imposed by the stable API, then all sub-interpreters must share the same poll of strings. Since the only object reachable from a string is the `str` object (which is a static global object `PyUnicode_Type`), then the invariant that no object that is unique to one sub-interpreter can be reached from another sub-interpreter remains valid if strings are shared. I.e. there is no reason not to share strings. As Victor points out, there is no bug in 3.9 because interned strings are common across all interpreter. We should revert that behavior. |
|||
| msg408620 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2021年12月15日 17:15 | |
Mark: "As Victor points out, there is no bug in 3.9 because interned strings are common across all interpreter. We should revert that behavior." The rationale for having per-interpreter interned strings and per-interpreter _Py_IDENTIFIER() string objects can be found in bpo-39465 and bpo-40521. |
|||
| msg408639 - (view) | Author: Eric Snow (eric.snow) * (Python committer) | Date: 2021年12月15日 19:51 | |
It sounds like this bug is another case where we have made some objects
per-interpreter but others are still global and this is causing problems.
_PyUnicode_EqualToASCIIId() wouldn't have any problems if interpreters
weren't sharing any objects (or were only sharing immutable "immortal"
objects).
For now can we just move the relevant per-interpreter strings from
PyInterpreterState.unicode_state ("interned" and "ids") up to
_PyRuntimeState. They will then be global, which should restore
the correct behavior.
Personally, I'd rather we not revert the original change. Moving the data
to _PyRuntimeState would save me some effort with related work I'm doing
right now.
Of course, the potential bug would still exist in _PyUnicode_EqualToASCIIId().
Could we add a test as part of this fix to verify the failure case described
here actually works?
|
|||
| msg408640 - (view) | Author: Eric Snow (eric.snow) * (Python committer) | Date: 2021年12月15日 19:52 | |
FWIW, it makes sense to me for the interned strings to be per-interpreter eventually. Otherwise strings interned by an interpreter would persist after that interpreter is finalized, potentially leaking memory until the runtime is finalized. However, if we end up with immortal objects then I think all the strings created through _Py_IDENTIFIER() should be global (_PyRuntimeState). Otherwise they must be per-interpreter (if we have a per-interpreter GIL). |
|||
| msg408666 - (view) | Author: Eric Snow (eric.snow) * (Python committer) | Date: 2021年12月16日 01:11 | |
I've created a PR for moving the interned strings and identifiers to _PyRuntimeState until we are ready to move them back to the interpreter. |
|||
| msg408667 - (view) | Author: Eric Snow (eric.snow) * (Python committer) | Date: 2021年12月16日 01:13 | |
If that seems okay, I'll work on a backport PR for 3.10. |
|||
| msg409242 - (view) | Author: Eric Snow (eric.snow) * (Python committer) | Date: 2021年12月27日 18:57 | |
are there any objections to my PR? |
|||
| msg409767 - (view) | Author: Petr Viktorin (petr.viktorin) * (Python committer) | Date: 2022年01月05日 13:33 | |
> Personally, I'd rather we not revert the original change. Even in 3.10? Your PR looks pretty heavy for a bugfix release, and won't apply cleanly to 3.10. > Moving the data to _PyRuntimeState would save me some effort with related work I'm doing right now. I guess that's what makes the PR hard to review. What's the plan (or are there more conflicting plans), and what's the state in 3.10 vs. main? |
|||
| msg409787 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2022年01月05日 16:53 | |
IMO writing a complete rationale for running multiple interpreters in parallel which require a whole PEP. I didn't write such PEP yet since there are still non-trivial technical issues, especially the problem of static types: bpo-40601. I don't have time right now to measure the performance overhead of PR 30123 on _PyUnicode_EqualToASCIIId() and on changing a type attribute (like overriding the __init__() method). I expect a minor overhead on micro-benchmark and no significant change on pyperformance macrobenchmarks. But again, right I don't have time to go through such analysis. The priority for now is to unblock the Python 3.11 release and repair the Python 3.10 regression, so I'm fine with reverting my commit ea251806b8dffff11b30d2182af1e589caf88acf which introduced the regression. PR 30131 makes more changes than just reverting the commit, it changes the _PyRuntimeState structure and it also reverts the identifiers change, whereas so far, there is no known relationship between this issue and identifiers. IMO it's ok to leave identifiers unchanged. |
|||
| msg409788 - (view) | Author: Eric Snow (eric.snow) * (Python committer) | Date: 2022年01月05日 17:00 | |
+1 on just reverting in both branches. I can deal with my stuff separately. |
|||
| msg409797 - (view) | Author: Eric Snow (eric.snow) * (Python committer) | Date: 2022年01月05日 19:07 | |
> IMO writing a complete rationale for running multiple interpreters in parallel which require a whole PEP. FYI, I'm planning on having such a PEP published in the next few days. |
|||
| msg409818 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2022年01月06日 07:53 | |
New changeset 35d6540c904ef07b8602ff014e520603f84b5886 by Victor Stinner in branch 'main': bpo-46006: Revert "bpo-40521: Per-interpreter interned strings (GH-20085)" (GH-30422) https://github.com/python/cpython/commit/35d6540c904ef07b8602ff014e520603f84b5886 |
|||
| msg409855 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2022年01月06日 15:12 | |
New changeset 72c260cf0c71eb01eb13100b751e9d5007d00b70 by Victor Stinner in branch '3.10': [3.10] bpo-46006: Revert "bpo-40521: Per-interpreter interned strings (GH-20085)" (GH-30422) (GH-30425) https://github.com/python/cpython/commit/72c260cf0c71eb01eb13100b751e9d5007d00b70 |
|||
| msg409861 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2022年01月06日 15:22 | |
_PyUnicode_EqualToASCIIId() and type update_slot() functions are fixed in 3.10 and main branches. The regression is now fixed. But the revert reintroduces the issue on subinterpreters, so I created bpo-46283: "[subinterpreters] Unicode interned strings must not be shared between interpreters". |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022年04月11日 14:59:53 | admin | set | github: 90164 |
| 2022年01月08日 12:07:24 | vstinner | set | status: open -> closed resolution: fixed stage: patch review -> resolved |
| 2022年01月08日 00:12:11 | pablogsal | set | priority: release blocker -> |
| 2022年01月06日 15:22:35 | vstinner | set | messages: + msg409861 |
| 2022年01月06日 15:12:36 | vstinner | set | messages: + msg409855 |
| 2022年01月06日 14:30:07 | vstinner | set | pull_requests: + pull_request28639 |
| 2022年01月06日 07:59:48 | vstinner | set | pull_requests: + pull_request28630 |
| 2022年01月06日 07:53:52 | vstinner | set | messages: + msg409818 |
| 2022年01月05日 19:07:41 | eric.snow | set | messages: + msg409797 |
| 2022年01月05日 17:00:05 | eric.snow | set | messages: + msg409788 |
| 2022年01月05日 16:53:49 | vstinner | set | messages: + msg409787 |
| 2022年01月05日 16:27:26 | vstinner | set | pull_requests: + pull_request28627 |
| 2022年01月05日 13:33:27 | petr.viktorin | set | nosy:
+ petr.viktorin messages: + msg409767 |
| 2021年12月27日 18:57:54 | eric.snow | set | messages: + msg409242 |
| 2021年12月22日 16:58:09 | srittau | set | nosy:
+ srittau |
| 2021年12月16日 01:13:09 | eric.snow | set | messages: + msg408667 |
| 2021年12月16日 01:11:44 | eric.snow | set | messages: + msg408666 |
| 2021年12月16日 01:10:48 | eric.snow | set | pull_requests: + pull_request28350 |
| 2021年12月15日 19:52:59 | eric.snow | set | messages: + msg408640 |
| 2021年12月15日 19:51:13 | eric.snow | set | messages: + msg408639 |
| 2021年12月15日 17:15:51 | vstinner | set | messages: + msg408620 |
| 2021年12月15日 15:44:03 | Mark.Shannon | set | nosy:
+ Mark.Shannon messages: + msg408614 |
| 2021年12月15日 15:16:25 | vstinner | set | messages: + msg408611 |
| 2021年12月15日 15:11:46 | vstinner | set | messages: + msg408610 |
| 2021年12月15日 14:45:48 | vstinner | set | messages: + msg408605 |
| 2021年12月15日 14:44:52 | vstinner | set | stage: patch review pull_requests: + pull_request28342 |
| 2021年12月14日 23:39:00 | craigh | set | nosy:
+ craigh |
| 2021年12月11日 13:08:09 | diabonas | set | nosy:
+ diabonas |
| 2021年12月10日 15:17:20 | hroncok | set | nosy:
+ hroncok |
| 2021年12月10日 12:44:06 | vstinner | set | keywords:
+ 3.10regression priority: normal -> release blocker nosy: + pablogsal |
| 2021年12月10日 12:35:11 | vstinner | set | files:
+ cmp_interned_strings.patch keywords: + patch messages: + msg408202 |
| 2021年12月10日 12:32:58 | vstinner | set | messages: + msg408201 |
| 2021年12月09日 14:04:59 | corona10 | set | messages: + msg408125 |
| 2021年12月09日 08:57:57 | vstinner | set | messages: + msg408085 |
| 2021年12月09日 02:59:23 | methane | set | messages: + msg408069 |
| 2021年12月08日 21:25:06 | ndjensen | set | nosy:
+ ndjensen |
| 2021年12月08日 11:36:17 | vstinner | set | nosy:
+ eric.snow messages: + msg408012 |
| 2021年12月08日 09:02:42 | methane | set | nosy:
+ methane messages: + msg408002 |
| 2021年12月07日 17:35:35 | vstinner | set | messages: + msg407956 |
| 2021年12月07日 17:27:51 | vstinner | set | messages: + msg407953 |
| 2021年12月07日 17:21:43 | vstinner | set | nosy:
+ serhiy.storchaka messages: + msg407952 |
| 2021年12月07日 17:19:22 | vstinner | set | messages: + msg407951 |
| 2021年12月07日 17:12:26 | vstinner | create | |