homepage

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: distutils.spawn should use subprocess (hang in parallel builds on QNX)
Type: Stage: patch review
Components: Library (Lib) Versions: Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Elad Lahav, Michael.Felt, dstufft, eric.araujo, jaraco, pitrou, vstinner
Priority: normal Keywords: patch

Created on 2020年02月26日 19:18 by Elad Lahav, last changed 2022年04月11日 14:59 by admin.

Files
File name Uploaded Description Edit
fork_mt.py Elad Lahav, 2020年02月26日 19:18
Pull Requests
URL Status Linked Edit
PR 18743 merged vstinner, 2020年03月02日 13:51
PR 18778 merged vstinner, 2020年03月04日 14:12
PR 18847 closed Michael.Felt, 2020年03月08日 10:39
PR 18872 merged vstinner, 2020年03月09日 10:15
Messages (36)
msg362717 - (view) Author: Elad Lahav (Elad Lahav) Date: 2020年02月26日 19:18
The attached code causes the child processes to hang on QNX. The hang is caused by the logging module trying to acquire the module lock while in an atfork() handler. In a system where semaphore state is kept in user mode and is thus inherited from the parent on fork() the semaphore may appear to have a value of 0, and thus will never be posted to in the single-threaded child.
I don't know how it works on other systems - may be pure chance.
msg362728 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2020年02月26日 20:02
Note the original problem stems when you using "setup.py -j <NJOBS>".
The attached reproducer is a bit of a red herring, because indeed doing fork() + exec() in pure Python is too fragile. That is why the standard subprocess does it in C. The problem here is that the `distutils.spawn` module still has its hand-baked fork() + exec() pure Python code from the 1990s.
So the solution to the OP's actual problem should be to recode `distutils.spawn` using the subprocess module.
Original mailing-list discussion at:
https://mail.python.org/archives/list/python-dev@python.org/thread/PTJ3UAZNYBYBYQ4WLMSERZR2XIVWISLM/ 
msg362731 - (view) Author: Elad Lahav (Elad Lahav) Date: 2020年02月26日 20:10
I'm not convinced that a multi-threaded fork()+exec() from C would be any better, unless the Python code goes to great lengths to avoid any non-async-signal-safe operations between the fork() and the exec().
So along with the proposed change to switch to the subprocess() module I would like to propose that that module switch to using posix_spawn() instead of fork()+exec().
msg362732 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2020年02月26日 20:13
@Antoine: I can sort of see why you consider the small example a red herring, since it mixes threads and fork. Nevertheless, a simpler version (not using an executor) would probably work just fine? It's the edge cases where the mixture becomes explosive. So I still think there's something that could be improved in ThreadPoolExecutor.
@Elad: Your script contains a bug (there is no definition of 'exec_fn'). But anything that returns or raises from the child process does indeed hang.
Since I can repro this on macOS I do QNX is a red herring here.
msg362743 - (view) Author: Elad Lahav (Elad Lahav) Date: 2020年02月26日 21:35
"Your script contains a bug (there is no definition of 'exec_fn')"
Told you I wasn't much of a Python developer ;-) 
This was just copy-pasted from spawn.py and I missed the fact that exec_fn() is not a library function.
Your last comment wasn't clear - do you see this problem on macOS? Is it only after you fixed the example? For me it hangs with the bug, as the problem occurs before the call to exec_fn().
msg362752 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2020年02月26日 23:02
When I run this on macOS, either with the bug, or with the buggy line commented out, I get the same hang. When I interrupt it, I get
 a separate traceback from each thread. Here's the full session:
Parent 78918
Parent 78919
Child 78918
Parent 78920
Child 78919
Parent 78921
Child 78920
Child 78921
 C-c C-cTraceback (most recent call last):
 File "tt.py", line 18, in <module>
 fut.result()
 File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 434, in result
 self._condition.wait(timeout)
 File "/usr/local/lib/python3.8/threading.py", line 302, in wait
 waiter.acquire()
KeyboardInterrupt
Exception in worker
Traceback (most recent call last):
 File "/usr/local/lib/python3.8/concurrent/futures/thread.py", line 78, in _worker
 work_item = work_queue.get(block=True)
KeyboardInterrupt
Exception in worker
Traceback (most recent call last):
 File "/usr/local/lib/python3.8/concurrent/futures/thread.py", line 78, in _worker
 work_item = work_queue.get(block=True)
KeyboardInterrupt
Exception in worker
Traceback (most recent call last):
 File "/usr/local/lib/python3.8/concurrent/futures/thread.py", line 78, in _worker
 work_item = work_queue.get(block=True)
KeyboardInterrupt
Exception in worker
Traceback (most recent call last):
 File "/usr/local/lib/python3.8/concurrent/futures/thread.py", line 78, in _worker
 work_item = work_queue.get(block=True)
KeyboardInterrupt
Done 78918 78917
Done 78921 78917
Done 78919 78917
Done 78920 78917
(I added some prints to the script. Here's the script I ran:)
import os
from concurrent.futures import ThreadPoolExecutor
def new_process(arg):
 pid = os.fork()
 if pid == 0:
 print("Child", os.getpid())
 ## os.execl("/usr/bin/true", "/usr/bin/true")
 else:
 print("Parent", pid)
 pid, status = os.waitpid(pid, 0)
 print("Done", pid, os.getpid())
executor = ThreadPoolExecutor(max_workers=4)
futures = [executor.submit(new_process, None)
 for i in range(0, 4)]
for fut in futures:
 fut.result()
When I uncomment the os.execl() line, the program runs and completes.
msg362794 - (view) Author: Elad Lahav (Elad Lahav) Date: 2020年02月27日 13:14
"When I uncomment the os.execl() line, the program runs and completes."
In that case I'm not sure it is the same issue. The child processes in your case executed their part of the new_process function, which then returned. Nevertheless from the trace it looks like none of the children exited. So you are not hitting the logging lock issue, but may be hitting a similar problem in another module.
Is the logging module always enabled, or does it depend on the environment?
msg362818 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2020年02月27日 17:15
The key is that os.execl() never returns.
My understanding is that the hang happens when the child code returns (or raises) in the forked child process, because the semaphore state is cloned by the fork.
There is no logging here. I just added some print() calls. (See the script I posted.)
msg362822 - (view) Author: Elad Lahav (Elad Lahav) Date: 2020年02月27日 17:28
OK, but that's not the problem I see. The parent calls fork(), creates a child that then runs the atfork() handlers *before* returning from the os.fork() call (which is the expected behaviour). At least one of those atfork() handlers is the one registered by the logging module (Lib/logging/__init__.py) and that one attempts to acquire a lock. So in my case the child never gets to the point of calling exec(), which is why I didn't notice the bug in the script.
msg362824 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2020年02月27日 17:35
And you're sure that your "toy" example hangs in the at-fork handler.
In that case I can't help you further, I don't have the resources or time to try out QNX. Your best bet is indeed to change distutils.spawn to not use fork+exec -- do you think you can come up with a PR for that?
msg362826 - (view) Author: Elad Lahav (Elad Lahav) Date: 2020年02月27日 17:46
As sure as I can be given my limited experience debugging Python... Luckily I do know my way around the QNX kernel ;-)
The stack trace for the child shows it stuck on a semaphore with a count value of 0. A print in the logging module shows that the child gets to the _acquireLock() call on line 245 of logging/__init__.py but a print after that line is never displayed. I am basing my analysis on the combination of the two observations.
I'll try to come up with a change to distutils.spawn, as well as a change to subprocess to use posix_spawn().
Stack trace:
#0 SyncSemWait () at /builds/workspace/710-SDP/build_x86_64/lib/c/kercalls/x86_64/SyncSemWait.S:37
#1 0x000000267938bac6 in PyThread_acquire_lock_timed (lock=lock@entry=0x3bcde5a430, microseconds=microseconds@entry=-1000000, 
 intr_flag=intr_flag@entry=1) at Python/thread_pthread.h:459
#2 0x00000026793de7bd in acquire_timed (lock=0x3bcde5a430, timeout=-1000000000) at ./Modules/_threadmodule.c:63
#3 0x00000026793de8e7 in rlock_acquire (self=0x3bce057c90, args=<optimized out>, kwds=<optimized out>) at ./Modules/_threadmodule.c:308
#4 0x000000267941b631 in method_vectorcall_VARARGS_KEYWORDS (func=0x3bcdee2ef0, args=0x3bce1ad4d0, nargsf=<optimized out>, kwnames=<optimized out>)
 at Objects/descrobject.c:332
#5 0x0000002679285ff6 in _PyObject_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>, callable=<optimized out>)
 at ./Include/cpython/abstract.h:123
#6 call_function (kwnames=0x0, oparg=1, pp_stack=<synthetic pointer>, tstate=0x3bcdea9040) at Python/ceval.c:4987
#7 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3486
#8 0x000000267928395b in function_code_fastcall (co=<optimized out>, args=<optimized out>, nargs=0, globals=<optimized out>) at Objects/call.c:283
#9 0x0000002679285d29 in _PyObject_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>, callable=<optimized out>)
 at ./Include/cpython/abstract.h:127
#10 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x3bcdea9040) at Python/ceval.c:4987
#11 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3500
#12 0x000000267928395b in function_code_fastcall (co=<optimized out>, args=<optimized out>, nargs=1, globals=<optimized out>) at Objects/call.c:283
#13 0x0000002679285d29 in _PyObject_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>, callable=<optimized out>)
 at ./Include/cpython/abstract.h:127
#14 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x3bcdea9040) at Python/ceval.c:4987
#15 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3500
#16 0x000000267928395b in function_code_fastcall (co=<optimized out>, args=<optimized out>, nargs=1, globals=<optimized out>) at Objects/call.c:283
#17 0x0000002679285ff6 in _PyObject_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>, callable=<optimized out>)
 at ./Include/cpython/abstract.h:123
#18 call_function (kwnames=0x0, oparg=1, pp_stack=<synthetic pointer>, tstate=0x3bcdea9040) at Python/ceval.c:4987
#19 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3486
#20 0x000000267928395b in function_code_fastcall (co=<optimized out>, args=<optimized out>, nargs=0, globals=<optimized out>) at Objects/call.c:283
#21 0x000000267929d26a in _PyObject_FastCallDict (callable=0x3bce1bd160, args=<optimized out>, nargsf=<optimized out>, kwargs=<optimized out>)
 at Objects/call.c:96
#22 0x00000026793a0584 in run_at_forkers (lst=<optimized out>, reverse=<optimized out>) at ./Modules/posixmodule.c:435
#23 0x00000026793abbfb in run_at_forkers (reverse=0, lst=<optimized out>) at ./Modules/posixmodule.c:420
#24 PyOS_AfterFork_Child () at ./Modules/posixmodule.c:474
#25 0x00000026793abd08 in os_fork_impl (module=<optimized out>) at ./Modules/posixmodule.c:6082
msg362847 - (view) Author: Elad Lahav (Elad Lahav) Date: 2020年02月27日 22:10
I may be missing something, but subprocess depends on _posixsubprocess, which doesn't get built until setup.py is run, which results in a circular dependency.
msg363017 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020年02月29日 21:31
> Note the original problem stems when you using "setup.py -j <NJOBS>".
setup.py doesn't use multiprocessing. multiprocessing is super complex. Would it be possible to write a reproducer which doesn't use multiprocessing?
> Stack trace: (...)
Can you try to get the Python traceback? Try py-bt command in gdb, or attempt to use faulthandler.dump_traceback_later().
msg363038 - (view) Author: Elad Lahav (Elad Lahav) Date: 2020年03月01日 02:51
"setup.py doesn't use multiprocessing. multiprocessing is super complex. Would it be possible to write a reproducer which doesn't use multiprocessing?"
But the problem is with the handling of fork() by Python modules, and specifically with multi-threaded fork()s. Without multiple processes there is no issue.
And setup.py does use use multiple processes via a thread pool, where each thread calls fork(), if it detects that "make" was invoked with the parallel (-j) option.
msg363058 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2020年03月01日 13:57
@Elad:
> I'm not convinced that a multi-threaded fork()+exec() from C would be any better, unless the Python code goes to great lengths to avoid any non-async-signal-safe operations between the fork() and the exec().
That's exactly what the C code in the C subprocess module for POSIX does, though:
https://github.com/python/cpython/blob/master/Modules/_posixsubprocess.c#L828-L854
That's why I hope that using subprocess instead of a fork()+exec() sequence naively coded in pure Python would solve the issue.
@Guido:
> I can sort of see why you consider the small example a red herring, since it mixes threads and fork
Perhaps "red herring" was the wrong expression. What I mean is that it's no surprise that fork()+exec() sequence written in pure Python would be unsafe in multi-thread settings. The solution, though, is not to avoid threads or try to workaround the issue in ThreadPoolExecutor, but rather to avoid doing fork()+exec() in pure Python.
msg363165 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020年03月02日 12:22
> And setup.py does use use multiple processes via a thread pool, where each thread calls fork(), if it detects that "make" was invoked with the parallel (-j) option.
Oh ok, now I get it :-) setup.py uses distutils to build extensions and distutils.command.build_ext uses concurrent.futures for parallelism:
 if self.parallel:
 self._build_extensions_parallel()
 else:
 self._build_extensions_serial()
with:
 def _build_extensions_parallel(self):
 workers = self.parallel
 if self.parallel is True:
 workers = os.cpu_count() # may return None
 try:
 from concurrent.futures import ThreadPoolExecutor
 except ImportError:
 workers = None
 if workers is None:
 self._build_extensions_serial()
 return
 with ThreadPoolExecutor(max_workers=workers) as executor:
 futures = [executor.submit(self.build_extension, ext)
 for ext in self.extensions]
 for ext, fut in zip(self.extensions, futures):
 with self._filter_build_errors(ext):
 fut.result()
The problem is not concurrent.futures but the job submitted to the executor: build_extension() uses distutils.spawn() which is unsafe.
distutils.spawn() is the root issue if I understood correctly. It must be rewritten with subprocess.
msg363172 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020年03月02日 13:56
I wrote PR 18743 to reimplement distutils.spawn.spawn() function with the subprocess module.
It also changes setup.py to use a basic implementation of the subprocess module if the subprocess module is not available: before required C extension modules are built.
msg363173 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020年03月02日 13:59
Attached fork_mt.py example uses exec_fn() function... which is not defined. Is it on purpose?
msg363176 - (view) Author: Elad Lahav (Elad Lahav) Date: 2020年03月02日 14:11
"Attached fork_mt.py example uses exec_fn() function... which is not defined. Is it on purpose?"
That was my mistake - a copy/paste of existing code from distutils. Since the example hangs before the script gets to exec_fn() I didn't notice the problem. Just change it to os.execl
msg363349 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020年03月04日 13:50
New changeset 1ec63b62035e73111e204a0e03b83503e1c58f2e by Victor Stinner in branch 'master':
bpo-39763: distutils.spawn now uses subprocess (GH-18743)
https://github.com/python/cpython/commit/1ec63b62035e73111e204a0e03b83503e1c58f2e
msg363363 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020年03月04日 17:44
New changeset 00c77ae55a82548a6b45af73cdf712ea34910645 by Victor Stinner in branch 'master':
bpo-39763: Refactor setup.py (GH-18778)
https://github.com/python/cpython/commit/00c77ae55a82548a6b45af73cdf712ea34910645
msg363646 - (view) Author: Michael Felt (Michael.Felt) * Date: 2020年03月08日 10:27
The PR for 3.9 breaks AIX build (since https://buildbot.python.org/all/#/builders/119/builds/384) reporting the following:
Traceback (most recent call last):
 File "/data/prj/python/python3-3.9/../git/python3-3.9/setup.py", line 69, in <module>
 from distutils.command.build_ext import build_ext
 File "/data/prj/python/git/python3-3.9/Lib/distutils/command/build_ext.py", line 33, in <module>
 class build_ext(Command):
 File "/data/prj/python/git/python3-3.9/Lib/distutils/command/build_ext.py", line 63, in build_ext
 "(default: %s)" % get_platform()),
 File "/data/prj/python/git/python3-3.9/Lib/distutils/util.py", line 107, in get_platform
 return get_host_platform()
 File "/data/prj/python/git/python3-3.9/Lib/distutils/util.py", line 83, in get_host_platform
 return aix_platform()
 File "/data/prj/python/git/python3-3.9/Lib/_aix_support.py", line 83, in aix_platform
 vrmf, bd = _aix_bosmp64()
 File "/data/prj/python/git/python3-3.9/Lib/_aix_support.py", line 53, in _aix_bosmp64
 out = subprocess.check_output(["/usr/bin/lslpp", "-Lqc", "bos.mp64"])
AttributeError: module 'subprocess' has no attribute 'check_output'
Will post a PR for this.
The cause:
Briefly, due to the addition of an importable subprocess must now test specifically for check_output during the early build process.
msg363711 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020年03月09日 10:16
> AttributeError: module 'subprocess' has no attribute 'check_output'
I wrote PR 18872 to implement it for setup.py.
Michael Felt: Can you please test it on AIX?
msg363749 - (view) Author: Michael Felt (Michael.Felt) * Date: 2020年03月09日 16:54
Almost. The command is run, but not enough of the bootstrap is finished - it seems.
 File "/data/prj/python/git/python3-3.9/Lib/_aix_support.py", line 54, in _aix_bosmp64
 out = out.decode("utf-8").strip().split(":") # type: ignore
AttributeError: 'str' object has no attribute 'decode'
Just skipping the subprocess bit (which is what the original does, waiting for better moments) is sufficient for the bootstrap phase.
msg363754 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020年03月09日 17:10
> AttributeError: 'str' object has no attribute 'decode'
Oops, it should now be fixed by my second commit. Please retry PR 18872.
msg363780 - (view) Author: Michael Felt (Michael.Felt) * Date: 2020年03月09日 22:44
Comes further. The build finishes, but socket and asyncio are missing...
At least a build will proceed, and I can look into this new, perhaps unrelated issue re: socket later.
*** WARNING: renaming "_asyncio" since importing it failed: No module named '_socket'
msg363782 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020年03月09日 22:46
New changeset addaaaa946855ad59c8f5c698aa0891d7e44f018 by Victor Stinner in branch 'master':
bpo-39763: Add _bootsubprocess to build Python on AIX (GH-18872)
https://github.com/python/cpython/commit/addaaaa946855ad59c8f5c698aa0891d7e44f018
msg363784 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020年03月09日 22:47
> At least a build will proceed, and I can look into this new, perhaps unrelated issue re: socket later.
> *** WARNING: renaming "_asyncio" since importing it failed: No module named '_socket'
Do you see any error when the _socket module is built? Please open a new issue with full logs.
msg363840 - (view) Author: Michael Felt (Michael.Felt) * Date: 2020年03月10日 19:01
re: _socket - I'll start researching now, but it may still be related to this.
a) if it looks like related to this issue I'll add a some report here
b) in any case, as requested, I'll start a new issue - and leave it to you to decide to close that and continue here, or with the new issue.
msg363841 - (view) Author: Michael Felt (Michael.Felt) * Date: 2020年03月10日 20:01
OK. BEFORE I open a new issue, I am going to guess that something is wrong with finding _socket.so - because it exists, but "something" is not finding it.
Following modules built successfully but were removed because they could not be imported:^M
_asyncio ^M
^M
running build_scripts^M
copying and adjusting /data/prj/python/git/python3-3.9/Tools/scripts/pydoc3 -> build/scripts-3.9^M
copying and adjusting /data/prj/python/git/python3-3.9/Tools/scripts/idle3 -> build/scripts-3.9^M
copying and adjusting /data/prj/python/git/python3-3.9/Tools/scripts/2to3 -> build/scripts-3.9^M
changing mode of build/scripts-3.9/pydoc3 from 644 to 755^M
changing mode of build/scripts-3.9/idle3 from 644 to 755^M
changing mode of build/scripts-3.9/2to3 from 644 to 755^M
renaming build/scripts-3.9/pydoc3 to build/scripts-3.9/pydoc3.9^M
renaming build/scripts-3.9/idle3 to build/scripts-3.9/idle3.9^M
renaming build/scripts-3.9/2to3 to build/scripts-3.9/2to3-3.9^M
xlc_r -c -DNDEBUG -O -I/opt/include -O2 -qmaxmem=-1 -qarch=pwr5 -I../git/python3-3.9/Include/internal -IObjects -IInclude -IPython -I. -I../git/python3-3.9/Include -I/opt/include -DPy_BUILD_CORE -o Programs/_testembed.o ../git
/python3-3.9/Programs/_testembed.c^M
../git/python3-3.9/Modules/makexp_aix Modules/python.exp . libpython3.9.a; xlc_r /opt/lib/libintl.a -L/opt/lib -liconv -lexpat -Wl,-bE:Modules/python.exp -lld -o Programs/_testembed Programs/_testembed.o libpython3.9.so -lintl -l
dl -lm -lm ^M
NOTE: The module was built successfully, but not found.
root@x065:[/data/prj/python/python3-3.9]find . -name _socket.so
./build/lib.aix-5307-0747-32-3.9/_socket.so
msg363853 - (view) Author: Michael Felt (Michael.Felt) * Date: 2020年03月10日 21:38
I am confused.
The bot complains nearly immediately about missing _socket
Following modules built successfully but were removed because they could not be imported:
_asyncio 
running build_scripts
copying and adjusting /home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Tools/scripts/pydoc3 -> build/scripts-3.9
copying and adjusting /home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Tools/scripts/idle3 -> build/scripts-3.9
copying and adjusting /home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Tools/scripts/2to3 -> build/scripts-3.9
changing mode of build/scripts-3.9/pydoc3 from 600 to 755
changing mode of build/scripts-3.9/idle3 from 600 to 755
changing mode of build/scripts-3.9/2to3 from 600 to 755
renaming build/scripts-3.9/pydoc3 to build/scripts-3.9/pydoc3.9
renaming build/scripts-3.9/idle3 to build/scripts-3.9/idle3.9
renaming build/scripts-3.9/2to3 to build/scripts-3.9/2to3-3.9
./python -E -c 'import sys ; from sysconfig import get_platform ; print("%s-%d.%d" % (get_platform(), *sys.version_info[:2]))' >platform
./python ./Tools/scripts/run_tests.py -j 1 -u all -W --slowest --fail-env-changed --timeout=900 -j2 --junit-xml test-results.xml 
Traceback (most recent call last):
 File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/./Tools/scripts/run_tests.py", line 12, in <module>
 import test.support
 File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/test/support/__init__.py", line 6, in <module>
 import asyncio.events
 File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/asyncio/__init__.py", line 8, in <module>
 from .base_events import *
 File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/asyncio/base_events.py", line 23, in <module>
 import socket
 File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/socket.py", line 51, in <module>
 import _socket
ModuleNotFoundError: No module named '_socket'
while if I use my (now closed) pr18847 I do not get the results immediately - it starts the whole test series, and then fails - but not because _socket does not exist, instead:
0:00:48 [ 53/420/2] test_socket failed (30.4 sec) -- running: test_multiprocessing_fork (48.3 sec)
/data/prj/python/git/python3-3.9/Lib/test/test_socket.py:2557: RuntimeWarning: received malformed or improperly-truncated ancillary data
 result = sock.recvmsg(bufsize, *args)
/data/prj/python/git/python3-3.9/Lib/test/test_socket.py:2648: RuntimeWarning: received malformed or improperly-truncated ancillary data
 result = sock.recvmsg_into([buf], *args)
test test_socket failed -- Traceback (most recent call last):
 File "/data/prj/python/git/python3-3.9/Lib/test/test_socket.py", line 1481, in testGetaddrinfo
 self.assertEqual(socktype, socket.SOCK_STREAM)
AssertionError: 0 != <SocketKind.SOCK_STREAM: 1>
In short, stumped...
msg363855 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020年03月10日 21:44
> OK. BEFORE I open a new issue, I am going to guess that something is wrong with finding _socket.so - because it exists, but "something" is not finding it.
Did the build worked on AIX before the commit 1ec63b62035e73111e204a0e03b83503e1c58f2e which changed distutils to use subprocess? Try commit dffe4c07095e0c693e094d3c140e85a68bd8128e (parent of the other change).
msg363943 - (view) Author: Michael Felt (Michael.Felt) * Date: 2020年03月11日 17:02
This was already confirmed by both bots...
The last bot build that worked (and for both AIX bots) is: dffe4c07095e0c693e094d3c140e85a68bd8128e
The first build that failed (for both) is: 1ec63b62035e73111e204a0e03b83503e1c58f2e
See (pass) https://buildbot.python.org/all/#/builders/119/builds/383 and https://buildbot.python.org/all/#/builders/227/builds/369; fail https://buildbot.python.org/all/#/builders/119/builds/384
 and https://buildbot.python.org/all/#/builders/227/builds/370
I'll checkout dffe4c07095e0c693e094d3c140e85a68bd8128e to verify it still works successfully.
msg363945 - (view) Author: Michael Felt (Michael.Felt) * Date: 2020年03月11日 17:32
After a checkout the build finishes (successfully) with:
Python build finished successfully!
The necessary bits to build these optional modules were not found:
_gdbm _lzma _sqlite3
_tkinter _uuid readline
To find the necessary bits, look in setup.py in detect_modules() for the module's name.
The following modules found by detect_modules() in setup.py, have been
built by the Makefile instead, as configured by the Setup files:
_abc atexit pwd
time
running build_scripts
creating build/scripts-3.9
copying and adjusting /data/prj/python/git/python3-3.9/Tools/scripts/pydoc3 -> build/scripts-3.9
copying and adjusting /data/prj/python/git/python3-3.9/Tools/scripts/idle3 -> build/scripts-3.9
copying and adjusting /data/prj/python/git/python3-3.9/Tools/scripts/2to3 -> build/scripts-3.9
changing mode of build/scripts-3.9/pydoc3 from 644 to 755
changing mode of build/scripts-3.9/idle3 from 644 to 755
changing mode of build/scripts-3.9/2to3 from 644 to 755
renaming build/scripts-3.9/pydoc3 to build/scripts-3.9/pydoc3.9
renaming build/scripts-3.9/idle3 to build/scripts-3.9/idle3.9
renaming build/scripts-3.9/2to3 to build/scripts-3.9/2to3-3.9
/opt/bin/install -c -m 644 ../git/python3-3.9/Tools/gdb/libpython.py python-gdb.py
xlc_r -c -DNDEBUG -O -I/opt/include -O2 -qmaxmem=-1 -qarch=pwr5 -I/opt/include -O2 -qmaxmem=-1 -qarch=pwr5 -I../git/python3-3.9/Include/internal -IObjects -IInclude -IPython -I. -I../git/python3-3.9/Include -I/opt/include -I/op
t/include -DPy_BUILD_CORE -o Programs/_testembed.o ../git/python3-3.9/Programs/_testembed.c
../git/python3-3.9/Modules/makexp_aix Modules/python.exp . libpython3.9.a; xlc_r /opt/lib/libintl.a -L/opt/lib -liconv -lexpat /opt/lib/libintl.a -L/opt/lib -liconv -lexpat -Wl,-bE:Modules/python.exp -lld -o Programs/_testembed
Programs/_testembed.o libpython3.9.a -lintl -ldl -lm -lm
sed -e "s,@EXENAME@,/opt/bin/python3.9," < ../git/python3-3.9/Misc/python-config.in >python-config.py
LC_ALL=C sed -e 's,\$(\([A-Za-z0-9_]*\)),\$\{1円\},g' < Misc/python-config.sh >python-config
root@x065:[/data/prj/python/python3-3.9]./python
Python 3.9.0a4+ (tags/v3.9.0a4-59-gdffe4c0709-dirty:dffe4c0709, Mar 11 2020, 17:09:06) [C] on aix
Type "help", "copyright", "credits" or "license" for more information.
>>>
msg363947 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020年03月11日 17:42
I created bpo-39936 "Python fails to build _asyncio on module on AIX" to debug the AIX build failure.
msg372974 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2020年07月04日 01:26
I've flagged issue41207 as a regression stemming from this effort.
History
Date User Action Args
2022年04月11日 14:59:27adminsetgithub: 83944
2020年07月04日 01:26:37jaracosetnosy: + jaraco
messages: + msg372974
2020年03月11日 17:42:34vstinnersetmessages: + msg363947
2020年03月11日 17:32:38Michael.Feltsetmessages: + msg363945
2020年03月11日 17:02:34Michael.Feltsetmessages: + msg363943
2020年03月10日 22:03:17gvanrossumsetnosy: - gvanrossum
2020年03月10日 21:44:19vstinnersetmessages: + msg363855
2020年03月10日 21:38:45Michael.Feltsetmessages: + msg363853
2020年03月10日 20:01:46Michael.Feltsetmessages: + msg363841
2020年03月10日 19:01:23Michael.Feltsetmessages: + msg363840
2020年03月09日 22:47:32vstinnersetmessages: + msg363784
2020年03月09日 22:46:07vstinnersetmessages: + msg363782
2020年03月09日 22:44:45Michael.Feltsetmessages: + msg363780
2020年03月09日 17:10:43vstinnersetmessages: + msg363754
2020年03月09日 16:54:04Michael.Feltsetmessages: + msg363749
2020年03月09日 10:16:36vstinnersetmessages: + msg363711
2020年03月09日 10:15:51vstinnersetpull_requests: + pull_request18230
2020年03月08日 10:39:40Michael.Feltsetpull_requests: + pull_request18204
2020年03月08日 10:27:58Michael.Feltsetnosy: + Michael.Felt
messages: + msg363646
2020年03月04日 17:44:56vstinnersetmessages: + msg363363
2020年03月04日 14:12:43vstinnersetpull_requests: + pull_request18134
2020年03月04日 13:50:23vstinnersetmessages: + msg363349
2020年03月02日 14:11:18Elad Lahavsetmessages: + msg363176
2020年03月02日 13:59:26vstinnersetmessages: + msg363173
2020年03月02日 13:56:59vstinnersetmessages: + msg363172
2020年03月02日 13:51:31vstinnersetkeywords: + patch
stage: patch review
pull_requests: + pull_request18098
2020年03月02日 12:22:43vstinnersetmessages: + msg363165
2020年03月01日 13:57:02pitrousetmessages: + msg363058
2020年03月01日 02:53:29p-gansslesetnosy: - p-ganssle
2020年03月01日 02:51:56Elad Lahavsetmessages: + msg363038
2020年02月29日 21:31:19vstinnersetmessages: + msg363017
2020年02月27日 22:10:37Elad Lahavsetmessages: + msg362847
2020年02月27日 17:46:53Elad Lahavsetmessages: + msg362826
2020年02月27日 17:35:19gvanrossumsetmessages: + msg362824
2020年02月27日 17:28:35Elad Lahavsetmessages: + msg362822
2020年02月27日 17:15:24gvanrossumsetmessages: + msg362818
2020年02月27日 13:14:46Elad Lahavsetmessages: + msg362794
2020年02月26日 23:02:14gvanrossumsetmessages: + msg362752
2020年02月26日 21:35:45Elad Lahavsetmessages: + msg362743
2020年02月26日 20:13:46gvanrossumsetmessages: + msg362732
2020年02月26日 20:10:06Elad Lahavsettype: behavior ->
messages: + msg362731
versions: - Python 3.9
2020年02月26日 20:06:38pitrousetnosy: + vstinner, eric.araujo, dstufft, p-ganssle
2020年02月26日 20:03:27pitrousettitle: Hang after fork due to logging trying to reacquire the module lock in an atfork() handler -> distutils.spawn should use subprocess (hang in parallel builds on QNX)
2020年02月26日 20:02:54pitrousettype: behavior
messages: + msg362728
versions: + Python 3.9
2020年02月26日 19:59:43gvanrossumsetnosy: + gvanrossum, pitrou
2020年02月26日 19:18:35Elad Lahavcreate

AltStyle によって変換されたページ (->オリジナル) /