This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.
Created on 2021-06-09 08:06 by vstinner, last changed 2022-04-11 14:59 by admin. This issue is now closed.
| Messages (13) | |||
|---|---|---|---|
| msg395390 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2021-06-09 08:06 | |
test_compile and test_multiprocessing_forkserver crashed with segfault (SIGSEGV) on AMD64 Ubuntu 3.x:
https://buildbot.python.org/all/#/builders/708/builds/31

It *seems* like test_compile.test_stack_overflow() crashed, but the log is not reliable so I cannot confirm.

According to buildbot, the responsible change is: "bpo-43693: Un-revert commit f3fa63e. (#26609)(10 hours ago)"
https://github.com/python/cpython/commit/3e1c7167d86a2a928cdcb659094aa10bb5550c4c

So Eric, can you please investigate the change? If nobody is available to fix the buildbot, I suggest reverting the change.

Python was built in debug mode with:

./configure --prefix '$(PWD)/target' --with-pydebug
make all

test.pythoninfo:

CC.version: gcc (Ubuntu 10.3.0-1ubuntu1) 10.3.0
os.uname: posix.uname_result(sysname='Linux', nodename='doxy.learntosolveit.com', release='5.11.0-18-generic', version='#19-Ubuntu SMP Fri May 7 14:22:03 UTC 2021', machine='x86_64')
platform.platform: Linux-5.11.0-18-generic-x86_64-with-glibc2.33
sys.thread_info: sys.thread_info(name='pthread', lock='semaphore', version='NPTL 2.33')

Logs:

./python ./Tools/scripts/run_tests.py -j 1 -u all -W --slowest --fail-env-changed --timeout=900 -j2 --junit-xml test-results.xml
== CPython 3.11.0a0 (heads/main:3e1c7167d8, Jun 8 2021, 22:09:42) [GCC 10.3.0]
== Linux-5.11.0-18-generic-x86_64-with-glibc2.33 little-endian
== cwd: /home/buildbot/buildarea/3.x.skumaran-ubuntu-x86_64/build/build/test_python_1439770æ
== CPU count: 1
== encodings: locale=UTF-8, FS=utf-8
Using random seed 5059550
0:00:00 load avg: 0.97 Run tests in parallel using 2 child processes (timeout: 15 min, worker timeout: 20 min)
(...)
0:00:43 load avg: 2.22 running: test_compile (34.7 sec), test_signal (30.8 sec)
0:01:12 load avg: 3.84 [ 13/427/1] test_compile crashed (Exit code -9) -- running: test_signal (59.6 sec)
(...)
0:06:26 load avg: 1.84 running: test_concurrent_futures (42.0 sec), test_multiprocessing_forkserver (30.0 sec)
0:06:56 load avg: 3.91 running: test_concurrent_futures (1 min 12 sec), test_multiprocessing_forkserver (1 min)
0:07:26 load avg: 5.47 running: test_concurrent_futures (1 min 42 sec), test_multiprocessing_forkserver (1 min 30 sec)
0:07:58 load avg: 5.93 running: test_concurrent_futures (2 min 13 sec), test_multiprocessing_forkserver (2 min 2 sec)
0:08:30 load avg: 5.73 running: test_concurrent_futures (2 min 44 sec), test_multiprocessing_forkserver (2 min 33 sec)
0:08:48 load avg: 4.62 [ 85/427/2] test_multiprocessing_forkserver crashed (Exit code -9) -- running: test_concurrent_futures (3 min 3 sec)
(...)
2 tests failed: test_compile test_multiprocessing_forkserver
(...)
0:27:56 load avg: 1.28 Re-running test_compile in verbose mode
test_and (test.test_compile.TestExpressionStackSize) ... ok
(...)
test_sequence_unpacking_error (test.test_compile.TestSpecifics) ... ok
test_single_statement (test.test_compile.TestSpecifics) ... ok
test_stack_overflow (test.test_compile.TestSpecifics) ... make: *** [Makefile:1256: buildbottest] Killed
program finished with exit code 2
elapsedTime=1684.973552 |
|||
| msg395391 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2021-06-09 08:06 | |
See also bpo-44348 "test_exceptions.ExceptionTests.test_recursion_in_except_handler stack overflow on Windows debug builds". |
|||
| msg395408 - (view) | Author: Pablo Galindo Salgado (pablogsal) * (Python committer) | Date: 2021-06-09 11:08 | |
I don't think that's a segfault. It seems that the process was killed, no? Also, the buildbot is green, so this is not happening in the latest builds. |
|||
| msg395419 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2021-06-09 13:15 | |
> I don't think that's a segfault. It seems that the process was killed, no? Also, the buildbot is green, so this is not happening in the latest builds.

* (1) 0:01:12, test_compile child process was killed by signal -9
* (2) 0:08:48, test_multiprocessing_forkserver child process was killed by signal -9
* (3) 0:27:56, test_compile main process was killed (unknown signal... I bet on signal -9, SIGSEGV)

Maybe it was a manual action, but it sounds like a strange coincidence that 3 processes were killed in the same build, and it wasn't at the same time. |
|||
| msg395422 - (view) | Author: Pablo Galindo Salgado (pablogsal) * (Python committer) | Date: 2021-06-09 14:23 | |
But SIGSEGV is signal 11, not -9 |
|||
| msg395425 - (view) | Author: Erlend E. Aasland (erlendaasland) * (Python triager) | Date: 2021-06-09 14:42 | |
Isn't this just an (explicit) SIGKILL? The _exit code_ seems to be -9, not the signal number. |
|||
| msg395441 - (view) | Author: Pablo Galindo Salgado (pablogsal) * (Python committer) | Date: 2021-06-09 17:17 | |
I am quite sure this is not a segmentation fault, Victor. |
|||
| msg395442 - (view) | Author: Pablo Galindo Salgado (pablogsal) * (Python committer) | Date: 2021-06-09 17:18 | |
We'll wait for more builds, but for now the buildbot is green so I think this should be closed and reopened if we see it again. |
|||
| msg395455 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2021-06-09 20:02 | |
Oh right, exit code -9 means killed by SIGKILL, it doesn't mean killed by SIGSEGV. Sorry about the confusion. How can a process be killed by SIGKILL? Can it be related to the Linux OOM Killer? Senthil: Would you mind having a look at the server logs to see if you see anything suspicious? |
|||
| msg395456 - (view) | Author: Erlend E. Aasland (erlendaasland) * (Python triager) | Date: 2021-06-09 20:05 | |
Oh, right, there is of course a connection between the exit code and the signal number. Thanks for the reminder :) |
|||
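The relationship between "Exit code -9" and SIGKILL discussed above can be illustrated with a minimal sketch. It assumes the reported value is a `subprocess` return code on a POSIX system, where a negative value `-N` means the child was terminated by signal `N`. The helper names `describe_returncode` and `run_self_killing_child` are hypothetical and are not part of regrtest.

```python
import signal
import subprocess
import sys


def describe_returncode(returncode: int) -> str:
    """Describe a subprocess return code (hypothetical helper).

    On POSIX, subprocess reports "terminated by signal N" as -N.
    """
    if returncode < 0:
        signum = -returncode
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # The number is not a known signal on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


def run_self_killing_child() -> int:
    """Run a child Python process that sends SIGKILL to itself (POSIX only)."""
    code = "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"
    return subprocess.run([sys.executable, "-c", code]).returncode
```

Calling `describe_returncode(run_self_killing_child())` on Linux should report a SIGKILL, while a real segmentation fault would show up as return code -11 (SIGSEGV), which matches the correction made earlier in the thread.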
| msg395457 - (view) | Author: Senthil Kumaran (orsenthil) * (Python committer) | Date: 2021-06-09 20:06 | |
Yes, this was related to the Linux OOM Killer. The agent went down shortly after this. Either multiple parallel jobs might have led to OOM or something else. I will see if logs provide more information. |
|||
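One way to confirm an OOM kill of this kind is to scan the kernel log (for example the output of `dmesg`) for OOM-killer messages. The sketch below is only an illustration: the message shape in `OOM_KILL_RE` is an assumption, since the exact wording differs between kernel versions, and the sample log lines are made up rather than taken from the worker in this issue.

```python
import re
from typing import List, Tuple

# Assumed message shape; real kernels may word this differently.
OOM_KILL_RE = re.compile(r"Out of memory: Killed process (\d+) \(([^)]+)\)")

# Made-up sample text, not the log of the buildbot worker.
SAMPLE_LOG = """\
[1234.5] python invoked oom-killer: gfp_mask=0x100cca, order=0, oom_score_adj=0
[1234.6] Out of memory: Killed process 4242 (python) total-vm:1048576kB, anon-rss:524288kB
[1300.0] eth0: link up
"""


def find_oom_kills(kernel_log: str) -> List[Tuple[int, str]]:
    """Return (pid, command name) for each OOM-kill line found in the text."""
    return [
        (int(match.group(1)), match.group(2))
        for match in OOM_KILL_RE.finditer(kernel_log)
    ]
```

With the sample text, `find_oom_kills(SAMPLE_LOG)` yields one entry for the hypothetical process 4242.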
| msg395483 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2021-06-09 21:32 | |
> Yes, this was related to the Linux OOM Killer.

Oh ok. Maybe you should give more memory to your worker, or you should spawn fewer jobs in parallel (-j1 instead of -j2). Or you should disable other services which eat memory. How much memory does it have? |
|||
| msg395488 - (view) | Author: Senthil Kumaran (orsenthil) * (Python committer) | Date: 2021-06-09 22:20 | |
> Maybe you should give more memory to your worker, or you should spawn fewer jobs in parallel

It was related to the high number of jobs on that particular agent, which resulted in an OOM kill by the Linux kernel - https://pastebin.com/559H4ksa

The machine has 1 GB of RAM, but I realize that it has only 1 CPU. (This seems suboptimal; a minimum of 2 CPUs seems to be the recommendation - https://devguide.python.org/buildworker/) I will change it to run fewer jobs in parallel and disable some services which are not used, and we can see again. For this, I would rather side with an agent resource issue than a compiler issue. Sorry for that.

---

I also noticed a number of unsuccessful SSH attempts on the server (today) - https://pastebin.com/ab0EKDuF The agent became unreachable, probably due to this, and I rebooted the agent from the cloud console so that I could log in and see what might have happened. |
|||
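The advice above (fewer parallel jobs on a worker with little memory) can be sketched as a small calculation. This is a rough illustration, not how the buildbot or regrtest picks `-j`: the helper names are hypothetical, the sample `/proc/meminfo` text is made up, and the per-worker memory budget is an arbitrary placeholder that would need to be measured on the actual worker.

```python
import re

# Made-up excerpt in the format of /proc/meminfo on Linux.
SAMPLE_MEMINFO = """\
MemTotal:        1014572 kB
MemFree:          102400 kB
MemAvailable:     409600 kB
"""


def mem_available_kib(meminfo_text: str) -> int:
    """Extract the MemAvailable value (in kiB) from /proc/meminfo text."""
    match = re.search(r"^MemAvailable:\s+(\d+)\s+kB$", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemAvailable not found")
    return int(match.group(1))


def pick_job_count(cpu_count: int, available_kib: int, per_worker_kib: int) -> int:
    """Cap the job count by both CPU count and memory, never going below 1."""
    by_memory = available_kib // per_worker_kib
    return max(1, min(cpu_count, by_memory))
```

For example, `pick_job_count(1, mem_available_kib(SAMPLE_MEMINFO), 512 * 1024)` returns 1 for a single-CPU machine with the sample memory figures, which corresponds to running with `-j1`.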
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:59:46 | admin | set | github: 88526 |
| 2021-06-10 10:08:10 | pablogsal | set | status: open -> closed; resolution: not a bug; stage: resolved |
| 2021-06-09 22:20:06 | orsenthil | set | messages: + msg395488 |
| 2021-06-09 21:32:48 | vstinner | set | messages: + msg395483; title: test_compile killed by SIGKILL on AMD64 Ubuntu 3.x -> test_compile killed by SIGKILL on AMD64 Ubuntu 3.x (Linux OOM Killer) |
| 2021-06-09 20:06:48 | orsenthil | set | messages: + msg395457 |
| 2021-06-09 20:05:36 | erlendaasland | set | messages: + msg395456 |
| 2021-06-09 20:02:24 | vstinner | set | nosy: + orsenthil; messages: + msg395455; title: test_compile segfault on AMD64 Ubuntu 3.x -> test_compile killed by SIGKILL on AMD64 Ubuntu 3.x |
| 2021-06-09 17:18:19 | pablogsal | set | messages: + msg395442 |
| 2021-06-09 17:17:09 | pablogsal | set | messages: + msg395441 |
| 2021-06-09 14:58:12 | corona10 | set | nosy: + corona10 |
| 2021-06-09 14:42:39 | erlendaasland | set | messages: + msg395425 |
| 2021-06-09 14:23:38 | pablogsal | set | messages: + msg395422 |
| 2021-06-09 13:15:17 | vstinner | set | messages: + msg395419 |
| 2021-06-09 11:08:01 | pablogsal | set | messages: + msg395408 |
| 2021-06-09 08:15:32 | erlendaasland | set | nosy: + erlendaasland |
| 2021-06-09 08:06:47 | vstinner | set | messages: + msg395391 |
| 2021-06-09 08:06:22 | vstinner | create | |