homepage

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: test_os causes delayed failure on x86 gentoo buildbot: Unknown signal 32
Type: crash Stage: resolved
Components: Tests Versions: Python 3.0, Python 3.1, Python 3.2, Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: vstinner Nosy List: mark.dickinson, neologix, nnorwitz, pitrou, r.david.murray, skrah, srid, vstinner
Priority: normal Keywords: buildbot, patch

Created on 2009年01月17日 11:09 by mark.dickinson, last changed 2022年04月11日 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
apy3.1.1-linux-x86-apy_test.log srid, 2009年08月28日 22:36 Log of test run using ActivePython 3.1.1.2
confstr_libpthread.patch vstinner, 2010年04月13日 19:20
Messages (37)
msg80009 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009年01月17日 11:09
The x86 gentoo 3.0 and 3.x buildbots have been failing for a while at the test stage, with:
make: *** [buildbottest] Unknown signal 32
program finished with exit code 2
I noticed a common denominator with these failures, which is that they always seem to occur a few tests after 
test_os has been run. So it looks as though something in test_os is causing this.
Can anyone reproduce this (I can't on any of the machines I have access to), and (e.g., by trial and error) identify 
which of the test_os tests is causing this?
msg80010 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009年01月17日 11:43
Another observation is that after test_os has been run, the first test to 
actually cause the 'unknown signal' failure always seems to be one that 
involves threads (e.g., test_wait3, or test_queue, or test_logging...)
msg80011 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009年01月17日 12:24
...and there's a related message from Neal Norwitz at:
http://mail.python.org/pipermail/python-3000/2007-August/009944.html
I suspect that
python Lib/test/regrtest.py test_os test_wait3
(possibly with some additional flags to regrtest.py) should
be enough to reliably reproduce the failure.
Neal, does this help to identify the problem at all? Any suggestions 
about how to go about debugging this?
msg90390 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009年07月10日 12:25
It would also be interesting to know whether Neal's system is using the 
LinuxThreads library, or whether it's using NPTL. If it's the former, it 
might go some way to explaining the problem.
msg90396 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2009年07月10日 20:35
I was unable to reproduce this using the suggested regrtest pair, even
if I ran -R ::, on Gentoo, kernel 2.6.30, with nptl.
msg91564 - (view) Author: Sridhar Ratnakumar (srid) Date: 2009年08月14日 17:46
[...]
test_poll
test_popen
test_poplib
stub-asunix.sh: line 238: 25474 Unknown signal 32 $PYTHON 
$installdir/lib/python?.?/test/regrtest.py -w -u all,-curses,-audio,-
network -x $SKIPS
stub: core Python test suite FAILED (retval: 160)
msg91601 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009年08月15日 10:30
srid, I'm not sure why you added your comment; a couple of sentences 
explaining where the output you posted comes from (what machine, what 
version of Python, under what circumstances) would be really useful.
If you're able to reproduce this failure and have time to figure out where 
it's coming from, that would be fantastic.
msg92047 - (view) Author: Sridhar Ratnakumar (srid) Date: 2009年08月28日 22:36
Sorry about the late response; have been busy of late.
I believe this error ("Unknown signal 32") appears consistently in 
3.0.1, 3.1rc1, 3.1 and 3.1.1. It appears only on Linux x86. (64-bit has 
failures of different kind..)
I am attaching the entire log file.
I don't have much time to investigate into this relatively less-
important issue in detail, but if you need any further information .. 
please let me know. I will be happy to provide.
msg92048 - (view) Author: Sridhar Ratnakumar (srid) Date: 2009年08月28日 22:37
.. and here are the machine details:
apy@gila:~> uname -a
Linux gila 2.4.21-297-default #1 Sat Jul 23 07:47:39 UTC 2005 i686 i686 
i386 GNU/Linux
apy@gila:~> cat /etc/*release
LSB_VERSION="1.3"
DISTRIB_ID="SuSE"
DISTRIB_RELEASE="9.0"
DISTRIB_DESCRIPTION="SuSE Linux 9.0 (i586)"
SuSE Linux 9.0 (i586)
VERSION = 9.0
apy@gila:~>
msg92049 - (view) Author: Sridhar Ratnakumar (srid) Date: 2009年08月28日 22:39
libc used is of version 2.3.2.
***
apy@gila:~> ldd rrun/tmp/autotest/ActivePython-3.1.1.2-linux-x86/
INSTALLDIR/bin/python3
 libpthread.so.0 => /lib/i686/libpthread.so.0 (0x4002f000)
 libdl.so.2 => /lib/libdl.so.2 (0x40080000)
 libutil.so.1 => /lib/libutil.so.1 (0x40083000)
 libm.so.6 => /lib/i686/libm.so.6 (0x40086000)
 libc.so.6 => /lib/i686/libc.so.6 (0x400aa000)
 /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
apy@gila:~> /lib/i686/libc.so.6 
GNU C Library stable release version 2.3.2 (20030827), by Roland 
McGrath et al.
Copyright (C) 2003 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Configured for i686-suse-linux.
Compiled by GNU CC version 3.3.1 (SuSE Linux).
Compiled on a Linux 2.6.0-test3 system on 2003年09月23日.
Available extensions:
 GNU libio by Per Bothner
 crypt add-on version 2.1 by Michael Glad and others
 linuxthreads-0.10 by Xavier Leroy
 NoVersion patch for broken glibc 2.0 binaries
 BIND-8.2.3-T5B
 libthread_db work sponsored by Alpha Processor Inc
 NIS(YP)/NIS+ NSS modules 0.19 by Thorsten Kukuk
Thread-local storage support included.
pthread library is compiled with floating stack support enabled.
Report bugs using the `glibcbug' script to <bugs@gnu.org>.
apy@gila:~>
msg94670 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009年10月29日 14:53
Prelude has had the same problem with signal 32:
https://dev.prelude-ids.com/issues/show/133
According to their research, it is due to the linuxthreads
implementation of the pthread API.
To know which threads implementation your glibc is using, you can run
"getconf GNU_LIBPTHREAD_VERSION" (on a modern system, it should print
something like "NPTL 2.9").
(of course, the question is, since the signal is used by linuxthreads,
why doesn't it get caught instead of killing the process?)
msg94672 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009年10月29日 15:34
Sridhar, Neal, I would advocate disabling (commenting out)
test_closerange in Lib/test/test_os.py and see what happens.
That's the one really dirty test in test_os, it might close a file
handle linuxthreads is relying on.
msg94674 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009年10月29日 15:47
Forget the last comment, test_closerange is fine...
msg102985 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2010年04月12日 22:40
This type of failure appears again in current builds:
http://www.python.org/dev/buildbot/builders/x86 gentoo 3.x/builds/2160/steps/test/logs/stdio
msg103020 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010年04月13日 08:30
> This type of failure appears again in current builds:
Unfortunately, I think you mean 'still' rather than 'again'. :)
As far as I can tell, the failure's never gone away, though it may have been obscured by other failures from time to time.
Maybe it's time to do something. I propose we:
 1. create a new branch py3k-issue4970
 2. Hack Lib/test/regrtest.py in that branch so that it runs
 *only* test_os and test_wait3, in that order (ignoring the -r
 flag). Check that we're still getting the failure.
 3. Do a binary search (remove half the test_os tests; trigger
 buildbot run; see if we're still getting the signal; rinse;
 repeat) to narrow down the cause to a particular test.
 4. While doing 3, ruthlessly kill all other non-trunk
 checkin-triggered buildbot runs on this machine
 to speed up the search process a bit. (Keeping trunk builds
 for the sake of the upcoming 2.7 release.)
I'll try to start this this evening (no ssh access at the moment) unless someone else beats me to it.
msg103021 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010年04月13日 08:34
Should also modify regrtest to print out the result of the command
getconf GNU_LIBPTHREAD_VERSION
that Antoine suggested.
msg103033 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2010年04月13日 10:50
This bugreport http://bugs.gentoo.org/28193 indeed suggests that
the failure occurs on systems without nptl.
Would it be possible for someone with an affected system to run
the test program from the bug report?
msg103039 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2010年04月13日 12:03
Signal 32 is the first real-time signal, and is indeed used by linuxthreads, so it's very likely a linuxthreads bug, since this signal shouldn't leak to application.
Since linuxthreads is no longer maintained, I'm afraid we can't do much about this, except check for the threading library used and say "linuxthreads is obsolete and has known issues - please upgrade your system for reliable threading support".
msg103040 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010年04月13日 12:06
> Since linuxthreads is no longer maintained, I'm afraid we can't do much
> about this.
Agreed. But I think it's still worth trying to narrow down (and possibly work around) the cause of failure, just so that we can make this buildbot useful again on 3.x. Perhaps we can get by with a conditional skip of one of the test_os tests, but we have to figure out which one first. :)
msg103043 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010年04月13日 12:24
Extract of the Prelude ticket https://dev.prelude-ids.com/issues/show/133 : "commenting out sigprocmask(SIG_SETMASK, &set, NULL) seems to fixes the problem (...)"
msg103046 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010年04月13日 13:20
Results of my simple-minded strategy (see r80033-4, r80037, r80042, r80045, r80047-51):
test_execvpe_with_bad_program in ExecTests by itself is enough to trigger the signal 32 error (in combination with test_wait3). See:
http://www.python.org/dev/buildbot/builders/x86%20gentoo%203.x/builds/2174
If just this single test is disabled and all other tests in test_os are allowed to run, there's no problem (at least with test_wait3; (I haven't tried re-enabling *all* other tests in the test suite yet). See:
http://www.python.org/dev/buildbot/builders/x86%20gentoo%203.x/builds/2176 
msg103055 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010年04月13日 15:16
Here's some fairly minimal Python code that produces the signal:
### begin example ###
import os
import time
import _thread
try:
 os.execv('/usr/bin/dorothyq', ['dorothyq'])
except OSError:
 pass
def f():
 time.sleep(1.0) # probably irrelevant to the failure
_thread.start_new(f, ())
### end example ###
It looks as though the failed os.execv call messes something up internally, so that any attempt thereafter to start a thread produces this signal. I can't see anything obviously wrong with the os.execv implementation (see posix_execv in Modules/posixmodule.c).
There's still the question of what changed between 2.x and 3.x: on 2.x, this buildbot seems perfectly happy.
msg103057 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010年04月13日 15:52
Okay, I think I've got as far as I can, but if anyone else wants to hack on this, please do.
The branch name is py3k-issue4970
In that branch:
 - there's an extra test Lib/test/test_issue4970 that demonstrates
 the signal 32 failure
 - Lib/test/regrtest.py has been hacked to run only that test,
 in verbose mode.
msg103061 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010年04月13日 16:52
signal.__dict__:
{...
 'NSIG': 65,
 'SIGABRT': 6,
 'SIGALRM': 14,
 'SIGBUS': 7,
 'SIGCHLD': 17,
 'SIGCLD': 17,
 'SIGCONT': 18,
 'SIGFPE': 8,
 'SIGHUP': 1,
 'SIGILL': 4,
 'SIGINT': 2,
 'SIGIO': 29,
 'SIGIOT': 6,
 'SIGKILL': 9,
 'SIGPIPE': 13,
 'SIGPOLL': 29,
 'SIGPROF': 27,
 'SIGPWR': 30,
 'SIGQUIT': 3,
 'SIGRTMAX': 64,
 'SIGRTMIN': 35,
 'SIGSEGV': 11,
 'SIGSTOP': 19,
 'SIGSYS': 31,
 'SIGTERM': 15,
 'SIGTRAP': 5,
 'SIGTSTP': 20,
 'SIGTTIN': 21,
 'SIGTTOU': 22,
 'SIGURG': 23,
 'SIGUSR1': 10,
 'SIGUSR2': 12,
 'SIGVTALRM': 26,
 'SIGWINCH': 28,
 'SIGXCPU': 24,
 'SIGXFSZ': 25,
 'SIG_DFL': 0,
 'SIG_IGN': 1,
 ...}
msg103068 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010年04月13日 17:24
There are many references to "unknown signal 32" errors in Google.
Gdb mailing list, December 2002: "SIG32/SIGTRAP issues"
http://sources.redhat.com/ml/gdb/2002-12/msg00057.html
Gdb mailing list, September 2003: "pthread_create, Program received signal ?, Unknown signal"
http://sources.redhat.com/ml/gdb/2003-09/msg00003.html
=> extract: "A change in the definition of SIGRTMIN
causes this symptom."
There is a thread "SIGRT_0 (Unknown signal 32)" in the Linux Kernel mailing list, in July 2005:
http://lkml.org/lkml/2005/7/30/93
Extract of a Debian bug report:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=298982
-------------
There are actually three earlier instances in the trace of
 rt_sigsuspend([] <unfinished ...>
in those cases it gets "SIGRTMIN (Unknown signal 32)" and carries on.
The one prior to the hanging instance is
 open("/dev/sequencer", O_WRONLY) = 10
 ioctl(10, SNDCTL_SEQ_NRSYNTHS, 0x40095b20) = 0
 ioctl(10, SNDCTL_SEQ_NRMIDIS, 0x40094c20) = 0
 rt_sigprocmask(SIG_SETMASK, NULL, [RTMIN], 8) = 0
 write(9, " 357円24円@0円0円0円0円P370円377円277円240円341円16円@220円376円23円10円"..., 148) = 148
 rt_sigprocmask(SIG_SETMASK, NULL, [RTMIN], 8) = 0
 rt_sigsuspend([] <unfinished ...>
 --- SIGRTMIN (Unknown signal 32) @ 0 (0) ---
 <... rt_sigsuspend resumed> ) = -1 EINTR (Interrupted system call)
 sigreturn() = ? (mask now [RTMIN])
(...)
This is not very helpful, as it means the thread is waiting for another
thread, nothing else. Could you run it with strace -f ?
-------------
msg103069 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010年04月13日 17:47
NPTL was introduced in Linux kernel 2.6(.0). glibc 2.4 requires NPTL:
"The LinuxThreads add-on, providing pthreads on Linux 2.4 kernels, is no longer supported. The new NPTL implementation requires Linux 2.6 kernels. For a libc and libpthread that works well on Linux 2.4 kernels, we recommend using the stable 2.3 branch."
NPTL 0.1 was released in September 2002. So the bug requires a Linux kernel 2.4 (and gblic 2.3.x).
msg103071 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010年04月13日 17:55
I suggest simply skipping the "offending" test on linuxthread platforms.
(perhaps as simple as checking for sys.platform == "linux2" and signal.SIGRTMIN == 35)
msg103076 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010年04月13日 19:20
> I suggest simply skipping the "offending" test on linuxthread
> platforms.
Good idea
> (perhaps as simple as checking for sys.platform == "linux2" 
> and signal.SIGRTMIN == 35)
I would prefer to rely on confstr():
import os
try:
 # 'linuxthreads-0.10' or 'NPTL 2.10.2'
 pthread = os.confstr("CS_GNU_LIBPTHREAD_VERSION")
 linuxthreads = pthread.startswith("linuxthreads")
except ValueError:
 linuxthreads = False
^^ this example requires attached patch for the two CS_GNU_* constants.
Which tests should be disabled?
msg103077 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010年04月13日 19:33
Skipping test_execvpe_with_bad_program sounds good to me.
I'd ideally like to understand why 3.x is failing where 2.x is happy, but life's too short to stuff a mushroom.
msg103079 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2010年04月13日 19:59
> It looks as though the failed os.execv call messes something up internally, so that any attempt thereafter to start a thread produces this signal. I can't see anything obviously wrong with the os.execv implementation (see posix_execv in Modules/posixmodule.c).
Upon execve, signals handler are reset to default. So maybe the error makes the linuxthread API screw up latter when it tries to set up handlers for SIGRTMIN and friend. But what's weird is that when the executable given does not exist, the call should fail and return before having done anything...
> There's still the question of what changed between 2.x and 3.x: on 2.x, this buildbot seems perfectly happy.
I think it's simply because we didn't test a wrong program path with execve in 2.X version of test_os.
msg103080 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010年04月13日 20:04
> I think it's simply because we didn't test a wrong program path 
> with execve in 2.X version of test_os.
Oh, we should add this test to Python2 ;-)
msg103081 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010年04月13日 20:07
> I think it's simply because we didn't test a wrong program path with execve in 2.X version of test_os.
D'oh! Thank you very much.
I'm happy now: my mushroom's stuffed. :)
msg103110 - (view) Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2010年04月14日 02:23
Thanks for taking care of this guys. Sorry, I got swamped with mail
and had to archive 3,000+ messages. It looks like it's in good hands.
Let me know if there's anything you need. I may not have access to
the box anymore, however, I can always contact Kurt.
msg103136 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010年04月14日 17:36
Victor, that patch looks fine to me. Do you want to go ahead and apply it, and add the skip to test_execvpe_with_bad_program ?
The fix should be backported to 3.1, but not to 2.x (I think), since we don't have a problem there, and arguably the new os.confstr items could be considered a new feature.
msg103137 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010年04月14日 17:39
And just for reference, 
http://www.python.org/dev/buildbot/builders/x86%20gentoo%203.x/builds/2192
shows that the relevant versions on this machine are:
glibc 2.3.4
linuxthreads-0.10
and from http://www.python.org/dev/buildbot/builders/x86%20gentoo%203.x, it looks like the kernel version is 2.6.9 (actually, 2.6.9-gentoo-r1).
msg103316 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010年04月16日 12:26
Commited as r80108 to py3k: "Add CS_GNU_LIBC_VERSION and CS_GNU_LIBPTHREAD_VERSION constants for constr(), and disable test_execvpe_with_bad_program() of test_os if the libc uses linuxthreads to avoid the "unknown signal 32" bug (see issue #4970)."
Wait for the buildbot to port it to trunk (and maybe 2.6 and 3.1).
msg103339 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010年04月16日 16:34
Fix merged to release31-maint in r80119. Thanks, Victor.
History
Date User Action Args
2022年04月11日 14:56:44adminsetgithub: 49220
2010年04月16日 16:34:46mark.dickinsonsetstatus: open -> closed

components: + Tests
type: crash
nosy: nnorwitz, mark.dickinson, pitrou, vstinner, r.david.murray, srid, skrah, neologix
messages: + msg103339
resolution: fixed
stage: resolved
2010年04月16日 12:26:30vstinnersetmessages: + msg103316
2010年04月14日 17:39:28mark.dickinsonsetassignee: vstinner
messages: + msg103137
2010年04月14日 17:36:44mark.dickinsonsetmessages: + msg103136
2010年04月14日 02:23:14nnorwitzsetmessages: + msg103110
2010年04月13日 20:07:38mark.dickinsonsetmessages: + msg103081
2010年04月13日 20:04:12vstinnersetmessages: + msg103080
2010年04月13日 19:59:58neologixsetmessages: + msg103079
2010年04月13日 19:33:21mark.dickinsonsetmessages: + msg103077
2010年04月13日 19:20:57vstinnersetfiles: + confstr_libpthread.patch
keywords: + patch
messages: + msg103076
2010年04月13日 17:55:26pitrousetmessages: + msg103071
2010年04月13日 17:47:34vstinnersetmessages: + msg103069
2010年04月13日 17:24:25vstinnersetmessages: + msg103068
2010年04月13日 16:52:42vstinnersetmessages: + msg103061
2010年04月13日 15:52:08mark.dickinsonsetmessages: + msg103057
2010年04月13日 15:16:40mark.dickinsonsetmessages: + msg103055
2010年04月13日 13:20:33mark.dickinsonsetmessages: + msg103046
2010年04月13日 12:24:45vstinnersetnosy: + vstinner
messages: + msg103043
2010年04月13日 12:06:34mark.dickinsonsetmessages: + msg103040
2010年04月13日 12:03:27neologixsetnosy: + neologix
messages: + msg103039
2010年04月13日 10:50:29skrahsetmessages: + msg103033
2010年04月13日 08:34:10mark.dickinsonsetmessages: + msg103021
2010年04月13日 08:30:23mark.dickinsonsetmessages: + msg103020
2010年04月12日 22:40:24skrahsetnosy: + skrah

messages: + msg102985
versions: + Python 3.2, Python 3.3
2009年10月30日 19:45:53r.david.murraysetkeywords: + buildbot
2009年10月29日 15:47:42pitrousetmessages: + msg94674
2009年10月29日 15:34:55pitrousetmessages: + msg94672
2009年10月29日 14:53:11pitrousetnosy: + pitrou
messages: + msg94670
2009年08月28日 22:39:46sridsetmessages: + msg92049
2009年08月28日 22:37:50sridsetmessages: + msg92048
2009年08月28日 22:36:54sridsetfiles: + apy3.1.1-linux-x86-apy_test.log

messages: + msg92047
2009年08月15日 10:30:00mark.dickinsonsetmessages: + msg91601
2009年08月14日 17:46:13sridsetnosy: + srid
messages: + msg91564
2009年07月10日 20:35:08r.david.murraysetpriority: normal

messages: + msg90396
2009年07月10日 12:25:48mark.dickinsonsetmessages: + msg90390
2009年07月10日 11:33:26r.david.murraysetnosy: + r.david.murray
2009年01月17日 12:24:25mark.dickinsonsetnosy: + nnorwitz
messages: + msg80011
2009年01月17日 11:43:36mark.dickinsonsetmessages: + msg80010
2009年01月17日 11:09:45mark.dickinsoncreate

AltStyle によって変換されたページ (->オリジナル) /