
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer Guide.

Classification
Title: Buildbot reliability
Type: behavior
Stage: resolved
Components: Tests
Versions:
Process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: barry, ned.deily, pitrou, skrah, vstinner
Priority: normal
Keywords: buildbot

Created on 2011-04-30 06:43 by skrah, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description
freebsd-amd64-log.txt skrah, 2011-05-02 18:23
Messages (10)
msg134839 - Author: Stefan Krah (skrah) (Python committer) Date: 2011-04-30 06:43
The FreeBSD-AMD64 bot exhibits sporadic hanging in unspecific places.
FreeBSD is running under kvm in the background. When the hanging occurs,
the virtual machine uses 100% CPU and I can't log in via ssh, so I have
to kill the kvm process.
The fact that the ssh login fails if a user process is misbehaving
seems like a FreeBSD/kvm issue to me. However, this problem did not
occur when I set up the bot a couple of weeks ago.
I've started a series of older revision builds to see if anything
recent causes this.
msg134890 - Author: STINNER Victor (vstinner) (Python committer) Date: 2011-04-30 23:15
> The FreeBSD-AMD64 bot exhibits sporadic hanging in unspecific places.
You can try a shorter regrtest timeout by editing Lib/test/regrtest.py near:
    if hasattr(faulthandler, 'dump_tracebacks_later'):
        timeout = 60*60
(or use the --timeout option of regrtest.py)
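A minimal sketch of the watchdog mechanism behind that timeout, assuming a Python 3 release in which the function is named `faulthandler.dump_traceback_later` (the message above refers to it by the name it had during development). The helper `run_with_watchdog` is hypothetical and is not part of regrtest.py.

```python
import faulthandler
import tempfile
import time


def run_with_watchdog(func, timeout, log_file):
    """Run func(); if it runs longer than `timeout` seconds, dump the
    tracebacks of all threads to log_file. (Hypothetical helper.)"""
    # exit=False keeps the process alive after the dump, so the slow
    # call can still complete.
    faulthandler.dump_traceback_later(timeout, file=log_file, exit=False)
    try:
        return func()
    finally:
        # Cancel the watchdog so it cannot fire after func() has returned.
        faulthandler.cancel_dump_traceback_later()


if __name__ == "__main__":
    with tempfile.TemporaryFile(mode="w+") as log_file:
        # The sleep stands in for a hanging test; the timeout is kept tiny
        # here only so that the demonstration finishes quickly.
        run_with_watchdog(lambda: time.sleep(1.0), timeout=0.2, log_file=log_file)
        log_file.seek(0)
        print(log_file.read())
```

With a short timeout, the dump shows where a test was stuck well before the default one-hour limit would expire.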
If you have access to a terminal (using ssh), you can also register a signal handler that dumps the traceback: edit regrtest.py to add "import signal; faulthandler.register(signal.SIGUSR1, all_threads=True)" after "faulthandler.enable()". Then use "kill -USR1 pid" to dump the traceback.
Or the problem could be an infinite loop while dumping the traceback after a timeout :-D In that case, disable the timeout with the --timeout=0 option of regrtest.py.
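The signal-based approach can be sketched as a standalone script, assuming a Unix-like system where `SIGUSR1` is available. The helper name `dump_on_signal` and the temporary log file are illustrative assumptions; in regrtest.py the dump would go to stderr, and the signal would come from `kill -USR1 pid` in another terminal.

```python
import faulthandler
import os
import signal
import tempfile
import time


def dump_on_signal(log_file, signum=signal.SIGUSR1):
    """Dump the tracebacks of all threads to log_file whenever `signum`
    arrives. (Hypothetical helper wrapping faulthandler.register.)"""
    faulthandler.register(signum, file=log_file, all_threads=True)


if __name__ == "__main__":
    with tempfile.TemporaryFile(mode="w+") as log_file:
        dump_on_signal(log_file)
        try:
            # Stands in for running "kill -USR1 pid" from a shell.
            os.kill(os.getpid(), signal.SIGUSR1)
            time.sleep(0.1)  # give the handler a moment to write the dump
        finally:
            # Unregister before the file is closed.
            faulthandler.unregister(signal.SIGUSR1)
        log_file.seek(0)
        print(log_file.read())
```

Because the handler only writes a dump and the process keeps running, the signal can be sent repeatedly to see whether a hung test is making progress.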
msg134901 - Author: Stefan Krah (skrah) (Python committer) Date: 2011-05-01 06:03
Thanks Victor, I can try some of that.
Could this also be a problem with the buildbot software or a networking
problem? The Ubuntu PPC bot might have the same issue. Here the tests
appear to be finished but the clean doesn't start:
http://www.python.org/dev/buildbot/all/builders/PPC%20Ubuntu%203.1/builds/387/steps/test/logs/stdio
http://www.python.org/dev/buildbot/all/builders/PPC%20Ubuntu%203.1/builds/387 
msg134922 - Author: Ned Deily (ned.deily) (Python committer) Date: 2011-05-01 19:36
That might be another instance of this:
 http://thread.gmane.org/gmane.comp.python.devel/123698
You might want to bring this up on python-dev.
msg134997 - Author: Stefan Krah (skrah) (Python committer) Date: 2011-05-02 18:23
Going through the logs, this indeed looks like a buildbot software
issue to me. I attach the logs that correspond to this incident:
http://www.python.org/dev/buildbot/all/builders/AMD64%20FreeBSD%208.2%203.2/builds/85
After ...
2011-04-30 01:10:56+0200 [Broker,client] closing stdin
2011-04-30 01:10:56+0200 [Broker,client] using PTY: False
... normally you should see:
... [-] command finished with signal None, exit code 0, elapsedTime:
But nothing more was logged until I restarted the bot.
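The pattern described here, a "closing stdin" line that is never followed by "command finished", can be searched for mechanically. The following is a rough sketch that assumes only the line shapes quoted in this message; the function name `find_unfinished_commands` is hypothetical, and real buildbot logs may contain other line types that need handling.

```python
def find_unfinished_commands(log_lines):
    """Return every 'closing stdin' line that is not followed by a
    'command finished' line before the next command starts or the log ends."""
    unfinished = []
    pending = None  # the most recent 'closing stdin' line still awaiting a finish
    for line in log_lines:
        if "closing stdin" in line:
            if pending is not None:
                # A new command started while the previous one never finished.
                unfinished.append(pending)
            pending = line.rstrip("\n")
        elif "command finished" in line:
            pending = None
    if pending is not None:
        unfinished.append(pending)
    return unfinished


if __name__ == "__main__":
    # Lines copied from the excerpt in this message.
    hung = [
        "2011-04-30 01:10:56+0200 [Broker,client] closing stdin",
        "2011-04-30 01:10:56+0200 [Broker,client] using PTY: False",
    ]
    for line in find_unfinished_commands(hung):
        print("no 'command finished' after:", line)
```

Running it over the whole slave log would show whether other builds ended the same way.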
msg135084 - Author: Stefan Krah (skrah) (Python committer) Date: 2011-05-03 22:15
Another instance:
2011-05-03 20:18:08+0200 [Broker,client] closing stdin
2011-05-03 20:18:08+0200 [Broker,client] using PTY: False
2011-05-03 20:20:38+0200 [-] sending app-level keepalive
Again this is missing:
... [-] command finished with signal None, exit code 0, elapsedTime:
Also, as we speak the Ubuntu PPC bot is hanging as well:
http://www.python.org/dev/buildbot/all/builders/PPC%20Ubuntu%202.7/builds/386/steps/test/logs/stdio
Antoine, do you have access to the server logs for the relevant
times? My bot is on CEST.
msg135085 - Author: Barry A. Warsaw (barry) (Python committer) Date: 2011-05-03 22:40
My Ubuntu PPC server is having hardware problems. It will just intermittently shut off. I've reset the SMU and the PRAM, vacuumed out the guts, reseated the RAM, pulled any possibly problematic 3rd party boards, and it still crashes. I was watching the syslog and it didn't look like a thermal shutdown, though it acted like that. The only thing I can think of is a power supply problem, so I'm going to see if I can find an inexpensive replacement. In the meantime, this machine will be offline for a couple of weeks at least.
msg135174 - Author: Stefan Krah (skrah) (Python committer) Date: 2011-05-05 07:10
The FreeBSD bot had these error messages in the log files:
1) kernel: swap_pager: indefinite wait buffer: device
2) Approaching the limit on PV entries, consider increasing either the vm.pmap.shpgperproc or the vm.pmap.pv_entry_max sysctl.
I set up the bot from scratch with these changes:
a) Use swap partition (2 GB) instead of swap file (2 GB).
b) Use these sysctls:
 kern.ipc.shm_use_phys=1
 vm.pmap.shpgperproc=4096
 vm.pmap.pv_entry_max=16777216
c) Use self-compiled Python 2.7 instead of the system Python 2.6.
Let's see how that works out. Error 1) is bad, perhaps FreeBSD
does not play well with the qcow2 image format under high load.
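To confirm that the values from b) are actually in effect after a rebuild, a small check could compare them with what the running system reports. This is a sketch under assumptions: it reads values with `sysctl -n NAME`, and the helper names `read_sysctl` and `find_mismatches` are hypothetical.

```python
import subprocess

# Values taken from item b) above.
DESIRED = {
    "kern.ipc.shm_use_phys": "1",
    "vm.pmap.shpgperproc": "4096",
    "vm.pmap.pv_entry_max": "16777216",
}


def read_sysctl(name):
    """Read one value by running `sysctl -n NAME` on the host."""
    return subprocess.check_output(["sysctl", "-n", name], text=True).strip()


def find_mismatches(desired, reader=read_sysctl):
    """Return {name: (wanted, actual)} for every sysctl whose current
    value differs from the wanted one."""
    mismatches = {}
    for name, wanted in desired.items():
        actual = reader(name)
        if actual != wanted:
            mismatches[name] = (wanted, actual)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, actual) in find_mismatches(DESIRED).items():
        print(f"{name}: wanted {wanted}, found {actual}")
```

The `reader` parameter exists so that the comparison logic can be exercised without a FreeBSD host.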
msg135175 - Author: Stefan Krah (skrah) (Python committer) Date: 2011-05-05 07:36
On second thought, I don't want to debug possible qcow2 issues, so
I made another change:
d) Use raw format for the image.
msg135421 - Author: Stefan Krah (skrah) (Python committer) Date: 2011-05-07 09:06
I think the FreeBSD bot changes are working out fine. The Ubuntu-PPC
issues were unrelated, so I'm closing this.
History
Date User Action Args
2022-04-11 14:57:16 admin set github: 56171
2011-05-07 09:06:55 skrah set status: open -> closed
    messages: + msg135421
    keywords: + buildbot
    resolution: fixed
    stage: resolved
2011-05-05 07:36:05 skrah set messages: + msg135175
2011-05-05 07:10:24 skrah set messages: + msg135174
2011-05-03 22:40:58 barry set messages: + msg135085
2011-05-03 22:17:09 skrah set nosy: + barry
2011-05-03 22:15:40 skrah set messages: + msg135084
    title: FreeBSD-AMD64 bot sporadic hanging -> Buildbot reliability
2011-05-02 18:23:41 skrah set files: + freebsd-amd64-log.txt
    messages: + msg134997
2011-05-01 19:36:41 ned.deily set nosy: + ned.deily
    messages: + msg134922
2011-05-01 06:03:43 skrah set messages: + msg134901
2011-04-30 23:15:44 vstinner set nosy: + vstinner
    messages: + msg134890
2011-04-30 06:43:10 skrah create