This issue tracker has been migrated to GitHub ,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2011年01月04日 14:44 by pitrou, last changed 2022年04月11日 14:57 by admin. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| spdoors.patch | pitrou, 2011年01月29日 10:48 | |||
| Messages (10) | |||
|---|---|---|---|
| msg125326 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2011年01月04日 14:44 | |
This happens selectively (and intermittently) on the Solaris buildbot:
test_pass_fds (test.test_subprocess.POSIXProcessTestCase) ... -- maxfd = 256
[36787 refs]
-- fds that should have been closed: {5, 6, 7, 8, 9, 10, 11, 12, 13}
-- fds that remained open: {0, 1, 2, 3, 5}
FAIL
test_pass_fds (test.test_subprocess.ProcessTestCasePOSIXPurePython) ... -- maxfd = 256
[36787 refs]
-- fds that should have been closed: {5, 6, 7, 8, 9, 10, 11, 12, 13}
-- fds that remained open: {0, 1, 2, 3, 5}
FAIL
As you see (thanks to debug output), all fds have been closed except number 5.
(http://www.python.org/dev/buildbot/all/builders/sparc%20solaris10%20gcc%203.x/builds/2466/steps/test/logs/stdio)
|
|||
| msg125329 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2011年01月04日 14:55 | |
Another (perhaps more likely) possibility is that fd 5 was properly closed, but another one created at startup by the child Python interpreter. How could we diagnose that? |
|||
| msg125358 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2011年01月04日 19:10 | |
Happened also on the OpenIndiana buildbot: http://www.python.org/dev/buildbot/all/builders/x86%20OpenIndiana%203.x/builds/492/steps/test/logs/stdio |
|||
| msg127060 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2011年01月25日 21:55 | |
So, I added some debug info to test_pass_fds:
-- fds that should have been closed: {5, 6, 7, 8, 9, 10, 11, 12, 13}
-- fds that remained open: {0, 1, 2, 3, 5}
-- debug info:
0 8194 posix.stat_result(st_mode=8592, st_ino=12582920, st_dev=82051072, st_nlink=1, st_uid=1140, st_gid=7, st_size=0, st_atime=1295990436, st_mtime=1295991135, st_ctime=1295990436)
1 2 posix.stat_result(st_mode=4096, st_ino=79119887, st_dev=84148224, st_nlink=0, st_uid=1140, st_gid=1, st_size=0, st_atime=1295991135, st_mtime=1295991135, st_ctime=1295991135)
2 8194 posix.stat_result(st_mode=8592, st_ino=12582920, st_dev=82051072, st_nlink=1, st_uid=1140, st_gid=7, st_size=0, st_atime=1295990436, st_mtime=1295991135, st_ctime=1295990436)
3 2 posix.stat_result(st_mode=4096, st_ino=79119882, st_dev=84148224, st_nlink=0, st_uid=1140, st_gid=1, st_size=0, st_atime=1295991135, st_mtime=1295991135, st_ctime=1295991135)
5 8192 posix.stat_result(st_mode=53540, st_ino=56, st_dev=84410368, st_nlink=1, st_uid=0, st_gid=0, st_size=0, st_atime=1269710532, st_mtime=1269710532, st_ctime=1269710532)
FAIL
Each line is in the following form: <fd> <value returned by F_GETFD> <os.fstat(fd)>
Here is my interpretation for fd 5 (which is the offending fd):
- a F_GETFD value of 8192 is different from both pipes (which return 2) and standard stdin/stderr (which return 8194); if the values haven't changed between Solaris versions, 8192 is O_LARGEFILE + O_RDONLY
- a st_mode of 53540 is stat.S_IFSOCK + stat.S_IFIFO + stat.S_IRUSR + stat.S_IRGRP + stat.S_IROTH; so this seems to be some kind of read-only (filesystem-based?) socket
I think it is unlikely that this socket is inherited from the parent process, or present before calling exec(). Instead, perhaps it's the startup of the child Python interpreter (after exec()) which creates such a file descriptor, and doesn't close it. Unfortunately, it seems difficult to diagnose this in more detail.
|
|||
| msg127063 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2011年01月25日 22:51 | |
I managed to get further debug info from the OpenIndiana buildbot:
-- maxfd = 65536
-- fds that should have been closed: {5, 6, 7, 8, 9, 10, 11, 12, 13}
-- fds that remained open: {0, 1, 2, 3, 5}
-- debug info:
0 2 posix.stat_result(st_mode=4096, st_ino=38002789, st_dev=142868480, st_nlink=0, st_uid=101, st_gid=101, st_size=0, st_atime=1295994652, st_mtime=1295994652, st_ctime=1295994652)
1 2 posix.stat_result(st_mode=4096, st_ino=38019207, st_dev=142868480, st_nlink=0, st_uid=101, st_gid=101, st_size=0, st_atime=1295994988, st_mtime=1295994988, st_ctime=1295994988)
2 2 posix.stat_result(st_mode=4096, st_ino=38002791, st_dev=142868480, st_nlink=0, st_uid=101, st_gid=101, st_size=0, st_atime=1295994987, st_mtime=1295994987, st_ctime=1295994987)
3 2 posix.stat_result(st_mode=4096, st_ino=38019202, st_dev=142868480, st_nlink=0, st_uid=101, st_gid=101, st_size=0, st_atime=1295994988, st_mtime=1295994988, st_ctime=1295994988)
5 8192 posix.stat_result(st_mode=53540, st_ino=210, st_dev=146014208, st_nlink=1, st_uid=0, st_gid=0, st_size=0, st_atime=1290335283, st_mtime=1290335283, st_ctime=1290335283)
total 2
dr-x------ 2 buildbot buildbot 528 Jan 25 22:36 .
dr-x--x--x 5 buildbot buildbot 864 Jan 25 22:36 ..
p--------- 0 buildbot buildbot 0 Jan 25 22:30 0
p--------- 0 buildbot buildbot 0 Jan 25 22:36 1
p--------- 0 buildbot buildbot 0 Jan 25 22:36 2
p--------- 0 buildbot buildbot 0 Jan 25 22:36 3
D--------- 1 root root 0 Nov 21 10:28 5
File: `/proc/22816/fd/5'
Size: 0 Blocks: 0 IO Block: 0 weird file
Device: 8b40000h/146014208d Inode: 210 Links: 1
Access: (0000/D---------) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2010年11月21日 10:28:03.659679536 +0000
Modify: 2010年11月21日 10:28:03.659679536 +0000
Change: 2010年11月21日 10:28:03.659679536 +0000
So, the offending fd (5) points to a "weird file" (!!) created in November 2010... It definitely can't be the same file as pointed to by fd 5 in the parent process, since that is a standard anonymous pipe. This reinforces the idea that starting up the child Python interpreter sometimes creates that fd and doesn't close it.
Also, while the sparc solaris buildbot doesn't have the "stat" command, it still displays the following debug output:
total 4
dr-x------ 2 buildbot other 528 Jan 25 23:58 .
dr-x--x--x 5 buildbot other 832 Jan 25 23:58 ..
c--------- 1 buildbot tty 24, 2 Jan 25 23:58 0
p--------- 0 buildbot other 0 Jan 25 23:58 1
c--------- 1 buildbot tty 24, 2 Jan 25 23:58 2
p--------- 0 buildbot other 0 Jan 25 23:58 3
D--------- 1 root root 0 Mar 27 2010 5
... where fd number 5 seems to be the same kind of "weird file" created by root a long time ago. Martin, does that date (Mar 27 2010) ring a bell? Is it when the system was installed? When the buildslave was started?
|
|||
| msg127098 - (view) | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2011年01月26日 10:41 | |
'D' denotes a door file. There are several of them in the system; the following ones (from /var/run) match the time stamp: Dr--r--r-- 1 daemon daemon 0 Mar 27 2010 kcfd_door Dr--r--r-- 1 root root 0 Mar 27 2010 name_service_door Dr--r--r-- 1 root root 0 Mar 27 2010 picld_door Drw-r--r-- 1 root root 0 Mar 27 2010 syslog_door It's common for library routines to open a door and keep it open for the entire process lifetime. If this was somehow reproducible, I could try to find out what specific door gets opened. |
|||
| msg127099 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2011年01月26日 10:48 | |
> 'D' denotes a door file. There are several of them in the system; the > following ones (from /var/run) match the time stamp: Thanks for the explanation! > It's common for library routines to open a door and keep it open for > the entire process lifetime. > > If this was somehow reproducible, I could try to find out what > specific door gets opened. test_subprocess fails quite often on your buildbot, so with a bit of patience you can probably reproduce it for sure. If you use "-r --randseed 2463396" it will schedule test_subprocess quite early, so you can stop the test run as soon as that test is finished. Also, debug output has been added in branches/test_subprocess_10826 : http://www.python.org/dev/buildbot/all/builders/sparc%20solaris10%20gcc%203.x/builds/2574/steps/test/logs/stdio |
|||
| msg127399 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2011年01月29日 10:48 | |
Here is a patch to ignore door files when running the fd_status script. It seems to eradicate the intermittent failures on the solaris buildbots. |
|||
| msg128035 - (view) | Author: Gregory P. Smith (gregory.p.smith) * (Python committer) | Date: 2011年02月06日 01:05 | |
Your patch makes sense to me. I'll commit it after the 3.2 release for 3.2.1. The elease manager can feel free to commit it earlier if it is bothering an important buildbot. |
|||
| msg129001 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2011年02月21日 21:59 | |
Committed in r88484 (3.x), r88485 (3.2). |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022年04月11日 14:57:10 | admin | set | github: 55035 |
| 2011年02月21日 21:59:10 | pitrou | set | status: open -> closed nosy: loewis, gregory.p.smith, pitrou messages: + msg129001 resolution: fixed stage: resolved |
| 2011年02月06日 01:05:31 | gregory.p.smith | set | assignee: gregory.p.smith messages: + msg128035 nosy: loewis, gregory.p.smith, pitrou |
| 2011年01月29日 10:48:53 | pitrou | set | files:
+ spdoors.patch messages: + msg127399 keywords: + patch nosy: loewis, gregory.p.smith, pitrou |
| 2011年01月26日 10:48:09 | pitrou | set | nosy:
loewis, gregory.p.smith, pitrou messages: + msg127099 |
| 2011年01月26日 10:41:17 | loewis | set | nosy:
loewis, gregory.p.smith, pitrou messages: + msg127098 |
| 2011年01月25日 22:51:24 | pitrou | set | nosy:
+ loewis messages: + msg127063 |
| 2011年01月25日 21:55:43 | pitrou | set | messages: + msg127060 |
| 2011年01月04日 19:10:23 | pitrou | set | messages: + msg125358 |
| 2011年01月04日 14:55:07 | pitrou | set | messages: + msg125329 |
| 2011年01月04日 14:44:49 | pitrou | create | |