I looked in the questions previously asked and didn't find a similar one. I missed one, please let me know and I apologize for asking the same question again.
I'm debugging an issue in which I have a Popen hang when executing the following code (python 2.6):
#!/usr/bin/env python
import subprocess
cmd = 'service drbd start'
rc = subprocess.call(cmd.split(), stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
This is also happening if I use the "call" blocking method instead. This issue happens on this one node only for now. It runs Suse SLES 11 SP1.
The strace on the failed node shows the following:
<snip>
fstat(5, {st_mode=S_IFREG|0644, st_size=22808, ...}) = 0
open("/usr/lib64/python2.6/lib-dynload/cStringIO.so", O_RDONLY) = 6
read(6, "177円ELF2円1円1円0円0円0円0円0円0円0円0円0円3円0円>0円1円0円0円0円0円34円0円0円0円0円0円0円"..., 832) =832
fstat(6, {st_mode=S_IFREG|0644, st_size=22808, ...}) = 0
mmap(NULL, 2117912, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 6, 0) = 0x7f0cd5987000
fadvise64(6, 0, 2117912, POSIX_FADV_WILLNEED) = 0
mprotect(0x7f0cd598b000, 2093056, PROT_NONE) = 0
mmap(0x7f0cd5b8a000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 6, 0x3000) = 0x7f0cd5b8a000
close(6) = 0
mprotect(0x7f0cd5b8a000, 4096, PROT_READ) = 0
close(5) = 0
close(4) = 0
getrlimit(RLIMIT_NOFILE, {rlim_cur=1024, rlim_max=1024}) = 0
close(3) = 0
pipe([3, 4]) = 0
pipe([5, 6]) = 0
fcntl(6, F_GETFD) = 0
fcntl(6, F_SETFD, FD_CLOEXEC) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0cd793e9d0) = 27688
close(6) = 0
close(4) = 0
mmap(NULL, 1052672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0cd5886000
read(5, "", 1048576) = 0
mremap(0x7f0cd5886000, 1052672, 4096, MREMAP_MAYMOVE) = 0x7f0cd5886000
close(5) = 0
munmap(0x7f0cd5886000, 4096) = 0
fcntl(3, F_GETFL) = 0 (flags O_RDONLY)
fstat(3, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0cd794a000
lseek(3, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek)
fstat(3, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
munmap(0x7f0cd794a000, 4096) = 0
wait4(27688,
It hangs right here. The corresponding process is:
root 27688 27659 0 15:34 pts/2 00:00:00 /bin/sh /sbin/service drbd start
root 27690 27688 0 15:34 pts/2 00:00:00 /bin/bash /etc/init.d/drbd start
Here is the output on another node on which this same code works:
<snip>
fstat(5, {st_mode=S_IFREG|0644, st_size=22808, ...}) = 0
open("/usr/lib64/python2.6/lib-dynload/cStringIO.so", O_RDONLY) = 6
read(6, "177円ELF2円1円1円0円0円0円0円0円0円0円0円0円3円0円>0円1円0円0円0円0円34円0円0円0円0円0円0円"..., 832) = 832
fstat(6, {st_mode=S_IFREG|0644, st_size=22808, ...}) = 0
mmap(NULL, 2117912, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 6, 0) = 0x7f57b3833000
fadvise64(6, 0, 2117912, POSIX_FADV_WILLNEED) = 0
mprotect(0x7f57b3837000, 2093056, PROT_NONE) = 0
mmap(0x7f57b3a36000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 6, 0x3000) = 0x7f57b3a36000
close(6) = 0
mprotect(0x7f57b3a36000, 4096, PROT_READ) = 0
close(5) = 0
close(4) = 0
getrlimit(RLIMIT_NOFILE, {rlim_cur=1024, rlim_max=1024}) = 0
close(3) = 0
pipe([3, 4]) = 0
pipe([5, 6]) = 0
fcntl(6, F_GETFD) = 0
fcntl(6, F_SETFD, FD_CLOEXEC) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f57b57ea9d0) = 32277
close(6) = 0
close(4) = 0
mmap(NULL, 1052672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f57b3732000
read(5, "", 1048576) = 0
mremap(0x7f57b3732000, 1052672, 4096, MREMAP_MAYMOVE) = 0x7f57b3732000
close(5) = 0
munmap(0x7f57b3732000, 4096) = 0
fcntl(3, F_GETFL) = 0 (flags O_RDONLY)
fstat(3, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f57b57f6000
lseek(3, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek)
fstat(3, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
munmap(0x7f57b57f6000, 4096) = 0
wait4(32277, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 32277
--- SIGCHLD (Child exited) @ 0 (0) ---
close(3) = 0
rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x7f57b50386a0}, {0x7f57b52d88f8, [], SA_RESTORER, 0x7f57b50386a0}, 8) = 0
exit_group(0) = ?
I can't figure out what's wrong with this node? Why does this command hang on the first node and not on the second one (running exactly the same OS and python versions)?
Any help would be greatly appreciated.
Thank you in advance,
D.
PS. If anyone wants the complete strace output from either node, please let me know. I'm allowed to keep this machine in this state only until Monday :-(
-
Here is some extra data about the hang process above (27688): root# l /proc/27688/fd total 0 dr-x------ 2 root root 0 Dec 6 15:38 ./ dr-xr-xr-x 7 root root 0 Dec 6 15:34 ../ lrwx------ 1 root root 64 Dec 6 16:00 0 -> /dev/pts/2 l-wx------ 1 root root 64 Dec 6 16:00 1 -> pipe:[5852827] l-wx------ 1 root root 64 Dec 6 15:38 2 -> pipe:[5852827] lr-x------ 1 root root 64 Dec 6 16:00 255 -> /sbin/service*Djurdjura H.– Djurdjura H.2012年12月06日 21:03:26 +00:00Commented Dec 6, 2012 at 21:03
-
I have seen the same thing when starting daemons using subprocess, but it does not happen with all of them. My guess is that the process that forks the daemon is not ended properly or does not close its stdout/stderr.Christian– Christian2018年07月10日 11:12:55 +00:00Commented Jul 10, 2018 at 11:12
1 Answer 1
It's wrong to use subprocess.call(...stdout=PIPE). This will redirect stdout to a pipe, but this pipe is not being read by the current process, so the subprocess will block after outputting more than N bytes of data (with N depending on the OS).
What you want is subprocess.check_output().
6 Comments
p = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE, stderr=subprocess.STDOUT) res = p.communicate() This command hangs (I also tried using subprocess.PIPE for stderr with same results). When run directly in a shell, it returns the following: Starting DRBD resources: [ ].ls -l, it works just fine. More interesting is if I use the following command service drbd status or even service drbd stop these also work. What's so special with running 'service drbd start'?check_output() function in the subprocess module.p.communicate() version looks fine (it's what check_output() does internally too). I don't know what is special about service drbd start, sorry...