0

I looked in the questions previously asked and didn't find a similar one. I missed one, please let me know and I apologize for asking the same question again.

I'm debugging an issue in which I have a Popen hang when executing the following code (python 2.6):

#!/usr/bin/env python
import subprocess
cmd = 'service drbd start'
rc = subprocess.call(cmd.split(), stdout=subprocess.PIPE, stderr=subprocess.STDOUT)

This is also happening if I use the "call" blocking method instead. This issue happens on this one node only for now. It runs Suse SLES 11 SP1.

The strace on the failed node shows the following:

<snip>
fstat(5, {st_mode=S_IFREG|0644, st_size=22808, ...}) = 0
open("/usr/lib64/python2.6/lib-dynload/cStringIO.so", O_RDONLY) = 6
read(6, "177円ELF2円1円1円0円0円0円0円0円0円0円0円0円3円0円>0円1円0円0円0円0円34円0円0円0円0円0円0円"..., 832) =832
fstat(6, {st_mode=S_IFREG|0644, st_size=22808, ...}) = 0
mmap(NULL, 2117912, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 6, 0) = 0x7f0cd5987000
fadvise64(6, 0, 2117912, POSIX_FADV_WILLNEED) = 0
mprotect(0x7f0cd598b000, 2093056, PROT_NONE) = 0
mmap(0x7f0cd5b8a000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 6, 0x3000) = 0x7f0cd5b8a000
close(6) = 0
mprotect(0x7f0cd5b8a000, 4096, PROT_READ) = 0
close(5) = 0
close(4) = 0
getrlimit(RLIMIT_NOFILE, {rlim_cur=1024, rlim_max=1024}) = 0
close(3) = 0
pipe([3, 4]) = 0
pipe([5, 6]) = 0
fcntl(6, F_GETFD) = 0
fcntl(6, F_SETFD, FD_CLOEXEC) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0cd793e9d0) = 27688
close(6) = 0
close(4) = 0
mmap(NULL, 1052672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0cd5886000
read(5, "", 1048576) = 0
mremap(0x7f0cd5886000, 1052672, 4096, MREMAP_MAYMOVE) = 0x7f0cd5886000
close(5) = 0
munmap(0x7f0cd5886000, 4096) = 0
fcntl(3, F_GETFL) = 0 (flags O_RDONLY)
fstat(3, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0cd794a000
lseek(3, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek)
fstat(3, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
munmap(0x7f0cd794a000, 4096) = 0
wait4(27688,

It hangs right here. The corresponding process is:

root 27688 27659 0 15:34 pts/2 00:00:00 /bin/sh /sbin/service drbd start
root 27690 27688 0 15:34 pts/2 00:00:00 /bin/bash /etc/init.d/drbd start

Here is the output on another node on which this same code works:

<snip>
fstat(5, {st_mode=S_IFREG|0644, st_size=22808, ...}) = 0
open("/usr/lib64/python2.6/lib-dynload/cStringIO.so", O_RDONLY) = 6
read(6, "177円ELF2円1円1円0円0円0円0円0円0円0円0円0円3円0円>0円1円0円0円0円0円34円0円0円0円0円0円0円"..., 832) = 832
fstat(6, {st_mode=S_IFREG|0644, st_size=22808, ...}) = 0
mmap(NULL, 2117912, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 6, 0) = 0x7f57b3833000
fadvise64(6, 0, 2117912, POSIX_FADV_WILLNEED) = 0
mprotect(0x7f57b3837000, 2093056, PROT_NONE) = 0
mmap(0x7f57b3a36000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 6, 0x3000) = 0x7f57b3a36000
close(6) = 0
mprotect(0x7f57b3a36000, 4096, PROT_READ) = 0
close(5) = 0
close(4) = 0
getrlimit(RLIMIT_NOFILE, {rlim_cur=1024, rlim_max=1024}) = 0
close(3) = 0
pipe([3, 4]) = 0
pipe([5, 6]) = 0
fcntl(6, F_GETFD) = 0
fcntl(6, F_SETFD, FD_CLOEXEC) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f57b57ea9d0) = 32277
close(6) = 0
close(4) = 0
mmap(NULL, 1052672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f57b3732000
read(5, "", 1048576) = 0
mremap(0x7f57b3732000, 1052672, 4096, MREMAP_MAYMOVE) = 0x7f57b3732000
close(5) = 0
munmap(0x7f57b3732000, 4096) = 0
fcntl(3, F_GETFL) = 0 (flags O_RDONLY)
fstat(3, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f57b57f6000
lseek(3, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek)
fstat(3, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
munmap(0x7f57b57f6000, 4096) = 0
wait4(32277, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 32277
--- SIGCHLD (Child exited) @ 0 (0) ---
close(3) = 0
rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x7f57b50386a0}, {0x7f57b52d88f8, [], SA_RESTORER, 0x7f57b50386a0}, 8) = 0
exit_group(0) = ?

I can't figure out what's wrong with this node? Why does this command hang on the first node and not on the second one (running exactly the same OS and python versions)?

Any help would be greatly appreciated.

Thank you in advance,

D.

PS. If anyone wants the complete strace output from either node, please let me know. I'm allowed to keep this machine in this state only until Monday :-(

asked Dec 6, 2012 at 20:51
2
  • Here is some extra data about the hang process above (27688): root# l /proc/27688/fd total 0 dr-x------ 2 root root 0 Dec 6 15:38 ./ dr-xr-xr-x 7 root root 0 Dec 6 15:34 ../ lrwx------ 1 root root 64 Dec 6 16:00 0 -> /dev/pts/2 l-wx------ 1 root root 64 Dec 6 16:00 1 -> pipe:[5852827] l-wx------ 1 root root 64 Dec 6 15:38 2 -> pipe:[5852827] lr-x------ 1 root root 64 Dec 6 16:00 255 -> /sbin/service* Commented Dec 6, 2012 at 21:03
  • I have seen the same thing when starting daemons using subprocess, but it does not happen with all of them. My guess is that the process that forks the daemon is not ended properly or does not close its stdout/stderr. Commented Jul 10, 2018 at 11:12

1 Answer 1

1

It's wrong to use subprocess.call(...stdout=PIPE). This will redirect stdout to a pipe, but this pipe is not being read by the current process, so the subprocess will block after outputting more than N bytes of data (with N depending on the OS).

What you want is subprocess.check_output().

answered Dec 6, 2012 at 23:26
Sign up to request clarification or add additional context in comments.

6 Comments

I'm sorry for the confusion, my original code (before I tried an alternative using call), is as follows:\\
Thank you for replying. I'm sorry for the confusion. My original code (before I tried an alternative using call), is as follows (using the same command, cmd, that is "service drbd start"): p = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE, stderr=subprocess.STDOUT) res = p.communicate() This command hangs (I also tried using subprocess.PIPE for stderr with same results). When run directly in a shell, it returns the following: Starting DRBD resources: [ ].
Please note that if I replace the command with ls -l, it works just fine. More interesting is if I use the following command service drbd status or even service drbd stop these also work. What's so special with running 'service drbd start'?
I'm using python 2.6. There is no check_output() function in the subprocess module.
The p.communicate() version looks fine (it's what check_output() does internally too). I don't know what is special about service drbd start, sorry...
|

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.