I want to run a bash command using Python's subprocess.Popen. My bash command looks like:

$ gunzip -c /my/dir/file1.gz /my/dir/file2.gz | gsplit -l 500000 --numeric-suffixes=1 --suffix-length=3 --additional-suffix=.split - /my/dir/output/file_

It takes compressed files, uncompresses them, merges the content, and splits the result into output files. I can do that in Python this way:

from __future__ import print_function
import os
import subprocess

directory = "/my/dir"
files = ["file1.gz", "file2.gz"]

# Join with os.path.join so the separator between directory and file name is not lost.
cmd1 = "gunzip -c {}".format(" ".join(os.path.join(directory, f) for f in files))
cmd2 = "{} -l {} --numeric-suffixes={} --suffix-length={} --additional-suffix={} - {}"\
 .format("gsplit", 500000, 1, 3, ".split", "/my/dir/output/file_")
 
proc1 = subprocess.Popen(str(cmd1).split(), stdout=subprocess.PIPE)
proc2 = subprocess.Popen(str(cmd2).split(), stdin=proc1.stdout, stdout=subprocess.PIPE)
proc1.stdout.close()
proc2.wait()
print("result:", proc2.returncode)

Then I can check the output:

$ ls /my/dir/output
file_001.split
file_002.split
file_003.split

Now I want to use gsplit's --filter argument, which pipes each output chunk to another command. Here I chose gzip because I want to compress the output. The bash command looks like this:

$ gunzip -c /my/dir/file1.gz /my/dir/file2.gz | gsplit -l 500000 --numeric-suffixes=1 --suffix-length=3 --additional-suffix=.split --filter='gzip > $FILE.gz' - /my/dir/output/file_

This command works.

Now putting it into python code:

from __future__ import print_function
import os
import subprocess

directory = "/my/dir"
files = ["file1.gz", "file2.gz"]

# Join with os.path.join so the separator between directory and file name is not lost.
cmd1 = "gunzip -c {}".format(" ".join(os.path.join(directory, f) for f in files))
cmd2 = "{} -l {} --numeric-suffixes={} --suffix-length={} --additional-suffix={} --filter={} - {}"\
 .format("gsplit", 500000, 1, 3, ".split", "'gzip > $FILE.gz'"
 , "/my/dir/output/file_")
 
proc1 = subprocess.Popen(str(cmd1).split(), stdout=subprocess.PIPE)
proc2 = subprocess.Popen(str(cmd2).split(), stdin=proc1.stdout, stdout=subprocess.PIPE)
proc1.stdout.close()
proc2.wait()
print("result:", proc2.returncode)

Alas I get this error:

/usr/local/bin/gsplit: invalid option -- 'f'
Try '/usr/local/bin/gsplit --help' for more information.
gunzip: error writing to output: Broken pipe
gunzip: /my/dir/file1.gz: uncompress failed
gunzip: error writing to output: Broken pipe
gunzip: /my/dir/file2.gz: uncompress failed

I think it has to do with the redirection symbol in gzip > $FILE.gz.

What is going on, and how can I resolve this issue?

asked May 24, 2016 at 15:38
  • If you hardcode the command as a list (instead of doing format and then split) does it work? I think split splits your filter apart into ["--filter='gzip", ">", "$FILE.gz'"] which is not what you want. Commented May 24, 2016 at 15:45
  • see docs.python.org/2/library/…. Alternatively, you may want to try using shell=True, which should start a shell instead of executing the commands directly and lets you pass the entire line as is (with pipes, redirects, and all, which are shell features anyway, not part of the called executables). Commented May 24, 2016 at 15:46
  • Aside: why do you call str(cmd2).split() rather than cmd2.split()? Since cmd2 is already a str what does the extra function call get? Commented May 24, 2016 at 15:51
  • @Robφ: I actually used this chunk in a function, I used the str() to make sure the argument was a string. But here in my example it is totally irrelevant you are right Commented May 24, 2016 at 16:08
  • @Corley Brigman: I changed from the subprocess.call() to subprocess.Popen() to track the exit status when piping Commented May 24, 2016 at 16:09
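The shell=True suggestion and the exit-status concern from the comments above can be sketched together. This is a minimal illustration, not the asker's code: run_pipeline is a hypothetical helper, and the pipelines are stand-ins built from true and false so the snippet runs without gunzip or gsplit installed. It assumes a POSIX /bin/sh, where the status of a pipeline is the status of its last command.

```python
import subprocess

def run_pipeline(pipeline):
    """Run a complete shell pipeline string and return its exit status.

    Hypothetical helper: shell=True hands the string to /bin/sh, so `|`,
    `>` and quoting are interpreted by the shell, not by Python.
    """
    return subprocess.call(pipeline, shell=True)

# Only the last command of the pipeline determines the reported status.
status_hidden = run_pipeline("false | true")   # failure of `false` is hidden
status_seen = run_pipeline("true | false")     # failure of the last command shows

print("false | true ->", status_hidden)
print("true | false ->", status_seen)
```

With this approach the full gunzip | gsplit line could be passed as one string, but a failure in gunzip would not show up in the return code, which is one reason to keep two Popen objects and check each returncode. Only pass trusted strings when using shell=True.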

1 Answer

str.split() isn't the appropriate function to convert a command-line string into an array of arguments. To see why, try:

print(str(cmd2).split())

Notice that "--filter='gzip", ">", and "$FILE.gz'" end up as three distinct arguments.

Try:

#UNTESTED
import shlex
proc2 = subprocess.Popen(shlex.split(cmd2), stdin=proc1.stdout, stdout=subprocess.PIPE)
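To make the difference concrete, here is a small comparison of the two splitting functions on the command string from the question. It only tokenizes the string and does not run gsplit; the variable names naive and parsed are introduced here for illustration.

```python
import shlex

cmd2 = ("gsplit -l 500000 --numeric-suffixes=1 --suffix-length=3 "
        "--additional-suffix=.split --filter='gzip > $FILE.gz' "
        "- /my/dir/output/file_")

# str.split() breaks on every run of whitespace and knows nothing about quotes.
naive = cmd2.split()

# shlex.split() follows shell-like quoting rules: the single-quoted text stays
# in one argument and the quote characters themselves are removed.
parsed = shlex.split(cmd2)

print(naive[6:9])   # the filter torn into three pieces
print(parsed[6])    # the filter as a single argument
```

The naive split yields 11 arguments and the shlex split yields 9, with the whole filter expression in one of them.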
answered May 24, 2016 at 15:48

4 Comments

Thanks Rob, it did the trick! I was totally careless on that split.. I just brainlessly added the filter argument without questioning the behaviour resulting from the split. My bad
@kaligne: 1- shlex.split() can be fooled 2- I don't see how it helps with $FILE.
@J.F. Sebastian: well i used a simple str().split on the command which messed the string "gzip > $FILE.gz" up in that sense that I did not intend to split it based on spaces (and that's my bad). So shlex.split() DID help. However could you develop why and how it could be fooled? What is the alternative?
Sorry I read your reply on my phone, the link was not coloured then... Thanks anyway
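One possible alternative to parsing a command string at all, sketched here under assumptions, is to build the argument lists directly so that no quoting rules are involved. The helper names build_split_args and split_gzipped are hypothetical. The sketch assumes gsplit is GNU split, which runs the --filter command through a shell itself and sets $FILE in that shell's environment, so the literal text $FILE can be passed unexpanded from Python.

```python
import subprocess

def build_split_args(prefix, lines=500000, filter_cmd="gzip > $FILE.gz"):
    """Return the gsplit argument list; each list item is exactly one argument."""
    return [
        "gsplit", "-l", str(lines),
        "--numeric-suffixes=1", "--suffix-length=3",
        "--additional-suffix=.split",
        "--filter=" + filter_cmd,   # spaces and '>' stay inside this one argument
        "-", prefix,
    ]

def split_gzipped(inputs, prefix):
    """Pipe gunzip into gsplit and return both exit statuses (needs both tools)."""
    gunzip = subprocess.Popen(["gunzip", "-c"] + list(inputs), stdout=subprocess.PIPE)
    split = subprocess.Popen(build_split_args(prefix), stdin=gunzip.stdout)
    gunzip.stdout.close()  # let gunzip receive SIGPIPE if gsplit exits early
    return gunzip.wait(), split.wait()

args = build_split_args("/my/dir/output/file_")
print(args)
```

Calling split_gzipped(["/my/dir/file1.gz", "/my/dir/file2.gz"], "/my/dir/output/file_") would then run the pipeline and report the status of each stage separately.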
