Wrapping bash scripts in python

Question 1

I just found this great wget wrapper and I'd like to rewrite it as a Python script using the subprocess module. However, it turns out to be quite tricky, and I keep getting all sorts of errors.

download()
{
 local url=$1
 echo -n " "
 wget --progress=dot $url 2>&1 | grep --line-buffered "%" | \
 sed -u -e "s,\.,,g" | awk '{printf("\b\b\b\b%4s", $2)}'
 echo -ne "\b\b\b\b"
 echo " DONE"
}

Then it can be called like this:

file="patch-2.6.37.gz"
echo -n "Downloading $file:"
download "http://www.kernel.org/pub/linux/kernel/v2.6/$file"

Any ideas?

Source: http://fitnr.com/showing-file-download-progress-using-wget.html

Question 2

You'll need to show us what you have tried in Python so that we'll be able to help you.

Question 3

Basically nothing yet..! I am currently lost in the subprocess documentation..! The ideal thing to do here would be an insightful explanation of a proposed solution so that I can properly grasp the concept of the subprocess module and expand on it.

Question 4

All right, so far I did this:

wgetExecutable = '/usr/bin/wget'
grepExecutable = '/usr/grep'
wgetParameters = ['--progress=dot', "link_to_file"]
grepParameters = ['--line-buffered', "%"]
wgetPopen = subprocess.Popen([wgetExecutable] + wgetParameters, stdout=subprocess.PIPE)

Question 5

grepPopen = subprocess.Popen([grepExecutable] + grepParameters, stdin=wgetPopen.stdout)

However, I get an error at stdin=wgetPopen.stdout:

OSError: [Errno 2] No such file or directory
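For reference, here is a minimal sketch of chaining two processes the way the shell's | does. The helper name run_pipeline is hypothetical, not something from the thread. Passing bare program names (e.g. "grep") lets Popen search PATH. An OSError: [Errno 2] raised by Popen generally means the executable itself was not found, so the '/usr/grep' path above looks like a likely culprit, though that is a guess.

```python
import subprocess
import sys


def run_pipeline(producer_args, consumer_args):
    """Run `producer | consumer` and return the consumer's stdout as bytes."""
    producer = subprocess.Popen(producer_args,
                                stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT)  # like 2>&1
    consumer = subprocess.Popen(consumer_args,
                                stdin=producer.stdout,
                                stdout=subprocess.PIPE)
    # Close the parent's copy of the pipe so the producer is notified
    # (SIGPIPE) if the consumer exits first.
    producer.stdout.close()
    output, _ = consumer.communicate()
    producer.wait()
    return output


# Hypothetical usage with the commands from the question:
# run_pipeline(["wget", "--progress=dot", url], ["grep", "--line-buffered", "%"])
```

The stdin=producer.stdout argument is what connects the two children; the parent never touches the data flowing between them.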

Question 6

Note that there is also an sh module (with that name) that can take care of the bridge between bash and python!

Question 7

I think you're not far off. Mainly I'm wondering, why bother with running pipes into grep and sed and awk when you can do all that internally in Python?

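#!/usr/bin/env python
import re
import subprocess
import sys

TARGET_FILE = "linux-2.6.0.tar.xz"
TARGET_LINK = "http://www.kernel.org/pub/linux/kernel/v2.6/%s" % TARGET_FILE

wgetExecutable = '/usr/bin/wget'
wgetParameters = ['--progress=dot', TARGET_LINK]

# wget writes its progress to stderr, so merge stderr into stdout.
wgetPopen = subprocess.Popen([wgetExecutable] + wgetParameters,
                             stdout=subprocess.PIPE, stderr=subprocess.STDOUT)

# readline() returns b'' at end of stream, which ends the loop.
for line in iter(wgetPopen.stdout.readline, b''):
    match = re.search(br'\d+%', line)  # bytes pattern: works on Python 2.7 and 3
    if match:
        sys.stdout.write('\b\b\b\b' + match.group(0).decode())
        sys.stdout.flush()  # no newline is written, so flush by hand
wgetPopen.stdout.close()
wgetPopen.wait()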

Question 8

It does. Try on a smaller file. Or wait a little longer. :-)

Question 9

Your code seems to update only at intervals; with this file, for example, the first progress indication appears only after 25%. However, I need the progress to be instantaneous from the start, just like the bash script..!

Question 10

On my machine the behavior of this script is identical to the behavior of the bash script you posted. They both produce line-buffered output at the same rate. I'd be happy to adjust the script to do something different but I'm not able to reproduce the behavior you're talking about. I suspect that you're just seeing different response times for different files.

Question 11

Ah: I get results closer to what you describe if I use awk -W interactive in the bash script. I'll poke at this some more later and see if I need to do something special to force line-buffered output in subprocess.

Question 12

+1. wgetPopen.stdout might be destroyed (I expect so, but I don't know). As with ordinary files, it is better to close pipes explicitly (the with-statement is used for files) rather than rely on garbage collection, which is complex and hard to reason about. if not obj says "if obj is empty or zero" (a test for None should be written as if obj is None) without regard to type: e.g., in Python 3 pipe.readline() may return b'' or '', which are different types, and if not line works for both. It also supports both Python 2 and 3 from the same source.
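To make the two points in that comment concrete, here is a small sketch (the function name read_lines is made up for illustration). It closes the pipe with a with-statement instead of relying on garbage collection, and it uses if not line as the end-of-stream test, which holds for both b'' and ''.

```python
import subprocess
import sys


def read_lines(args):
    """Return every line a child process writes to stdout."""
    proc = subprocess.Popen(args, stdout=subprocess.PIPE)
    lines = []
    with proc.stdout:  # the pipe is closed when the block exits
        while True:
            line = proc.stdout.readline()
            if not line:  # b'' (bytes) or '' (text) both mean EOF
                break
            lines.append(line)
    proc.wait()  # reap the child after its output is drained
    return lines
```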

Question 13

If you are rewriting the script in Python, you could replace wget with urllib.urlretrieve() (urllib.request.urlretrieve() on Python 3) in this case:

#!/usr/bin/env python
import os
import posixpath
import sys

try:  # Python 3
    from urllib.parse import unquote, urlsplit
    from urllib.request import urlretrieve
except ImportError:  # Python 2
    from urllib import unquote, urlretrieve
    from urlparse import urlsplit


def url2filename(url):
    """Return basename corresponding to url.

    >>> url2filename('http://example.com/path/to/file?opt=1')
    'file'
    """
    urlpath = urlsplit(url).path
    basename = posixpath.basename(unquote(urlpath))
    if os.path.basename(basename) != basename:
        raise ValueError  # refuse 'dir%5Cbasename.ext' on Windows
    return basename


def reporthook(blocknum, blocksize, totalsize):
    """Report download progress on stderr."""
    readsofar = blocknum * blocksize
    if totalsize > 0:
        percent = readsofar * 1e2 / totalsize
        s = "\r%5.1f%% %*d / %d" % (
            percent, len(str(totalsize)), readsofar, totalsize)
        sys.stderr.write(s)
        if readsofar >= totalsize:  # near the end
            sys.stderr.write("\n")
    else:  # total size is unknown
        sys.stderr.write("read %d\n" % (readsofar,))


url = sys.argv[1]
filename = sys.argv[2] if len(sys.argv) > 2 else url2filename(url)
urlretrieve(url, filename, reporthook)

Example:

$ python download-file.py http://example.com/path/to/file

It downloads the URL to a file. If no filename is given, it uses the basename from the URL.
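The arithmetic in reporthook can be tried without a network connection. The sketch below pulls the formatting into a pure function; format_progress is a hypothetical name, and the min() clamp is my own addition (the last block usually overshoots the total), not part of the answer above.

```python
def format_progress(blocknum, blocksize, totalsize):
    """Build the progress string for one reporthook call."""
    readsofar = blocknum * blocksize
    if totalsize > 0:
        # Assumption: clamp so the final, partial block never shows > 100%.
        readsofar = min(readsofar, totalsize)
        percent = readsofar * 1e2 / totalsize
        # "\r" returns to the start of the line so each update overwrites
        # the previous one; "%*d" pads the byte count to a fixed width.
        return "\r%5.1f%% %*d / %d" % (
            percent, len(str(totalsize)), readsofar, totalsize)
    # urlretrieve reports a non-positive total when the size is unknown.
    return "read %d\n" % (readsofar,)
```

For example, five 1024-byte blocks of a 10240-byte file give 50.0%.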

You could also run wget if you need it:

#!/usr/bin/env python
import sys
from subprocess import Popen, PIPE, STDOUT


def urlretrieve(url, filename=None, width=4):
    destination = ["-O", filename] if filename is not None else []
    p = Popen(["wget"] + destination + ["--progress=dot", url],
              stdout=PIPE, stderr=STDOUT, bufsize=1)  # line-buffered (out side)
    for line in iter(p.stdout.readline, b''):
        if b'%' in line:  # grep "%"
            line = line.replace(b'.', b'')  # sed -u -e "s,\.,,g"
            percents = line.split(None, 2)[1].decode()  # awk $2
            sys.stderr.write("\b"*width + percents.rjust(width))
    p.communicate()  # close stdout, wait for child's exit
    print("\b"*width + "DONE")


url = sys.argv[1]
filename = sys.argv[2] if len(sys.argv) > 2 else None
urlretrieve(url, filename)

I have not noticed any buffering issues with this code.
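The grep/sed/awk steps can also be checked in isolation, without running wget. This is only a sketch: parse_percent and render are hypothetical helper names, and the sample line in the usage note is my recollection of what wget's dot-style progress looks like, not output captured from the thread.

```python
def parse_percent(line):
    """Return the percent field (e.g. '42%') of a progress line, else None."""
    if b'%' not in line:                        # grep "%"
        return None
    fields = line.replace(b'.', b'').split()    # sed "s,\.,,g", then split
    if len(fields) < 2:
        return None
    return fields[1].decode()                   # awk $2


def render(percent, width=4):
    """Backspace over the previous value and right-align the new one."""
    return "\b" * width + percent.rjust(width)
```

A line such as "     0K .......... ..........  1%  126K 12s" yields '1%', while lines without a percent sign are skipped.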

Question 14

I've done something like this before, and I'd love to share my code with you :)

#!/usr/bin/python2.7
# encoding=utf-8
import sys
import os
import datetime

SHEBANG = "#!/bin/bash\n\n"


def get_cmd(editor='vim', initial_cmd=""):
    from subprocess import call
    from tempfile import NamedTemporaryFile
    # Create the initial temporary file.
    with NamedTemporaryFile(delete=False) as tf:
        tfName = tf.name
        tf.write(initial_cmd)
    # Fire up the editor.
    if call([editor, tfName], shell=False) != 0:
        # Editor died or was killed.
        return None
    # Get the modified content.
    fd = open(tfName)
    res = fd.read()
    fd.close()
    os.remove(tfName)
    return res


def main():
    initial_cmd = "wget " + sys.argv[1]
    cmd = get_cmd(editor='vim', initial_cmd=initial_cmd)
    if len(sys.argv) > 1 and sys.argv[1] == 's':
        # Keep the download information.
        t = datetime.datetime.now()
        filename = "swget_%02d%02d%02d%02d%02d" %\
            (t.month, t.day, t.hour, t.minute, t.second)
        with open(filename, 'w') as f:
            f.write(SHEBANG)
            f.write(cmd)
            f.close()
            os.chmod(filename, 0777)
    os.system(cmd)


main()
# Run this script with the optional argument 's'.
# Copy the command into the editor, then save and quit; the download
# then begins. If you used the argument 's', the script also creates
# another executable script that you can use to resume an interrupted
# download (if the server supports it).

So, basically, you just need to modify the value of initial_cmd. In your case, it's:

wget --progress=dot $url 2>&1 | grep --line-buffered "%" | \
 sed -u -e "s,\.,,g" | awk '{printf("\b\b\b\b%4s", $2)}'

This script first creates a temp file, puts the shell commands in it, and gives it execute permissions. Finally, it runs the temp file with the commands in it.

Question 15

I'd love to give you some feedback :) You could use call(filename) instead of os.system(cmd). To format the datetime, you could use its .strftime() method. The with-statement closes files automatically (that is the point of using it in the first place), so there is no need to call f.close() by hand (unindent chmod in this case). If you want to make the script executable by your user: os.chmod(filename, os.stat(filename).st_mode | stat.S_IEXEC) (or | 0111 for +x). To avoid leaking files, move the code inside with Named..File() as tf: call tf.flush() before call([editor..), then tf.seek(0); res=tf.read()
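Part of that feedback, sketched out: strftime() for the file name, the with-statement doing the closing, and adding only the owner's execute bit. The function name save_script and its directory parameter are hypothetical; the swget_ prefix and the shebang are taken from the script above.

```python
import datetime
import os
import stat

SHEBANG = "#!/bin/bash\n\n"


def save_script(cmd, directory="."):
    """Write cmd to a timestamped, executable shell script; return its path."""
    # strftime replaces the hand-written "%02d%02d..." formatting.
    name = datetime.datetime.now().strftime("swget_%m%d%H%M%S")
    path = os.path.join(directory, name)
    with open(path, "w") as f:  # closed automatically on leaving the block
        f.write(SHEBANG)
        f.write(cmd)
    # Add the owner's execute bit to the existing mode instead of
    # granting everything to everyone.
    os.chmod(path, os.stat(path).st_mode | stat.S_IEXEC)
    return path
```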

Question 16

@J.F.Sebastian Wow, thank you, man! It's a script I wrote a long time ago. I was a bad Python programmer back then :) Thank you for pointing that out!

Question 17

vim download.py

#!/usr/bin/env python
import subprocess
import os

sh_cmd = r"""
download()
{
 local url=$1
 echo -n " "
 wget --progress=dot $url 2>&1 |
 grep --line-buffered "%" |
 sed -u -e "s,\.,,g" |
 awk '{printf("\b\b\b\b%4s", $2)}'
 echo -ne "\b\b\b\b"
 echo " DONE"
}
download "http://www.kernel.org/pub/linux/kernel/v2.6/$file"
"""

cmd = 'sh'
p = subprocess.Popen(cmd,
                     shell=True,
                     stdin=subprocess.PIPE,
                     env=os.environ,
                     universal_newlines=True)  # text mode, so a str can be sent on stdin
p.communicate(input=sh_cmd)
# or:
# p = subprocess.Popen(cmd,
#                      shell=True,
#                      stdin=subprocess.PIPE,
#                      env={'file': 'xx'})
#
# p.communicate(input=sh_cmd)
# or:
# p = subprocess.Popen(cmd, shell=True,
#                      stdin=subprocess.PIPE,
#                      stdout=subprocess.PIPE,
#                      stderr=subprocess.PIPE,
#                      env=os.environ)
# stdout, stderr = p.communicate(input=sh_cmd)

Then you can call it like this:

file="xxx" python download.py

Question 18

Why use sh as the command, and use shell=True? Why not run sh_cmd directly?

Question 19

@MartijnPieters Because sh_cmd is not a "shell command", we use sh to run it. In a Linux shell, we can use sh script.sh, and we can also use a pipe or stdin to run some commands, such as: cat some_file | sh or curl http://xxx.xx | sh and so on. As for shell=True, the docs say: The shell argument (which defaults to False) specifies whether to use the shell as the program to execute. If shell is True, it is recommended to pass args as a string rather than as a sequence.

Question 20

If you set shell=True a shell is used to run the command you pass in. You quoted the documentation yourself there.
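To illustrate the point being made here: with shell=True, Popen already starts a shell to interpret the string, so Popen('sh', shell=True) amounts to a shell launching another sh. A sketch of feeding the script to a single sh directly, assuming a POSIX system with sh on PATH (run_sh is a made-up helper name):

```python
import os
import subprocess


def run_sh(script, extra_env=None):
    """Feed script to one `sh` process on stdin; return (returncode, stdout)."""
    env = dict(os.environ)
    env.update(extra_env or {})  # e.g. {"file": "xx"}, as in the answer above
    proc = subprocess.Popen(["sh"],  # no shell=True: sh is the only shell
                            stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE,
                            env=env,
                            universal_newlines=True)  # str in, str out
    out, _ = proc.communicate(script)
    return proc.returncode, out
```

Passing the script as an argument instead, as in ["sh", "-c", script], would be another way to avoid the extra shell.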

Question 21

Put very simply: if you have a script.sh file, you can execute it and print its return code:

import subprocess
process = subprocess.Popen('/path/to/script.sh', shell=True, stdout=subprocess.PIPE)
output, _ = process.communicate()  # read stdout to EOF, then wait; a bare wait() can block if the pipe fills up
print(process.returncode)

Question 22

And ensure that script.sh has execute permission (chmod +x script.sh), or use Popen('sh /path/to/script.sh', shell=True ...)

Question 23

Sure, it must have execute permission (+x); otherwise it will give you an error. With that in place, the above Python code should work like a charm!

Accepted answer: the Python-only rewrite above, by Tim Pierce · 2013-12-09 04:53:02Z



