I am using Python 2.6.5 and am trying to find the fastest way to print out the contents of a .gz file. My understanding is that before v2.5, zcat was much faster than Python's gzip module (see here)... I guess that has changed (at least according to a comment in that post)? I have decompressed a 2.4MB .gz file three ways, and each one takes about 17 minutes. Is there a faster way?
This takes 17 minutes in Python:
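    import sys
    import zlib

    # 16 + MAX_WBITS makes zlib expect a gzip header and trailer
    d = zlib.decompressobj(16 + zlib.MAX_WBITS)
    f = open('/2.4MB.gz', 'rb')
    buffer = f.read(1024)
    while buffer:
        outstr = d.decompress(buffer)
        # write() avoids the extra newline print() adds after every chunk
        sys.stdout.write(outstr)
        buffer = f.read(1024)
    outstr = d.flush()  # emit any data zlib is still holding
    sys.stdout.write(outstr)
    f.close()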
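To time the zlib approach without any printing, the same loop can be wrapped in a generator that hands back decompressed chunks. This is a minimal sketch in Python 3 syntax; the name gunzip_chunks and the 64 KB read size are illustrative choices, not part of the question.

```python
import zlib


def gunzip_chunks(path, chunk_size=64 * 1024):
    """Yield decompressed byte chunks from a .gz file without printing them."""
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    d = zlib.decompressobj(16 + zlib.MAX_WBITS)
    with open(path, "rb") as f:
        while True:
            buf = f.read(chunk_size)
            if not buf:
                break
            out = d.decompress(buf)
            if out:
                yield out
    # Emit whatever zlib is still holding after the last input block.
    tail = d.flush()
    if tail:
        yield tail
```

This handles a single-member .gz file; a file made of several concatenated gzip members would need a fresh decompressobj for each member, since data after the first member ends up in d.unused_data.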
This also takes 17 minutes:
    import gzip

    f = gzip.open('/2.4MB.gz', 'rb')
    file_content = f.read()  # decompress the whole file into memory
    print file_content
    f.close()
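If the decompressed data has to go somewhere (a file, a pipe, stdout), the gzip module can stream it in fixed-size blocks instead of one big read(). A minimal sketch in Python 3 syntax; gunzip_to and the 1 MB block size are hypothetical names and values chosen for illustration.

```python
import gzip
import shutil


def gunzip_to(src_path, dst_fileobj, block_size=1024 * 1024):
    """Copy the decompressed contents of src_path into dst_fileobj."""
    # gzip.open handles the gzip header/trailer; copyfileobj moves the data
    # in fixed-size blocks, so memory use stays bounded by block_size.
    with gzip.open(src_path, "rb") as src:
        shutil.copyfileobj(src, dst_fileobj, block_size)
```

For example, gunzip_to('/2.4MB.gz', sys.stdout.buffer) would send the raw bytes to stdout, and passing an open file instead avoids the terminal entirely.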
Again, 17 minutes:
    def gziplines(fname):
        from subprocess import Popen, PIPE
        f = Popen(['zcat', fname], stdout=PIPE)
        for line in f.stdout:
            yield line

    fname = '/2.4MB.gz'
    for line in gziplines(fname):
        print line,
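The gziplines generator never waits for zcat to exit, so the child's exit status is lost and a failed decompression goes unnoticed. A minimal sketch in Python 3 syntax (zcat_lines is a hypothetical name, and it assumes a zcat binary on the PATH) that reaps the child process and raises if zcat reported an error:

```python
import subprocess


def zcat_lines(fname):
    """Yield lines (as bytes) from `zcat fname`, then check zcat's exit status."""
    proc = subprocess.Popen(["zcat", fname], stdout=subprocess.PIPE)
    try:
        for line in proc.stdout:
            yield line
    finally:
        proc.stdout.close()
        # Reap the child so it does not linger as a zombie process.
        returncode = proc.wait()
    # Only reached when the loop finished normally.
    if returncode != 0:
        raise subprocess.CalledProcessError(returncode, ["zcat", fname])
```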
My eventual goal is to take the contents of the .gz file and dump them directly into a MySQL database without printing the lines. The unzipped file is 15.8MB.
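For loading into a database, the decompressed rows never need to reach the screen; they can be sent in batches. A minimal sketch under stated assumptions: the stdlib sqlite3 module stands in for a MySQL DB-API driver such as MySQLdb, and the table records(name, value) and the helper insert_rows are hypothetical. Both drivers follow the PEP 249 cursor.executemany interface, but MySQLdb uses %s placeholders rather than sqlite3's ?.

```python
import itertools
import sqlite3


def insert_rows(conn, rows, batch_size=1000):
    """Insert an iterable of (name, value) tuples in batches; return the count."""
    # Placeholder style is sqlite3's "?"; a MySQL driver would use "%s".
    sql = "INSERT INTO records (name, value) VALUES (?, ?)"
    cur = conn.cursor()
    rows = iter(rows)
    total = 0
    while True:
        # Pull at most batch_size rows so the whole file is never in memory.
        batch = list(itertools.islice(rows, batch_size))
        if not batch:
            break
        cur.executemany(sql, batch)
        total += len(batch)
    conn.commit()
    return total
```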
When I gunzip the file first and then use the csv module to print its contents to the screen, it takes 1 minute versus the 17 minutes above, so printing by itself doesn't seem to be the problem.
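The csv module can also read straight from the compressed file, which removes the separate gunzip step. A minimal sketch assuming Python 3.3 or later (for gzip.open text mode) and UTF-8 content; read_gz_csv is a hypothetical name.

```python
import csv
import gzip


def read_gz_csv(path):
    """Yield parsed CSV rows straight from a .gz file, with no gunzip step."""
    # "rt" opens the decompressed stream in text mode; newline="" is what
    # the csv module expects so quoted newlines are handled correctly.
    # encoding="utf-8" is an assumption about the file's contents.
    with gzip.open(path, "rt", newline="", encoding="utf-8") as f:
        for row in csv.reader(f):
            yield row
```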
Answer
As it stands, you are measuring the speed of printing. Printing is one of the slowest things your program will ever do. Take the prints out and re-measure to see how fast the decompression itself is.
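One way to re-measure is to time the decompression with the output discarded, then again with a real destination. A minimal sketch in Python 3 syntax (time.perf_counter needs 3.3+); time_gunzip and its sink parameter are hypothetical names.

```python
import gzip
import time


def time_gunzip(path, sink=None):
    """Return (seconds, bytes) for decompressing path, optionally into sink.

    With sink=None the output is discarded, so only decompression is timed.
    """
    start = time.perf_counter()
    total = 0
    with gzip.open(path, "rb") as f:
        while True:
            block = f.read(1024 * 1024)
            if not block:
                break
            total += len(block)
            if sink is not None:
                sink.write(block)
    return time.perf_counter() - start, total
```

Comparing time_gunzip('/2.4MB.gz') with time_gunzip('/2.4MB.gz', sys.stdout.buffer) separates decompression time from output time.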
Your middle method (gzip.open() followed by a single read()) will almost certainly be the fastest.
    print line

is probably taking up 99% of your execution time.
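If the lines really must be shown, per-line overhead can be cut by joining many lines and writing them in one call. This is a sketch in Python 3 syntax, not a cure: the terminal still has to render every character. write_in_batches and the batch size of 10000 are hypothetical.

```python
import itertools
import sys


def write_in_batches(lines, out=None, batch_size=10000):
    """Write lines to out in large joined blocks instead of one print per line."""
    if out is None:
        out = sys.stdout  # looked up at call time, not at definition time
    lines = iter(lines)
    while True:
        batch = list(itertools.islice(lines, batch_size))
        if not batch:
            break
        # One write call per batch: a line-buffered stream then flushes
        # once per batch rather than once per line.
        out.write("".join(batch))
```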