Message189616
| Author |
nadeem.vawda |
| Recipients |
Michael.Fox, nadeem.vawda, pitrou, rhettinger, serhiy.storchaka, vstinner |
| Date |
2013-05-19.18:50:56 |
| Message-id |
<1368989456.71.0.0894411600818.issue18003@psf.upfronthosting.co.za> |
| In-reply-to |
| Content |
> I agree that making lzma.open() wrap its return value in a BufferedReader
> (or BufferedWriter, as appropriate) is the way to go.
On second thoughts, there's no need to change the behavior for mode='wb'.
We can just return a BufferedReader for mode='rb', and leave the current
behavior (returning a raw LZMAFile) in place for mode='wb'.
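
A minimal sketch of that idea, for illustration only. The helper name `buffered_open` is hypothetical and is not the actual patch; it only shows read modes being wrapped in io.BufferedReader while write modes keep returning the raw LZMAFile:

```python
import io
import lzma


def buffered_open(filename, mode="rb", **kwargs):
    """Hypothetical helper: buffer reads, leave writes as a raw LZMAFile."""
    f = lzma.open(filename, mode, **kwargs)
    if mode in ("r", "rb"):
        # Reading is where buffering pays off, especially for line iteration.
        return io.BufferedReader(f)
    # Writing (and any other binary mode) keeps the current behavior.
    return f
```

Text modes are left out of the sketch on purpose, since the proposal above only concerns mode='rb' and mode='wb'.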
I also ran some additional benchmarks for the bz2 and gzip modules. It
looks like those two modules would also benefit from having their open()
functions use io.BufferedReader:
[lzma]
$ time xzcat src.xz | wc -l
1057980
real 0m0.543s
user 0m0.556s
sys 0m0.024s
$ ../cpython/python -m timeit -s 'import lzma, io' 'f = lzma.open("src.xz", "r")' 'for line in f: pass'
10 loops, best of 3: 2.01 sec per loop
$ ../cpython/python -m timeit -s 'import lzma, io' 'f = io.BufferedReader(lzma.open("src.xz", "r"))' 'for line in f: pass'
10 loops, best of 3: 795 msec per loop
[bz2]
$ time bzcat src.bz2 | wc -l
1057980
real 0m1.322s
user 0m1.324s
sys 0m0.044s
$ ../cpython/python -m timeit -s 'import bz2, io' 'f = bz2.open("src.bz2", "r")' 'for line in f: pass'
10 loops, best of 3: 3.71 sec per loop
$ ../cpython/python -m timeit -s 'import bz2, io' 'f = io.BufferedReader(bz2.open("src.bz2", "r"))' 'for line in f: pass'
10 loops, best of 3: 2.04 sec per loop
[gzip]
$ time zcat src.gz | wc -l
1057980
real 0m0.310s
user 0m0.296s
sys 0m0.028s
$ ../cpython/python -m timeit -s 'import gzip, io' 'f = gzip.open("src.gz", "r")' 'for line in f: pass'
10 loops, best of 3: 1.94 sec per loop
$ ../cpython/python -m timeit -s 'import gzip, io' 'f = io.BufferedReader(gzip.open("src.gz", "r"))' 'for line in f: pass'
10 loops, best of 3: 556 msec per loop |
|
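
For anyone who wants to repeat the comparison without the src.* files, here is a rough sketch that generates its own test data. The helper names `count_lines` and `compare` and the generated input are assumptions made for illustration, not part of the benchmarks above, and the timings will depend on the Python version and machine:

```python
import bz2
import gzip
import io
import lzma
import os
import tempfile
import timeit


def count_lines(open_func, path, buffered):
    """Iterate over the file line by line and return the line count."""
    f = open_func(path, "rb")
    if buffered:
        f = io.BufferedReader(f)
    with f:
        return sum(1 for _ in f)


def compare(module, lines=20000, repeat=3):
    """Time unbuffered vs. buffered line iteration for one module."""
    data = b"".join(b"line %d\n" % i for i in range(lines))
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with module.open(path, "wb") as out:
            out.write(data)
        results = {}
        for buffered in (False, True):
            timings = timeit.repeat(
                lambda: count_lines(module.open, path, buffered),
                number=1,
                repeat=repeat,
            )
            # Keep the best run, as timeit's command line does.
            results[buffered] = min(timings)
        return results
    finally:
        os.remove(path)


if __name__ == "__main__":
    for module in (lzma, bz2, gzip):
        result = compare(module)
        print("%s: raw %.3fs, BufferedReader %.3fs"
              % (module.__name__, result[False], result[True]))
```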