Message165748
Author: eli.bendersky
Recipients: Arfrever, eli.bendersky, jcon, ncoghlan, pitrou, serhiy.storchaka, tshepang
Date: 2012-07-18.09:32:41
SpamBayes Score: -1.0
Marked as misclassified: Yes
Message-id: <1342603963.23.0.290949025848.issue15381@psf.upfronthosting.co.za>
In-reply-to:
Content:
I wonder if this is a fair comparison, Serhiy. Strings are Unicode underneath, so they carry a larger per-string overhead (more data to copy around). Increasing the length of the strings changes the game, because thanks to PEP 393 the overhead for ASCII-only Unicode strings is constant:
>>> import sys
>>> sys.getsizeof('a')
50
>>> sys.getsizeof(b'a')
34
>>> sys.getsizeof('a' * 1000)
1049
>>> sys.getsizeof(b'a' * 1000)
1033
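The constant overhead can be checked directly by subtracting the payload length from the reported size; this is a small sketch, and the exact byte counts vary by platform and Python version, but for ASCII-only strings they stay flat as the string grows:

```python
import sys

# Per-object overhead (total size minus payload length) for ASCII str vs bytes.
# Exact values differ between builds, but neither grows with the length.
for n in (1, 10, 100, 1000):
    s, b = 'a' * n, b'a' * n
    print(n, sys.getsizeof(s) - n, sys.getsizeof(b) - n)
```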
When re-running your tests with larger chunks, the results are quite interesting:
$ ./python -m timeit -s "import io; d=[b'a'*100,b'bb'*50,b'ccc'*50]*1000" "b=io.BytesIO(); w=b.write" "for x in d: w(x)" "b.getvalue()"
1000 loops, best of 3: 509 usec per loop
$ ./python -m timeit -s "import io; d=['a'*100,'bb'*50,'ccc'*50]*1000" "s=io.StringIO(); w=s.write" "for x in d: w(x)" "s.getvalue()"
1000 loops, best of 3: 282 usec per loop
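For reference, a sketch of the same comparison as a script using the timeit module rather than the command line; the function names here are mine, and absolute timings will of course vary by machine and Python version:

```python
import io
import timeit

def bench_bytes(data):
    # Mirror the CLI benchmark: many small writes, then one getvalue().
    b = io.BytesIO()
    w = b.write
    for x in data:
        w(x)
    return b.getvalue()

def bench_str(data):
    s = io.StringIO()
    w = s.write
    for x in data:
        w(x)
    return s.getvalue()

bytes_data = [b'a' * 100, b'bb' * 50, b'ccc' * 50] * 1000
str_data = ['a' * 100, 'bb' * 50, 'ccc' * 50] * 1000

# Take the best of several repeats, as timeit's CLI does.
t_bytes = min(timeit.repeat(lambda: bench_bytes(bytes_data), number=100))
t_str = min(timeit.repeat(lambda: bench_str(str_data), number=100))
print(f"BytesIO:  {t_bytes / 100 * 1e6:.0f} usec per loop")
print(f"StringIO: {t_str / 100 * 1e6:.0f} usec per loop")
```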
So, it seems to me that BytesIO could use some optimization!