[Python-Dev] file.readinto performance regression in Python 3.2 vs. 2.7?

Fri Nov 25 07:13:45 CET 2011

On Fri, Nov 25, 2011 at 12:07 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> On Fri, 25 Nov 2011 12:02:17 +1100
> Matt Joiner <anacrolix at gmail.com> wrote:
>> It's my impression that the readinto method does not fully support the
>> buffer interface I was expecting. I've never had cause to use it until
>> now. I've created a question on SO that describes my confusion:
>> http://stackoverflow.com/q/8263899/149482
>
> Just use a memoryview and slice it:
>
> b = bytearray(...)
> m = memoryview(b)
> n = f.readinto(m[some_offset:])

Cheers, this seems to be what I wanted. Unfortunately it doesn't
perform noticeably better if I do this.

Eli, the usage pattern I was referring to is when you read in chunks
and append to a running buffer. Presumably, if you know the size of
the data in advance, you can readinto directly into a region of a
bytearray, thereby avoiding both allocating a temporary buffer for
each read and creating a new buffer that holds the running buffer
plus the new data.

Strangely, I find that your readandcopy is faster at this than
readinto, but not by much. Here's the code; it's a bit explicit, but
then so was the original:

import os

BUFSIZE = 0x10000

def justread():
    # Just read a file's contents into a string/bytes object
    f = open(FILENAME, 'rb')
    s = b''
    while True:
        b = f.read(BUFSIZE)
        if not b:
            break
        s += b

def readandcopy():
    # Read a file's contents and copy them into a bytearray.
    # An extra copy is done here.
    f = open(FILENAME, 'rb')
    s = bytearray()
    while True:
        b = f.read(BUFSIZE)
        if not b:
            break
        s += b

def readinto():
    # Read a file's contents directly into a bytearray,
    # hopefully employing its buffer interface
    f = open(FILENAME, 'rb')
    s = bytearray(os.path.getsize(FILENAME))
    o = 0
    while True:
        b = f.readinto(memoryview(s)[o:o+BUFSIZE])
        if not b:
            break
        o += b

And the timings:

$ python3 -O -m timeit 'import fileread_bytearray' \
    'fileread_bytearray.justread()'
10 loops, best of 3: 298 msec per loop
$ python3 -O -m timeit 'import fileread_bytearray' \
    'fileread_bytearray.readandcopy()'
100 loops, best of 3: 9.22 msec per loop
$ python3 -O -m timeit 'import fileread_bytearray' \
    'fileread_bytearray.readinto()'
100 loops, best of 3: 9.31 msec per loop

The file was 10MB. I expected readinto to perform much better than
readandcopy. I expected readandcopy to perform slightly better than
justread. This clearly isn't the case.
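
For reference, a minimal self-contained sketch of the
preallocate-and-readinto pattern described above. The function name
read_into_preallocated and its parameters are hypothetical and not part
of the fileread_bytearray module; it differs from readinto() above only
in creating the memoryview once and closing the file. No timing claim
is made for it.

```python
import os

BUFSIZE = 0x10000

def read_into_preallocated(path, bufsize=BUFSIZE):
    """Read a whole file into one preallocated bytearray using readinto().

    Hypothetical helper for illustration only.
    """
    size = os.path.getsize(path)
    buf = bytearray(size)
    offset = 0
    # Create the memoryview once; slicing a memoryview does not copy data.
    with memoryview(buf) as view, open(path, 'rb') as f:
        while offset < size:
            # readinto() fills the slice in place and returns the byte count.
            n = f.readinto(view[offset:offset + bufsize])
            if not n:
                break  # reached EOF earlier than the reported size
            offset += n
    # Trim by copying only if fewer bytes were read than expected.
    return buf if offset == size else buf[:offset]
```

Calling read_into_preallocated(FILENAME) should return a bytearray with
the same contents that justread() accumulates in s.
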
>> Also I saw some comments on "top-posting". Am I guilty of this?

If there's a magical option in Gmail someone knows about, please tell.

> Kind of :)
>
> Regards
>
> Antoine.