Message 206779 - Python tracker


In-reply-to
Author	belopolsky
Recipients	belopolsky
Date	2013-12-21.20:58:58
Message-id	<1387659539.36.0.629484655922.issue20048@psf.upfronthosting.co.za>

Content

This problem happens when I unpack a file from a 200+ MB zip archive as follows:
    with zipfile.ZipFile(archive) as z:
        data = b''
        with z.open(filename, 'rU') as f:
            for line in f:
                data += line
I cannot reduce it to a test case suitable for posting here, but the culprit is the following code in zipfile.py:
    def peek(self, n=1):
        """Returns buffered bytes without advancing the position."""
        if n > len(self._readbuffer) - self._offset:
            chunk = self.read(n)
            self._offset -= len(chunk)
See http://hg.python.org/cpython/file/81f8375e60ce/Lib/zipfile.py#l605
The problem occurs when peek() is called at the boundary of the uncompressed-data buffer and read() has to go through more than one read buffer to satisfy the request. After the last refill, self._offset counts only the bytes taken from the newest buffer, so it is smaller than len(chunk), and subtracting len(chunk) leaves a nonsensical negative self._offset upon return from peek().
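
The boundary condition described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the zipfile implementation: ToyReader, its chunk_size parameter, peek_buggy and demo_negative_offset are hypothetical names, and the refill logic is simplified so that each refill replaces the buffer and resets the offset to zero. Only the rewind arithmetic in peek_buggy is copied from the quoted peek().

```python
import io


class ToyReader:
    """Hypothetical stand-in for a buffered reader; NOT zipfile.ZipExtFile."""

    def __init__(self, payload, chunk_size):
        self._source = io.BytesIO(payload)
        self._chunk_size = chunk_size  # assumed size of each buffer refill
        self._readbuffer = b''
        self._offset = 0

    def read(self, n):
        """Return up to n bytes, refilling the buffer as often as needed."""
        out = b''
        while len(out) < n:
            if self._offset >= len(self._readbuffer):
                # Buffer exhausted: replace it and restart the offset at 0.
                self._readbuffer = self._source.read(self._chunk_size)
                self._offset = 0
                if not self._readbuffer:
                    break  # end of data
            take = self._readbuffer[self._offset:self._offset + n - len(out)]
            self._offset += len(take)
            out += take
        return out

    def peek_buggy(self, n=1):
        """Rewind arithmetic as quoted in the report (return value omitted)."""
        if n > len(self._readbuffer) - self._offset:
            chunk = self.read(n)
            # Wrong when read() crossed a refill: _offset only counts bytes
            # taken from the newest buffer, but len(chunk) counts all of them.
            self._offset -= len(chunk)


def demo_negative_offset():
    reader = ToyReader(b'abcdefgh', chunk_size=4)
    reader.read(3)        # buffer is b'abcd', offset is 3
    reader.peek_buggy(4)  # read(4) takes b'd', refills, then takes b'efg'
    return reader._offset


print(demo_negative_offset())  # -1
```

In this model the peek request spans two buffers: one byte comes from the old buffer and three from the new one, so the offset is 3 when len(chunk) is 4, and the subtraction yields -1.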
This problem does not seem to appear in 3.x since 028e8e0b03e8.

History
Date	User	Action	Args
2013-12-21 20:58:59	belopolsky	set	recipients: + belopolsky
2013-12-21 20:58:59	belopolsky	set	messageid: <1387659539.36.0.629484655922.issue20048@psf.upfronthosting.co.za>
2013-12-21 20:58:59	belopolsky	link	issue20048 messages
2013-12-21 20:58:58	belopolsky	create
