Created on 2014-11-10 23:58 by dw, last changed 2022-04-11 14:58 by admin. This issue is now closed.
| Files | | |
|---|---|---|
| File name | Uploaded | Description |
| mymy.zip | dw, 2014-11-10 23:58 | test case |
| Messages (3) | |||
|---|---|---|---|
| msg230987 | Author: David Wilson (dw) * | Date: 2014-11-10 23:58 | |
There is some really funky behaviour in the zipfile module, where, depending on whether zipfile.ZipFile() is passed a string filename or a file-like object, one of two things happens:

a) Given a file-like object, zipfile does not (since it cannot) consume excess file descriptors on each call to '.open()'; however, simultaneous calls to .open() on the zip file's members (from the same thread) will produce file-like objects for each member that appear intertwingled in some unfortunate manner:

Traceback (most recent call last):
  File "my.py", line 23, in <module>
    b()
  File "my.py", line 18, in b
    m.readline()
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/zipfile.py", line 689, in readline
    return io.BufferedIOBase.readline(self, limit)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/zipfile.py", line 727, in peek
    chunk = self.read(n)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/zipfile.py", line 763, in read
    data = self._read1(n)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/zipfile.py", line 839, in _read1
    data = self._decompressor.decompress(data, n)
zlib.error: Error -3 while decompressing data: invalid stored block lengths

b) Given a string filename, simultaneous use of .open() produces a new file descriptor for each opened member, which does not result in the above error, but triggers an even worse one: file descriptor exhaustion given a sufficiently large zip file.

This tripped me up rather badly last week during consulting work, and I'd like to see both these behaviours fixed somehow.

The ticket is more an RFC to see if anyone has thoughts on how this fix should happen. It seems to me a no-brainer that, since the ZIP file format fundamentally always requires a seekable file, in both the "constructed using file-like object" case and the "constructed using filename" case we should somehow reuse the sole file object passed to us to satisfy all reads of compressed member data.

It seems the problems can be fixed in both cases without damaging interface semantics by simply tracking the expected 'current' read offset in each ZipExtFile instance. Prior to performing any .read(), we simply call .seek() on the file object.

Of course the result would not be thread-safe, but at least in the current code, ZipExtFile for a "constructed from a file-like object" edition zipfile is already not thread-safe. With some additional work, we could make the module thread-safe in both cases; however, this is not the current semantic and doesn't appear to be guaranteed by the module documentation.

---

Finally, as to why you'd want to simultaneously open huge numbers of ZIP members: ZIP itself easily supports streamy reads, and ZIP files can be quite large, even larger than RAM. So it should be possible, as I needed last week, to read streamily from a large number of members.

---

The attached my.zip is sufficient to demonstrate both problems. The attached my.py has function a() to demonstrate the FD leak and b() to demonstrate the intertwingled state. |
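
The offset-tracking idea proposed above can be sketched roughly as follows. This is not zipfile's actual code: `OffsetTrackingReader` and its `start`/`length` parameters are hypothetical names chosen for illustration, and decompression is left out so that only the seek-before-read bookkeeping is visible. Like the proposal itself, the sketch is not thread-safe.

```python
import io


class OffsetTrackingReader:
    """Hypothetical read-only view over a byte range of a shared seekable file.

    Each view remembers its own position and seeks the shared file there
    before every read, so several views can be interleaved on one thread
    without opening additional file descriptors.
    """

    def __init__(self, shared_file, start, length):
        self._file = shared_file
        self._pos = start            # next absolute offset this view will read
        self._end = start + length   # one past the last byte of this view

    def read(self, n=-1):
        remaining = self._end - self._pos
        if n < 0 or n > remaining:
            n = remaining
        if n == 0:
            return b""
        self._file.seek(self._pos)   # restore this view's position first
        data = self._file.read(n)
        self._pos += len(data)
        return data


# One underlying file object, standing in for the archive.
shared = io.BytesIO(b"AAAAAAAABBBBBBBB")
first = OffsetTrackingReader(shared, 0, 8)
second = OffsetTrackingReader(shared, 8, 8)

# Interleave reads from both views; neither disturbs the other.
chunks = [first.read(4), second.read(4), first.read(4), second.read(4)]
```

Making this safe across threads would presumably need a lock held around each seek-and-read pair, which is the "additional work" mentioned above.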
|||
| msg230990 | Author: David Wilson (dw) * | Date: 2014-11-11 00:04 | |
As a random side-note, this is another case where I really wish Python had a .pread() function. It's uniquely valuable for coordinating positioned reads in a threaded app without synchronization (at user level anyway) or extraneous system calls. |
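
For reference, the `os` module has offered `os.pread()` since Python 3.3 on Unix-like platforms, but it operates on raw file descriptors rather than on file objects, so it does not help when ZipFile is handed an arbitrary file-like object. A minimal sketch of a positioned read, using a throwaway temporary file as stand-in data:

```python
import os
import tempfile

# Create a small scratch file to read from (stand-in for an archive).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"0123456789abcdef")
    path = tmp.name

fd = os.open(path, os.O_RDONLY)
try:
    # Each call names its own offset; the descriptor's position is not used.
    tail = os.pread(fd, 4, 12)
    head = os.pread(fd, 4, 0)
    # The shared file position is left where it started.
    position = os.lseek(fd, 0, os.SEEK_CUR)
finally:
    os.close(fd)
    os.unlink(path)
```

Because no call moves the descriptor's position, concurrent readers do not need to coordinate a seek followed by a read.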
|||
| msg230995 | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2014-11-11 01:37 | |
This is a duplicate of issue 16569 and issue 14099. Since the former links to the latter I'm using that as the superseder. |
|||
| History | | | |
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:58:10 | admin | set | github: 67031 |
| 2014-11-11 01:37:58 | r.david.murray | set | status: open -> closed; superseder: Preventing errors of simultaneous access in zipfile; nosy: + r.david.murray; messages: + msg230995; resolution: duplicate; stage: resolved |
| 2014-11-11 00:04:43 | dw | set | messages: + msg230990 |
| 2014-11-10 23:58:48 | dw | create | |