This issue tracker has been migrated to GitHub ,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2007年07月19日 18:56 by arkanes, last changed 2022年04月11日 14:56 by admin. This issue is now closed.
| Messages (20) | |||
|---|---|---|---|
| msg32532 - (view) | Author: Chris Mellon (arkanes) | Date: 2007年07月19日 18:56 | |
One of the ways that zipfile.ZipFile checks for a valid zipfile is by confirming that the specified comment length matches the found comment. I have encountered zipfiles in the wild that indicate a comment length of 0, but have filled the comment area with an arbitrary amount of nulls. In the interests of robustness, I think that "comment = comment.rstrip('\x00') should be added to the zipfile check.
|
|||
| msg55174 - (view) | Author: Georg Brandl (georg.brandl) * (Python committer) | Date: 2007年08月23日 18:16 | |
Alan? |
|||
| msg62003 - (view) | Author: Alan McIntyre (alanmcintyre) * (Python committer) | Date: 2008年02月02日 15:35 | |
It would seem that such a zip file is not consistent with the spec (http://www.pkware.com/documents/casestudies/APPNOTE.TXT). My first reaction is that we shouldn't accept behavior outside the spec unless it's something that's done by many popular ZIP applications. However, if somebody can convince me that this is a deviation from the spec that we should support, I'll write a patch for it. :) |
|||
| msg62041 - (view) | Author: Chris Mellon (arkanes) | Date: 2008年02月04日 15:17 | |
I agree that the zipfile is out of spec. Here are my arguments in favor of making the change anyway: Existing zip tools like 7zip, pkzip, and winzip handle these files "as expected" As far as I know, it won't break any valid zipfiles. Because the fix necessary is buried inside a private function in the zipfile module, it's difficult to monkey-patch the behavior into place and it's quite hard to figure out what needs to be done. |
|||
| msg73317 - (view) | Author: Mark Hirota (markhirota) | Date: 2008年09月16日 21:53 | |
I'd like to piggyback on this issue if okay :D
I have some zipfiles I'm working with that contain junk in the extra
fields. The ZipFile object croaks at the call to the
ZipInfo._decodeExtra() call when it could really just ignore the error.
Example of traceback:
>>> zf = zipfile.ZipFile('bad.zip', 'r')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "zipfile.py", line 346, in __init__
self._GetContents()
File "zipfile.py", line 366, in _GetContents
self._RealGetContents()
File "zipfile.py", line 424, in _RealGetContents
x._decodeExtra()
File "zipfile.py", line 267, in _decodeExtra
tp, ln = unpack('<hh', extra[:4])
File "/usr/lib/python2.5/struct.py", line 87, in unpack
return o.unpack(s)
struct.error: unpack requires a string argument of length 4
While I'm working to track down the source of the "extra junk" -- this
type of error falls into the same category: Python zipfile hits an
issue that most zip tools are fine with.
|
|||
| msg115928 - (view) | Author: Alan McIntyre (alanmcintyre) * (Python committer) | Date: 2010年09月09日 05:18 | |
Maybe ZipFile should have an option to handle archives in a non-strict mode, in which it would raise warnings or just completely ignore a small set of minor violations of the spec. That way people that want behavior that's in compliance with the spec will have it (by default), but those that need to handle ZIP files that bend the rules can still choose to use the Python standard libraries to get their work done. If there's any interest in this I can work up a patch, tests, and documentation. |
|||
| msg115969 - (view) | Author: Mark Hirota (markhirota) | Date: 2010年09月09日 20:27 | |
Yes, I'd be interested in this functionality. Anything that makes it more flexible to use with a wide variety of zipfiles (even bad ones :D) allows for increased reusability. Thanks! |
|||
| msg138027 - (view) | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2011年06月09日 20:09 | |
Note that based on the fact that most zipfile tools handle garbage after the indicated end of the comment by ignoring it, #10694 fixed zipfile to also ignore such trailing data. It sounds like there may be more out-of-spec errors that could be ignored, though. Should this issue therefore be left open? If so, does anyone want to propose a patch? |
|||
| msg143973 - (view) | Author: Alan McIntyre (alanmcintyre) * (Python committer) | Date: 2011年09月13日 15:00 | |
So far I haven't had the opportunity to sit down and write up a "lenient zipfile handling" patch; my apologies to those that could really use one. If somebody does propose a patch, I'll be glad to test and review it. I suppose I would like to see the issue kept open for a while, even if just to collect common "bending of the rules" cases that people would like to see supported. |
|||
| msg145769 - (view) | Author: antitree (antitree) | Date: 2011年10月18日 02:11 | |
I'm still affected by this issue. A workaround or patch would be appreciated. |
|||
| msg145884 - (view) | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2011年10月19日 01:19 | |
"This issue" is currently a collection point for specific "allow this in lax mode" issues. Do you have one or more specific cases in point that you'd like to talk about? (Note that the 'garbage after end of file' bug has already been fixed.) |
|||
| msg149042 - (view) | Author: James C. Ahlstrom (ahlstromjc) | Date: 2011年12月08日 17:08 | |
I received a bug report from a user. He had a zip file created by Mac OS 10.5.8 that the zipfile module claimed was not a valid zip file. The traceback went to function _EndRecData(fpin). The file had a valid comment appended, but recorded a comment length of zero. I am posting to this thread because it seems related. The zero comment length is incorrect, but the file is read without complaint by other software. Note that this is not end of file garbage; the comment is valid. The _EndRecData(fpin) function is reading the "End of Central Directory" record. The endrec[7] is the comment length, but of more interest is endrec[6] which is the start of the Central Directory. This is a file offset, and if valid it points to a Central Directory record. These records start with a known string PK001円002円. I propose that the correct fix is to delete the test for correct comment length, and replace it with a test that reads four bytes at offset endrec[6] and makes sure it is PK001円002円 and a valid record. This is viewed not as a hack for defective software, but rather as an improved sanity check, since finding the Central Directory is vital to reading a zip file. This code replaces the end of _EndRecData(fpin), and was taken from Python2.5 (so check against the relevant version): if start >= 0: # Correct signature string was found endrec = struct.unpack(structEndArchive, data[start:start+22]) endrec = list(endrec) comment = data[start+22:] ## Relevant changes here fpin.seek(endrec[6], 0) # Seek to the start of the central directory dat = fpin.read(4) # Read four bytes # Note: Mac OS is known to add a comment, but record the length as zero. if dat == stringCentralDir: # Success ## # Append the archive comment and start offset endrec.append(comment) endrec.append(filesize - END_BLOCK + start) if endrec[-4] == -1 or endrec[-4] == 0xffffffff: return _EndRecData64(fpin, - END_BLOCK + start, endrec) return endrec return # Error, return None |
|||
| msg149044 - (view) | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2011年12月08日 17:14 | |
I'm pretty sure that what you are reporting has been fixed by the 'garbage' fix I mentioned (which might better be called the 'more flexible handling of comments' patch). Though not in 2.5 (or 2.6) since they are no longer in maintenance. If you could confirm this by testing against 2.7 or 3.2, that would be great. |
|||
| msg149103 - (view) | Author: James C. Ahlstrom (ahlstromjc) | Date: 2011年12月09日 14:58 | |
Problem was reported on 2.7. I will check in detail this weekend. Please stand by. |
|||
| msg149109 - (view) | Author: James C. Ahlstrom (ahlstromjc) | Date: 2011年12月09日 17:11 | |
I grabbed a 2.7.2 zipfile.py, and my original comments stand. If there is a "garbage at end of file" patch, I can't find it; please provide a line number or a hint. The user at yale.edu reports that the patch works. Here is a diff of my changes. To test, append some junk to a good zipfile: echo junk >> good.zip, and try reading it. Let me know if you want me to do anything else; maybe look at 3.2; or email me offline. *** zipfile.py 2011年12月09日 11:25:07.000000000 -0500 --- ../zipfile.py 2011年12月09日 05:48:00.000000000 -0500 *************** *** 237,248 **** recData = data[start:start+sizeEndCentDir] endrec = list(struct.unpack(structEndArchive, recData)) comment = data[start+sizeEndCentDir:] ! ## Remove # check that comment length is correct ! ## Remove if endrec[_ECD_COMMENT_SIZE] == len(comment): ! # check that the offset to the Central Directory points to a valid item ! fpin.seek(endrec[_ECD_OFFSET], 0) ! dat = fpin.read(4) ! if dat == stringCentralDir: # Append the archive comment and start offset endrec.append(comment) endrec.append(maxCommentStart + start) |
|||
| msg149132 - (view) | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2011年12月09日 22:11 | |
Here's the patch: http://hg.python.org/cpython/rev/cc3255a707c7/ I thought I remembered getting it in to 2.7.2, but my memory is evidently wrong. It has been applied, but is not yet in the released version of 2.7. |
|||
| msg149456 - (view) | Author: James C. Ahlstrom (ahlstromjc) | Date: 2011年12月14日 16:16 | |
For completeness, I checked other versions of Python. The example zip file fails in Python 3.1, but succeeds in Python 3.2.2. The patch for 3.2.2 removed the check for correct comment length, but substituted no further check for validity. |
|||
| msg225528 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2014年08月19日 10:34 | |
Original issue was fixed in cc3255a707c7. Issue reported in msg73317 was fixed in issue14315. Is there something left about this issue? |
|||
| msg228521 - (view) | Author: Mark Lawrence (BreamoreBoy) * | Date: 2014年10月05日 00:49 | |
I'm assuming that this can now be closed. |
|||
| msg230529 - (view) | Author: (era) | Date: 2014年11月03日 07:18 | |
For those who cannot update just yet, see also the workaround at http://stackoverflow.com/a/21996397/874188 |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022年04月11日 14:56:25 | admin | set | github: 45223 |
| 2016年02月24日 19:55:29 | serhiy.storchaka | set | status: open -> closed resolution: out of date stage: test needed -> resolved |
| 2014年11月03日 07:18:18 | era | set | nosy:
+ era messages: + msg230529 |
| 2014年10月05日 00:49:02 | BreamoreBoy | set | status: pending -> open nosy: + BreamoreBoy messages: + msg228521 |
| 2014年08月19日 10:34:06 | serhiy.storchaka | set | status: open -> pending nosy: + serhiy.storchaka messages: + msg225528 |
| 2011年12月14日 16:16:04 | ahlstromjc | set | messages: + msg149456 |
| 2011年12月09日 22:11:07 | r.david.murray | set | messages: + msg149132 |
| 2011年12月09日 17:11:06 | ahlstromjc | set | messages: + msg149109 |
| 2011年12月09日 14:58:35 | ahlstromjc | set | messages: + msg149103 |
| 2011年12月08日 17:14:24 | r.david.murray | set | messages: + msg149044 |
| 2011年12月08日 17:08:57 | ahlstromjc | set | nosy:
+ ahlstromjc messages: + msg149042 |
| 2011年10月19日 01:19:14 | r.david.murray | set | messages: + msg145884 |
| 2011年10月18日 09:27:49 | flox | set | nosy:
+ flox |
| 2011年10月18日 02:11:09 | antitree | set | nosy:
+ antitree messages: + msg145769 |
| 2011年09月13日 15:00:35 | alanmcintyre | set | messages: + msg143973 |
| 2011年06月09日 20:09:24 | r.david.murray | set | nosy:
+ r.david.murray messages: + msg138027 versions: + Python 3.3, - Python 3.2 |
| 2011年02月03日 13:53:03 | Ernst.Sjöstrand | set | nosy:
+ Ernst.Sjöstrand |
| 2010年09月09日 20:27:40 | markhirota | set | messages: + msg115969 |
| 2010年09月09日 05:18:38 | alanmcintyre | set | messages: + msg115928 |
| 2010年08月09日 03:07:09 | terry.reedy | set | stage: test needed versions: + Python 3.2, - Python 3.1, Python 2.7 |
| 2008年09月16日 21:53:30 | markhirota | set | nosy:
+ markhirota messages: + msg73317 |
| 2008年09月05日 13:10:23 | pitrou | set | type: enhancement versions: + Python 3.1, Python 2.7, - Python 2.5 |
| 2008年02月04日 15:17:55 | arkanes | set | messages: + msg62041 |
| 2008年02月02日 15:35:32 | alanmcintyre | set | nosy:
+ alanmcintyre messages: + msg62003 |
| 2007年08月23日 18:43:13 | georg.brandl | set | assignee: aimacintyre -> |
| 2007年08月23日 18:16:09 | georg.brandl | set | assignee: aimacintyre messages: + msg55174 nosy: + georg.brandl |
| 2007年07月19日 18:56:02 | arkanes | create | |