Classification
Title: zipfile writes incorrect local file header for large files in zip64
Type: behavior
Stage: resolved
Components: Library (Lib)
Versions: Python 3.2, Python 3.3, Python 3.4, Python 2.7
Process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: serhiy.storchaka
Nosy List: Kristof.Keppens, Nico.Möller, Paul, Ruben.Gonzalez, alanmcintyre, amaury.forgeotdarc, christian.heimes, craigds, dandrzejewski, enlavin, eric.araujo, gregory.p.smith, jhenry82, lambacck, loewis, nadeem.vawda, python-dev, ronaldoussoren, segfault42, serhiy.storchaka
Priority: normal
Keywords: needs review, patch

Created on 2010-08-31 01:02 by craigds, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name, Uploaded, Description
zipfile_zip64_header.patch, craigds, 2010-08-31 01:02
zipfile-huge-files.diff, alanmcintyre, 2010-09-07 04:57
zipfile_zip64_always.patch, serhiy.storchaka, 2012-09-23 09:54, Always write Zip64 extra
zipfile_zip64_try.patch, serhiy.storchaka, 2012-09-23 09:55, Try to write Zip64 extra only if needed
zipfile_zip64_always_2.patch, serhiy.storchaka, 2012-11-28 12:25, Always write Zip64 extra
zipfile_zip64_try_2.patch, serhiy.storchaka, 2012-11-28 12:26, Try to write Zip64 extra only if needed
zipfile_zip64_try_2-2.7.patch, serhiy.storchaka, 2013-01-04 13:27
zipfile_zip64_try_2-3.2.patch, serhiy.storchaka, 2013-01-04 13:27
Messages (20)
msg115250 - Author: Craig de Stigter (craigds) Date: 2010-08-31 01:02
Steps to reproduce:
# create a large (>4 GB) file
text = b'a' * 1024**2  # 1 MiB chunk; bytes so the write also works on Python 3
with open('foo.txt', 'wb') as f:
    for i in range(5 * 1024):  # 5 GiB in total
        f.write(text)
# now zip the file
import zipfile
z = zipfile.ZipFile('foo.zip', mode='w', allowZip64=True)
z.write('foo.txt')
z.close()
Now inspect the file headers using a hex editor. The written headers are incorrect. The file size and compressed size should be written as 0xffffffff, and the 'extra field' should contain the actual sizes.
Tested on Python 2.5, but judging by the latest code in 3.2 it still looks broken.
The problem is that ZipInfo.FileHeader() is written before the file size is populated, so the Zip64 extensions are not written. Later, the sizes in the header are written, but the Zip64 extensions are not taken into account and the file size simply wraps around (7 GB becomes 3 GB, for instance).
My patch fixes the problem on Python 2.5; it might need minor porting to fix trunk. It works by assigning the uncompressed file size to the ZipInfo header initially, then writing the header. Later on, I rewrite the header (this is okay since the header size will not have increased).
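To make the expected layout concrete, here is a minimal sketch (not the zipfile implementation) of the size fields a Zip64-aware writer would put in the local file header, next to the wrapped value the buggy code effectively stored. The helper `local_header_sizes` and the constant `MAX_32BIT` are hypothetical names used only for illustration; the `<HHQQ` layout (header ID 1, data length 16, then the uncompressed and compressed sizes as 64-bit values) is the Zip64 extended information extra field.

```python
import struct

MAX_32BIT = 0xFFFFFFFF  # largest value a 32-bit size field can hold
GIB = 1024 ** 3


def local_header_sizes(file_size, compress_size):
    """Return (file_size32, compress_size32, extra) for a local file header.

    Hypothetical helper: when either size does not fit in 32 bits, both
    32-bit fields are set to 0xFFFFFFFF and the real sizes go into the
    Zip64 extra field.
    """
    if file_size > MAX_32BIT or compress_size > MAX_32BIT:
        extra = struct.pack('<HHQQ', 1, 16, file_size, compress_size)
        return 0xFFFFFFFF, 0xFFFFFFFF, extra
    return file_size, compress_size, b''


# What silent truncation to 32 bits does to a 7 GiB size.
wrapped = (7 * GIB) & 0xFFFFFFFF
```

With a 7 GiB input, `wrapped` equals 3 GiB, which matches the "7 GB becomes 3 GB" symptom described above.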
msg115466 - Author: Éric Araujo (eric.araujo) (Python committer) Date: 2010-09-03 16:53
A tip about versions: Development happens on the current active branch, py3k (future 3.2 version), and bug or doc fixes are backported to the stable versions 2.7 and 3.1. Security fixes go into 2.6 too.
Can you reproduce your bug in 2.7, 3.1 and 3.2?
Adding Alan to nosy since he’s listed in Misc/maintainers.rst.
msg115514 - Author: Craig de Stigter (craigds) Date: 2010-09-03 21:47
Yes, the bug still exists in Python 3.1.2. However, struct.pack() no longer silently ignores overflow, so I get this error instead:
>>> z.write('foo.txt')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.1/zipfile.py", line 1095, in write
    zinfo.file_size))
struct.error: argument out of range
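The overflow can be reproduced in isolation. This is a small sketch, assuming the 32-bit size fields are packed with the `<L` format; the helper `pack_size32` is illustrative and not part of zipfile. On Python 3, `struct.pack` raises `struct.error` for an out-of-range value instead of silently truncating it.

```python
import struct


def pack_size32(size):
    """Pack a size into a 32-bit little-endian field, or return None if it does not fit."""
    try:
        return struct.pack('<L', size)
    except struct.error:
        # Python 3 refuses values above 0xFFFFFFFF rather than wrapping them.
        return None


small = pack_size32(1024)
too_big = pack_size32(5 * 1024 ** 3)  # 5 GiB does not fit in 32 bits
```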
msg115660 - Author: Alan McIntyre (alanmcintyre) (Python committer) Date: 2010-09-05 17:42
Thanks for the patch, Craig; I should have some time later today or tomorrow to do a review. Did you have a patch for the test suite(s) as well? If not, I can just make sure your test case is covered in test_zipfile64.
msg115672 - Author: Craig de Stigter (craigds) Date: 2010-09-05 21:16
Hi, sorry, no, I haven't had time to add a real test for this.
msg115741 - Author: Alan McIntyre (alanmcintyre) (Python committer) Date: 2010-09-07 04:57
Here's an updated patch for the py3k trunk with tests. This pretty much doubles the runtime of test_zipfile64.py. The patch also removes some unnecessary code from the existing test_zipfile64 tests.
Note: It looks like writestr will also suffer from a struct.pack overflow if it's given a ZipInfo with the third general purpose flag bit set. I won't have time to address that until next weekend, probably.
msg146923 - Author: Nadeem Vawda (nadeem.vawda) (Python committer) Date: 2011-11-03 12:17
Issue 6434 was marked as a duplicate of this issue.
msg156442 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2012-03-20 17:52
I am afraid that the problem is more complicated. With the option allowZip64=True, all files need to be written with this extension, because the size of the local file header may change, and after compression it will not be possible to just go back and rewrite it.
As it stands, it appears that the Zip64 option simply does not work.
msg170645 - Author: Christian Heimes (christian.heimes) (Python committer) Date: 2012-09-18 13:44
Serhiy:
If I understand you correctly it should be easy to fix. The code in close() has to check if any file is beyond the ZIP64 limit and then write all headers with extra args. Is that correct?
msg171010 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2012-09-22 17:56
No, on the contrary, it is not that easy to fix, and the patch is incorrect. Sorry that I was not clear. The size of the header with extra args depends on the size of the file. The file size can change in the process of compressing, and the compressed size may be larger than the uncompressed size, exceeding the 32-bit boundary. When rewriting the header with extra args, we can overwrite compressed data.
I had put the issue off for more careful research. Thanks for the reminder.
One solution is to always (even for the smallest files) write 64-bit sizes when allowZip64 is true.
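The overwrite hazard can be shown with a toy archive. This is only a sketch with stand-in bytes, not real ZIP structures: `small_header` and `data` are placeholders, and the only real format detail is the 20-byte Zip64 extra field packed with `<HHQQ`. A header is written without the extra field, the data follows it, and the header is then rewritten in place with the extra field appended.

```python
import io
import struct

small_header = b'H' * 30        # stand-in for a local header without Zip64 extra
data = b'compressed-data-' * 4  # stand-in for the compressed file data (64 bytes)
zip64_extra = struct.pack('<HHQQ', 1, 16, 5 * 1024 ** 3, 5 * 1024 ** 3)

archive = io.BytesIO()
archive.write(small_header)
archive.write(data)

# Go back and rewrite the header, now 20 bytes longer than the space reserved for it.
archive.seek(0)
archive.write(small_header + zip64_extra)

# Read back the region where the data was originally stored.
start = len(small_header)
stored_data = archive.getvalue()[start:start + len(data)]
data_damaged = stored_data != data
```

The first 20 bytes of the data region now hold the extra field, which is why the header size has to be decided before any data is written.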
msg171025 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2012-09-23 09:54
I see two rational solutions to the issue (everything below applies only when allowZip64=True):
1) Always write the Zip64 extended information extra field. This approach always succeeds, but the zipfile size will increase by 20 bytes for each file.
The first patch (zipfile_zip64_always.patch) uses this approach.
2) Write the Zip64 extended information extra field only if the assumed file size is more than a certain limit. In very rare cases this makes it impossible to compress a file that could be compressed the first way. However, in most cases it produces the same file as before the patch.
The second patch (zipfile_zip64_try.patch) is based on Alan's patch and uses the second approach. The probability of errors is reduced, and they are now detected and do not lead to silent data damage.
Both patches are for Python 3.3. If either patch is good, I'll backport it to the older versions.
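The two approaches can be sketched as decision functions. This is an illustration under stated assumptions, not the code from either patch: the function names are hypothetical, the `margin` safety factor is an assumed parameter, and the only names taken from the library are `zipfile.ZIP64_LIMIT` and the `<HHQQ` extra-field layout that accounts for the 20 bytes per file.

```python
import struct
from zipfile import ZIP64_LIMIT

# Per-file cost of approach 1: 4 bytes of extra-field header plus two 64-bit sizes.
ZIP64_EXTRA_SIZE = struct.calcsize('<HHQQ')


def use_zip64_always(expected_size):
    """Approach 1: always reserve room for the Zip64 extra field."""
    return True


def use_zip64_if_needed(expected_size, margin=1.05):
    """Approach 2: reserve room only if the expected size is near the limit.

    `margin` is an assumed allowance for data that grows when compressed.
    """
    return expected_size * margin > ZIP64_LIMIT


def check_after_write(used_zip64, file_size, compress_size):
    """Fail loudly instead of writing a wrapped size into a 32-bit field."""
    if not used_zip64 and (file_size > ZIP64_LIMIT or compress_size > ZIP64_LIMIT):
        raise RuntimeError('size exceeds the Zip64 limit but no Zip64 extra was reserved')
```

The check at the end corresponds to the point that errors under the second approach are detected rather than producing a silently damaged archive.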
msg172648 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2012-10-11 15:08
What is the conclusion about the patches? Which variant should I backport to older versions?
msg172652 - Author: Ronald Oussoren (ronaldoussoren) (Python committer) Date: 2012-10-11 15:22
I'd write the extended header when the current file size is larger than the zip64 limit (that is, when 'st.st_size > ZIP64_LIMIT' in the write method).
That way the minimal header size is used whenever possible.
As you noted, this can cause problems when the file grows beyond the limit while it is being stored in the zipfile, but IMHO storing data while it is being modified is asking for problems anyway.
BTW, I haven't actually reviewed the patch yet.
msg175471 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2012-11-12 20:38
Please review the patches.
msg176538 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2012-11-28 12:26
Patches updated to resolve a merge conflict with issue11981.
Please review and apply either of these patches. This is needed for some of my other zipfile patches.
msg178603 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2012-12-30 19:11
Which variant of the patches should I commit? Or should I prepare another?
msg179013 - Author: Nico Möller (Nico.Möller) Date: 2013-01-04 10:21
I most definitely need a patch for 2.7.3.
It would be awesome if you could provide a patch for that version.
msg179019 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2013-01-04 13:27
Here are patches for the second variant for 2.7 and 3.2.
msg179987 - Author: Roundup Robot (python-dev) (Python triager) Date: 2013-01-14 22:45
New changeset ce869b05762c by Serhiy Storchaka in branch '2.7':
Issue #9720: zipfile now writes correct local headers for files larger than 4 GiB.
http://hg.python.org/cpython/rev/ce869b05762c
New changeset b93848ca7760 by Serhiy Storchaka in branch '3.2':
Issue #9720: zipfile now writes correct local headers for files larger than 4 GiB.
http://hg.python.org/cpython/rev/b93848ca7760
New changeset 656a45738e5e by Serhiy Storchaka in branch '3.3':
Issue #9720: zipfile now writes correct local headers for files larger than 4 GiB.
http://hg.python.org/cpython/rev/656a45738e5e
New changeset 628a6af64a46 by Serhiy Storchaka in branch 'default':
Issue #9720: zipfile now writes correct local headers for files larger than 4 GiB.
http://hg.python.org/cpython/rev/628a6af64a46 
msg179989 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2013-01-14 22:49
Fixed. Thank you for the report, Craig de Stigter.
History
Date, User, Action, Args
2022-04-11 14:57:05 admin set github: 53929
2013-01-14 22:49:08 serhiy.storchaka set status: open -> closed; resolution: fixed; messages: + msg179989; stage: patch review -> resolved
2013-01-14 22:45:09 python-dev set nosy: + python-dev; messages: + msg179987
2013-01-04 13:27:38 serhiy.storchaka set files: + zipfile_zip64_try_2-2.7.patch, zipfile_zip64_try_2-3.2.patch; messages: + msg179019
2013-01-04 10:21:58 Nico.Möller set nosy: + Nico.Möller; messages: + msg179013
2012-12-30 19:11:38 serhiy.storchaka set messages: + msg178603
2012-12-29 22:08:10 serhiy.storchaka set assignee: serhiy.storchaka
2012-11-28 12:26:01 serhiy.storchaka set files: + zipfile_zip64_always_2.patch, zipfile_zip64_try_2.patch; messages: + msg176538
2012-11-26 20:32:14 jhenry82 set nosy: + jhenry82
2012-11-12 20:38:26 serhiy.storchaka set messages: + msg175471
2012-10-19 08:54:37 Ruben.Gonzalez set nosy: + Ruben.Gonzalez
2012-10-11 15:22:27 ronaldoussoren set messages: + msg172652
2012-10-11 15:08:29 serhiy.storchaka set messages: + msg172648; versions: + Python 3.4
2012-09-23 09:55:46 serhiy.storchaka set files: + zipfile_zip64_try.patch; stage: needs patch -> patch review
2012-09-23 09:54:19 serhiy.storchaka set files: + zipfile_zip64_always.patch; nosy: + loewis, gregory.p.smith, ronaldoussoren; messages: + msg171025
2012-09-22 17:56:23 serhiy.storchaka set messages: + msg171010
2012-09-18 13:44:30 christian.heimes set keywords: + needs review; nosy: + christian.heimes; messages: + msg170645
2012-09-18 13:25:53 Kristof.Keppens set nosy: + Kristof.Keppens
2012-03-20 17:52:08 serhiy.storchaka set messages: + msg156442
2012-03-20 17:13:23 serhiy.storchaka set nosy: + serhiy.storchaka
2012-03-20 14:35:55 dandrzejewski set nosy: + dandrzejewski
2011-11-03 12:17:55 nadeem.vawda set versions: + Python 3.3, - Python 3.1; nosy: + amaury.forgeotdarc, nadeem.vawda, lambacck, segfault42, enlavin, Paul; messages: + msg146923; stage: needs patch
2011-11-03 12:17:17 nadeem.vawda link issue6434 superseder
2010-09-07 04:57:46 alanmcintyre set files: + zipfile-huge-files.diff; messages: + msg115741
2010-09-05 21:16:38 craigds set messages: + msg115672
2010-09-05 17:42:17 alanmcintyre set messages: + msg115660
2010-09-03 21:47:12 craigds set messages: + msg115514
2010-09-03 16:53:46 eric.araujo set nosy: + eric.araujo, alanmcintyre; messages: + msg115466; versions: - Python 2.6, Python 2.5, Python 3.3
2010-08-31 01:02:17 craigds create
