
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: Preventing errors of simultaneous access in zipfile
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.4
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: alanmcintyre, dw, jcea, kasal, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2012-11-28 14:21 by serhiy.storchaka, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description
zipfile_simultaneous.patch serhiy.storchaka, 2012-11-28 14:21
patch dw, 2014-11-11 03:16 Representative modification to zipfile.py
Messages (10)
msg176544 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-28 14:21
If a ZipFile is created by passing a file-like object as the first argument to the constructor, then simultaneously reading or writing different files leaves it in an inconsistent state. There is a warning about this in the documentation. The proposed patch enforces this restriction by raising an exception early if you attempt simultaneous access.
I'm not sure whether it's worth applying to older versions.
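To make the scenario concrete, here is a minimal sketch of the access pattern in question: two members read alternately through one ZipFile built on a file-like object. The member names, payloads, and the helpers `build_archive` and `interleaved_read` are invented for illustration and are not part of the attached patch. On interpreters from the time of this report the pattern could return corrupted data or fail the CRC check; on releases that include the later shared-file change tracked in issue14099, it is expected to return the original bytes.

```python
import io
import zipfile

# Illustrative payloads and member names (assumptions, not from the report).
PAYLOADS = {
    "a.txt": bytes(range(256)) * 400,
    "b.txt": bytes(reversed(range(256))) * 400,
}


def build_archive():
    """Return a seekable in-memory file object holding a two-member archive."""
    buf = io.BytesIO()
    # ZIP_STORED keeps the members large on disk, so each reader has to go
    # back to the underlying file object several times.
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        for name, payload in PAYLOADS.items():
            zf.writestr(name, payload)
    buf.seek(0)
    return buf


def interleaved_read(fileobj, chunk=4096):
    """Read both members alternately through a single ZipFile."""
    out = {name: bytearray() for name in PAYLOADS}
    with zipfile.ZipFile(fileobj) as zf:
        with zf.open("a.txt") as fa, zf.open("b.txt") as fb:
            while True:
                ca, cb = fa.read(chunk), fb.read(chunk)
                if not ca and not cb:
                    break
                out["a.txt"] += ca
                out["b.txt"] += cb
    return {name: bytes(data) for name, data in out.items()}
```

Comparing `interleaved_read(build_archive())` with `PAYLOADS` shows whether a given interpreter handles the interleaving correctly.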
msg176546 - Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2012-11-28 14:46
I am -0 to this. We can't prevent programmers from shooting themselves in the foot.
msg176548 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-28 15:05
Reading from a closed ZipFile or from a ZipFile opened for writing is already forbidden. This is a safeguard of the same kind.
msg176849 - Author: Stepan Kasal (kasal) Date: 2012-12-03 16:03
I agree that reading from a file open for write should be forbidden, no matter whether ZipFile was called with fp or a name.
Actually, it is not yet forbidden, and two of the tests in the zipfile.py test suite do actually rely on this misfeature.
The first chunk in the patch http://bugs.python.org/file24624/Proposed-fix-of-issue14099-second.patch contains a fix for this bug in test suite.
OTOH, decompressing several files for a given zip file simultaneously does not sound that bad. You know, with all the current file managers, people look at a zip as if it were kind of a directory.
msg176852 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-12-03 16:37
> Actually, it is not yet forbidden, and two of the tests in the zipfile.py test suite do actually rely on this misfeature.
Indeed. I missed that.
Actually, these tests work by accident, because the contents of the zipfile happen to be held in the file object's buffer.
> OTOH, decompressing several files for a given zip file simultaneously does not sound that bad. You know, with all the current file managers, people look at a zip as if it were kind of a directory.
I agree, but I'm afraid it's impossible to do without performance regression due to seek before every read. And for now ZipFile does not support simultaneous reading when an external file object is used. Also, ZipFile is not thread-safe in any case. You can open several ZipFiles for simultaneous reading.
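The workaround mentioned last, opening several ZipFiles on the same archive, can be sketched as follows. The archive layout and the helper names `make_demo_archive` and `read_with_two_zipfiles` are assumptions made for this example; each ZipFile opened by path owns its own file handle, so the two readers do not share a file position.

```python
import os
import tempfile
import zipfile


def make_demo_archive(directory):
    """Write a small two-member archive (illustrative contents) and return its path."""
    path = os.path.join(directory, "demo.zip")
    with zipfile.ZipFile(path, "w") as zf:
        zf.writestr("a.txt", b"a" * 50000)
        zf.writestr("b.txt", b"b" * 50000)
    return path


def read_with_two_zipfiles(path, first, second, chunk=4096):
    """Read two members alternately, each through its own ZipFile object."""
    parts_a, parts_b = [], []
    with zipfile.ZipFile(path) as za, zipfile.ZipFile(path) as zb:
        with za.open(first) as fa, zb.open(second) as fb:
            while True:
                ca, cb = fa.read(chunk), fb.read(chunk)
                if not ca and not cb:
                    break
                parts_a.append(ca)
                parts_b.append(cb)
    return b"".join(parts_a), b"".join(parts_b)
```

The cost of this approach is that the central directory is parsed once per ZipFile, which may matter for archives with many members.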
msg176856 - Author: Stepan Kasal (kasal) Date: 2012-12-03 16:51
> but I'm afraid it's impossible to do without performance regression due to seek before every read.
I agree that this is the key question.
I would hope that the performance hit wouldn't be so bad, unless there are actually two decompressions running concurrently.
So we can have an implementation that is generally correct, though some use scenarios result in slow execution.
OTOH, if the seek() calls were a problem even when the new position is the same as the old one, they could be optimized out by a simple wrapper around fp.
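A minimal sketch of such a wrapper might look like the following. The class name `SeekSkippingFile` and the `seeks_forwarded` counter are hypothetical and exist only for illustration; the sketch assumes that nothing else moves the underlying file object behind the wrapper's back, and it only short-circuits absolute seeks.

```python
import io


class SeekSkippingFile:
    """Hypothetical wrapper that forwards seek() only when the position changes."""

    def __init__(self, fp):
        self._fp = fp
        self._pos = fp.tell()      # cached position of the underlying file
        self.seeks_forwarded = 0   # illustration only: counts real seek() calls

    def seek(self, offset, whence=io.SEEK_SET):
        # An absolute seek to the position we are already at is a no-op.
        if whence == io.SEEK_SET and offset == self._pos:
            return self._pos
        self._pos = self._fp.seek(offset, whence)
        self.seeks_forwarded += 1
        return self._pos

    def read(self, n=-1):
        data = self._fp.read(n)
        self._pos += len(data)     # keep the cached position in sync
        return data

    def tell(self):
        return self._pos
```

With a single reader working sequentially, every seek-before-read lands on the cached position, so no call reaches the underlying file; a real seek happens only when readers actually alternate.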
msg177006 - Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2012-12-05 20:24
Seek can be very cheap. Could anybody actually measure it?
msg177037 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-12-06 10:48
> Seek can be very cheap. Could anybody actually measure it?
I am waiting for an updated patch for issue14099 to make benchmarks.
msg230998 - Author: David Wilson (dw) * Date: 2014-11-11 03:16
Compared to the cost of everything else ZipExtFile must do (e.g. 4 KB string concatenation in a loop, zlib), it's surprising that lseek() would be measurable at all.
The attached file 'patch' is the minimal change I tested. It represents, in terms of computation and system call overhead, all that is required to implement the "seek before read" solution to simultaneous access. On OS X, churning over every member of every ZIP in my downloads directory (about 400 MB worth), this change results in around 0.9% overhead compared to the original module.
Consequently, I'm strongly against the patch here. It is in effect papering over an implementation deficiency of the current zipfile module, one that could easily and cheaply be addressed.
(My comment on this ticket is in the context of the now-marked-duplicate issue22842).
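The "seek before read" idea can be illustrated with a small sketch. This is not the attached patch: the class name `SharedReader` and its constructor arguments are assumptions chosen for the example. Each reader remembers its own offset into the shared file object, and every read first seeks there while holding a lock.

```python
import io
import threading


class SharedReader:
    """Hypothetical per-member view onto a file object shared by several readers."""

    def __init__(self, fp, start, lock):
        self._fp = fp
        self._pos = start    # this reader's private offset into fp
        self._lock = lock    # shared by all readers of the same fp

    def read(self, n=-1):
        with self._lock:
            # Restore our own position before touching the shared file,
            # since another reader may have moved it in the meantime.
            self._fp.seek(self._pos)
            data = self._fp.read(n)
            self._pos = self._fp.tell()
        return data
```

Two readers created over the same `io.BytesIO` with different `start` offsets can then be read alternately without disturbing each other, at the cost of one seek() call per read.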
msg232071 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-12-03 07:32
Closed in favor of issue14099.
History
Date User Action Args
2022-04-11 14:57:38 admin set github: 60773
2014-12-03 07:32:37 serhiy.storchaka set status: open -> closed; resolution: rejected; messages: + msg232071; stage: patch review -> resolved
2014-11-11 03:16:43 dw set files: + patch; nosy: + dw; messages: + msg230998
2014-11-11 01:37:58 r.david.murray link issue22842 superseder
2013-05-07 17:34:00 serhiy.storchaka set messages: - msg176851
2013-05-07 17:33:28 serhiy.storchaka set messages: - msg176850
2012-12-29 22:04:11 serhiy.storchaka set assignee: serhiy.storchaka
2012-12-06 10:48:08 serhiy.storchaka set messages: + msg177037
2012-12-05 20:24:39 jcea set messages: + msg177006
2012-12-03 16:51:35 kasal set messages: + msg176856
2012-12-03 16:37:58 serhiy.storchaka set messages: + msg176852
2012-12-03 16:16:22 serhiy.storchaka set messages: + msg176851
2012-12-03 16:14:53 serhiy.storchaka set messages: + msg176850
2012-12-03 16:03:26 kasal set nosy: + kasal; messages: + msg176849
2012-11-28 15:05:14 serhiy.storchaka set messages: + msg176548
2012-11-28 14:46:26 jcea set nosy: + jcea; messages: + msg176546
2012-11-28 14:21:51 serhiy.storchaka create
