This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.
Created on 2008-12-10 16:27 by francescor, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Files

| File name | Uploaded | Description |
|---|---|---|
| test.zip | francescor, 2008-12-10 16:27 | |
| x.zip | vstinner, 2008-12-20 14:18 | |
| patch.diff | skreft, 2008-12-21 02:58 | |
| testzip.py | v+python, 2010-03-27 05:48 | test case for opening zip members using \ separator |

Messages (14)

| msg77555 | Author: Francesco Ricciardi (francescor) | Date: 2008-12-10 16:27 |
Each entry of a zip file, as read by the zipfile module, can be accessed
via a ZipInfo object. The filename attribute of ZipInfo is a string.
However, the read method of a ZipFile object expects a bytes object as
argument, or at least this is what I can deduce from the following behavior:
>>> import zipfile
>>> testzip = zipfile.ZipFile('test.zip')
>>> t1 = testzip.infolist()[0]
>>> t1.filename
'tést.xml'
>>> data = testzip.read(testzip.infolist()[0])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python30\lib\zipfile.py", line 843, in read
    return self.open(name, "r", pwd).read()
  File "C:\Python30\lib\zipfile.py", line 883, in open
    % (zinfo.orig_filename, fname))
zipfile.BadZipfile: File name in directory 'tést.xml' and header b't\x82st.xml' differ.
The test.zip file is attached as help in reproducing this error.

| msg78004 | Author: (skreft) | Date: 2008-12-18 01:34 |
The error you got is caused by passing the wrong parameter: you gave a ZipInfo object instead of a filename. If you execute data = testzip.read(t1.filename), you will have no problems.

| msg78014 | Author: Francesco Ricciardi (francescor) | Date: 2008-12-18 07:33 |
If that is what is requested, then the manual entry for ZipFile.read
must be corrected, because it states:
"ZipFile.read(name[, pwd]) .... name is the name of the file in the
archive, or a ZipInfo object."
However, Eddie, you haven't tried what you suggested, because this is
what you would get:
>>> import zipfile
>>> testzip = zipfile.ZipFile('test.zip')
>>> t1 = testzip.infolist()[0]
>>> t1.filename
'tést.xml'
>>> data = testzip.read(t1.filename)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python30\lib\zipfile.py", line 843, in read
    return self.open(name, "r", pwd).read()
  File "C:\Python30\lib\zipfile.py", line 883, in open
    % (zinfo.orig_filename, fname))
zipfile.BadZipfile: File name in directory 'tést.xml' and header b't\x82st.xml' differ.

| msg78025 | Author: (skreft) | Date: 2008-12-18 14:17 |
Sorry, my bad.
I did try it, but with the wrong version (2.5), and there it worked perfectly.
So sorry again for my mistake.
Anyways, I've found the error.
The problem is caused by different encodings used when zipping.
In open, the method is comparing b't\x82st.xml' against
b't\xc3\xa9st.xml', and of course they are different.
But they are not so different, because b't\x82st.xml' is
'tést.xml'.encode('cp437') and b't\xc3\xa9st.xml' is 'tést.xml'.encode('utf-8').
The problem arises because the open method assumes the filename is
UTF-8 encoded, whereas __init__ takes into account that the encoding depends on
the flags:
if flags & 0x800:
    filename = filename.decode('utf-8')
else:
    filename = filename.decode('cp437')
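
The mismatch described in this message can be reproduced without any archive at all. The following is a minimal sketch, not the zipfile implementation: it only shows that the same name yields different bytes under cp437 and UTF-8, so a raw byte comparison fails even though both byte strings decode to the same text. The variable names are illustrative.

```python
# The name from the report, written with an escape so the example does not
# depend on the source file encoding ('t\u00e9st.xml' is 'tést.xml').
name = "t\u00e9st.xml"

# The same text produces different bytes under the two encodings.
as_cp437 = name.encode("cp437")  # historical ZIP filename encoding
as_utf8 = name.encode("utf-8")   # encoding signalled by flag bit 0x800

print(as_cp437)  # b't\x82st.xml'
print(as_utf8)   # b't\xc3\xa9st.xml'

# Comparing the raw bytes fails ...
bytes_match = as_cp437 == as_utf8

# ... while comparing the decoded text succeeds, provided each side is
# decoded with the encoding it was actually written in.
text_match = as_cp437.decode("cp437") == as_utf8.decode("utf-8")

print(bytes_match, text_match)  # False True
```

This suggests that any comparison between the central directory name and the local header name has to decode both sides with the encoding selected by the flags before comparing.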

| msg78104 | Author: STINNER Victor (vstinner) (Python committer) | Date: 2008-12-20 14:03 |
In the ZIP file format, a filename is a byte string because we don't know the encoding. You cannot guess the encoding because it's not stored in the ZIP file and it depends on the OS and the OS configuration. So t1.filename has to be a byte string and testzip.read() has to use bytes and not str.

| msg78105 | Author: STINNER Victor (vstinner) (Python committer) | Date: 2008-12-20 14:06 |
Oh, I see that zipfile.py uses the following code to choose the
filename encoding:
if flags & 0x800:
    # UTF-8 file names extension
    filename = filename.decode('utf-8')
else:
    # Historical ZIP filename encoding
    filename = filename.decode('cp437')
So maybe I'm wrong: the encoding is known from a flag?
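
To make the flag check concrete, here is a small sketch under stated assumptions. `decode_member_name` and `UTF8_FLAG` are hypothetical names introduced for illustration; they mirror the snippet quoted above but are not part of the zipfile API. The second half builds an in-memory archive with a current Python 3 interpreter and inspects `ZipInfo.flag_bits`, on the assumption that the writer sets bit 0x800 for non-ASCII names.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11: the name is UTF-8 encoded


def decode_member_name(raw: bytes, flag_bits: int) -> str:
    """Hypothetical helper: pick the filename encoding from the flags."""
    encoding = "utf-8" if flag_bits & UTF8_FLAG else "cp437"
    return raw.decode(encoding)


# Both byte strings from the report decode to the same text once the
# flag is taken into account.
legacy = decode_member_name(b"t\x82st.xml", 0)
flagged = decode_member_name(b"t\xc3\xa9st.xml", UTF8_FLAG)

# Write an archive with one ASCII and one non-ASCII member name, then
# read the flags back from the central directory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("plain.txt", b"a")
    zf.writestr("h\u00e9.txt", b"b")

with zipfile.ZipFile(buf) as zf:
    flags = {info.filename: bool(info.flag_bits & UTF8_FLAG)
             for info in zf.infolist()}

print(legacy == flagged)
print(flags)
```

So the encoding is known only when the writer sets the flag; when it is not set, cp437 is assumed, whether or not the writer really used it.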

| msg78107 | Author: STINNER Victor (vstinner) (Python committer) | Date: 2008-12-20 14:18 |
Test on Ubuntu Gutsy (utf8 file system) with zip 2.32:
$ mkdir x
$ touch x/hé
$ zip -r x.zip x
adding: x/ (stored 0%)
adding: x/hé (stored 0%)
$ python # 3.0 trunk
>>> import zipfile
>>> testzip = zipfile.ZipFile('x.zip')
>>> testzip.infolist()[1].filename
'x/hé'
>>> print(ascii(testzip.infolist()[1].filename))
'x/h\u251c\u2310'
Using my own file parser (hachoir-wx), I can see that flags=0 and
filename=bytes {78 2f 68 c3 a9} ("x/hé" in UTF-8).
You can try x.zip: I attached the file.

| msg78111 | Author: (skreft) | Date: 2008-12-20 16:06 |
The problem is not about reading the filenames, but about reading the contents of a file whose filename has non-ASCII characters.

| msg78137 | Author: (skreft) | Date: 2008-12-21 02:52 |
I read again what STINNER Victor wrote, and I think that he found another bug:
when listing the filenames of that zip file, the names are not
displayed correctly. In fact
'x/h├⌐' == 'x/hé'.encode('utf-8').decode('cp437')
So, there is again a problem with encodings when reading the contents.
The problem here is that when reading, one cannot pass the real filename,
because it is not a key in the NameToInfo dictionary.
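
The identity quoted in this message can be checked directly, and reversing it recovers the intended name. The sketch below assumes the situation described for x.zip (UTF-8 bytes stored without the UTF-8 flag, then decoded as cp437); the variable names are illustrative.

```python
# Bytes stored in x.zip according to the report: 78 2f 68 c3 a9.
stored = "x/h\u00e9".encode("utf-8")

# With flags=0 the name is decoded as cp437, which yields mojibake.
misread = stored.decode("cp437")

# Reversing the wrong decode step recovers the intended name.
recovered = misread.encode("cp437").decode("utf-8")

print(stored.hex())      # 782f68c3a9
print(ascii(misread))    # 'x/h\u251c\u2310'
print(ascii(recovered))  # 'x/h\xe9'
```

As a workaround, a caller would have to look the member up under the misread name, since that is the key that ends up in the name table; the recovered name is only useful for display.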

| msg78138 | Author: (skreft) | Date: 2008-12-21 02:58 |
Attached is a patch that solves (I hope) the initial problem, the one from Francesco Ricciardi.

| msg101820 | Author: Glenn Linderman (v+python) | Date: 2010-03-27 05:48 |
I just "discovered" that attempting to open zip member "test\file" fails, whereas attempting to open "test/file" works. Granted, the zip contains "/" not "\" characters, but using the os.path stuff (on Windows) to manipulate the names before attempting to open the zip member produces "\" characters. Clearly, I could switch them back. It seems pretty clear that zipfile should do that for me, though. A small, self-contained test case is attached: a zip file that is named .py. I tested using Python 3.1.1.
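
The "switch them back" workaround mentioned here might look like the following sketch. It uses an in-memory archive instead of the attached test case, and converts the Windows-style name with `pathlib.PureWindowsPath.as_posix()` before the lookup; a plain `name.replace("\\", "/")` would serve the same purpose.

```python
import io
import zipfile
from pathlib import PureWindowsPath

# Build a small archive whose member uses the "/" separator, as ZIP
# member names normally do.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("test/file", b"payload")

# A name as os.path manipulation on Windows would produce it.
windows_style = "test\\file"

# Convert backslashes to forward slashes before asking zipfile.
member = PureWindowsPath(windows_style).as_posix()

with zipfile.ZipFile(buf) as zf:
    data = zf.read(member)

print(member, data)
```

`PureWindowsPath` behaves the same on every platform, so the conversion does not depend on where the script runs.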

| msg136224 | Author: Tor Arvid Lund (talund) | Date: 2011-05-18 11:04 |
I was wondering what has prevented Eddie's patch from being included in Python. Has nobody volunteered to verify that it works? I would be willing to do that, though I have never compiled Python on any platform before.
It just seems a bit silly to me that Python cannot work with zip files with Unicode file names... I just now had to do 'os.system("unzip.exe ...")' because zipfile did not work for me...

| msg136227 | Author: STINNER Victor (vstinner) (Python committer) | Date: 2011-05-18 11:31 |
This issue looks to be a duplicate of #10801, which was only fixed (33543b4e0e5d) in Python 3.2. See also #12048: similar issue in Python 3.1.

| msg136231 | Author: STINNER Victor (vstinner) (Python committer) | Date: 2011-05-18 12:00 |
The initial problem is clearly a duplicate of issue #10801, which is now fixed in Python 3.1+ (I just backported the fix to Python 3.1).

> I just "discovered" that attempting to open zip member "test\file"
> fails where attempting to open "test/file" works. (...)
> It seems pretty clear that zipfile should do that for me, though.

@v+python: I don't think so, but others may agree with you. Please open a new issue, because it is unrelated to the initial bug report.

I'm closing this issue because the initial problem is now fixed. For the x.zip problem (UTF-8 encoded filenames with the "Unicode" flag), there is already issue #10614, which handles this case.
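
One way to check the fixed behaviour on a current interpreter, without the attached test.zip, is sketched below. It rests on an assumption that should be read as a test trick, not a supported API: an archive is written with an ASCII placeholder name of the same byte length, and the name bytes are then patched to the cp437 form from the original report, which leaves the UTF-8 flag unset just as in test.zip.

```python
import io
import zipfile

# Write an archive with an 8-byte ASCII placeholder name. ZIP_STORED keeps
# the payload uncompressed, so the placeholder cannot appear in it by chance.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("tXst.xml", b"<root/>")

# Patch the name in both the local header and the central directory to the
# cp437 bytes of 'tést.xml' (same length, so all offsets stay valid).
cp437_name = "t\u00e9st.xml".encode("cp437")  # b't\x82st.xml'
raw = buf.getvalue().replace(b"tXst.xml", cp437_name)

with zipfile.ZipFile(io.BytesIO(raw)) as zf:
    info = zf.infolist()[0]
    by_name = zf.read(info.filename)  # lookup by str name
    by_info = zf.read(info)           # lookup by ZipInfo object

print(ascii(info.filename), by_name, by_info)
```

On an interpreter that includes the fix, both reads are expected to return the payload instead of raising BadZipfile as in the original report.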

History

| Date | User | Action | Args |
|---|---|---|---|
| 2022-04-11 14:56:42 | admin | set | github: 48871 |
| 2011-05-18 12:00:18 | vstinner | set | status: open -> closed; resolution: fixed; messages: + msg136231 |
| 2011-05-18 11:31:32 | vstinner | set | messages: + msg136227 |
| 2011-05-18 11:04:32 | talund | set | nosy: + talund; messages: + msg136224 |
| 2010-03-27 05:48:21 | v+python | set | files: + testzip.py; nosy: + v+python; messages: + msg101820 |
| 2008-12-21 02:58:08 | skreft | set | files: + patch.diff; keywords: + patch; messages: + msg78138 |
| 2008-12-21 02:52:40 | skreft | set | messages: + msg78137 |
| 2008-12-20 16:06:55 | skreft | set | messages: + msg78111 |
| 2008-12-20 14:18:26 | vstinner | set | files: + x.zip; messages: + msg78107 |
| 2008-12-20 14:06:34 | vstinner | set | messages: + msg78105 |
| 2008-12-20 14:03:03 | vstinner | set | nosy: + vstinner; messages: + msg78104 |
| 2008-12-18 14:17:54 | skreft | set | messages: + msg78025 |
| 2008-12-18 07:33:03 | francescor | set | messages: + msg78014 |
| 2008-12-18 01:34:14 | skreft | set | nosy: + skreft; messages: + msg78004 |
| 2008-12-10 16:27:46 | francescor | create | |