Created on 2010-12-03 07:41 by ocean-city, last changed 2022-04-11 14:57 by admin.
| Files | | |
|---|---|---|
| File name | Uploaded | Description |
| non-ascii-cp932.zip | ocean-city, 2010-12-04 10:45 | built with python2.7 |
| zipfile.patch | umedoblock, 2012-07-13 04:16 | decode_filename zipfile.patch |
| encodings.py | umedoblock, 2012-07-13 14:44 | |
| 10614-zipfile-encoding.patch | methane, 2016-12-26 13:02 | |
| Messages (15) | |||
|---|---|---|---|
| msg123197 | Author: Hirokazu Yamamoto (ocean-city) * (Python committer) | Date: 2010-12-03 07:41 | |
Currently, ZipFile only accepts ASCII or UTF-8 as file name encodings. On Japanese Windows, CP932 is usually used instead. So when we extract a ZIP file with py3k, non-ASCII file names come out garbled. Can we handle this issue? (e.g. by adding an encoding option to ZipFile#__init__) |
|||
| msg123201 | Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) | Date: 2010-12-03 08:07 | |
The ZIP format specification mentions only cp437 and utf8: http://www.pkware.com/documents/casestudies/APPNOTE.TXT (see Appendix D). Do zip files created on Japanese Windows contain some information about the encoding they use? Or do some programs write cp932 where they are supposed to use one of the encodings above? |
|||
| msg123202 | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2010-12-03 08:13 | |
No, there is no indication in the zipfile that it deviates from the spec. That doesn't stop people from creating such zipfiles, anyway; many zip tools ignore the spec and use CP_ACP instead (which, of course, will then get misinterpreted if extracted on a different system). I think we must support this case somehow, but must be careful to avoid creating such files unless explicitly requested. One approach might be to have two encodings given: one to interpret the existing filenames, and one to be used for new filenames (with a recommendation to never use that parameter since zip now supports UTF-8 in a well-defined manner). |
|||
| msg123229 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2010-12-03 12:27 | |
@Hirokazu: Can you attach a small test archive? Yes, we can add a "default_encoding" attribute to ZipFile and add an optional default_encoding argument to its constructor. |
|||
| msg123332 | Author: Hirokazu Yamamoto (ocean-city) * (Python committer) | Date: 2010-12-04 10:45 | |
I'm not sure why, but I get a BadZipFile error now. Anyway, here is a cp932 zip file created with python2.7. |
|||
| msg126791 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2011-01-21 22:39 | |
In #10972, I proposed adding an option to set the filename encoding to UTF-8. But that option is about forcing UTF-8 when creating a ZIP file; it doesn't concern the decompression of a ZIP file.
Here is a proposed specification to fix both issues at the same time.
The "default_encoding" name is confusing because it doesn't say whether it is the encoding of the (text?) file content or the encoding of the filename. Why not simply "filename_encoding"?
The option can be added in multiple places:
- argument to the ZipFile constructor: this is needed to decompress
- argument to ZipFile.write() and ZipInfo, because there are 3 different ways to add files
ZipFile.filename_encoding (and ZipInfo.filename_encoding) will be None by default: in this case, use the current algorithm (try cp437 or use UTF-8). Otherwise, use the given encoding. If the encoding is UTF-8, set the unicode flag.
Examples:
---
zipfile.ZipFile("non-ascii-cp932.zip", filename_encoding="cp932")
f = zipfile.ZipFile("test.zip", "w")
f.write(filename, filename_encoding="UTF-8")
info = ZipInfo(filename, filename_encoding="UTF-8")
f.writestr(info, b'data')
---
Don't add a filename_encoding argument to ZipFile.writestr(), because it may conflict if a ZipInfo is passed and ZipInfo.filename_encoding and filename_encoding are different. |
|||
| msg136233 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2011-05-18 12:02 | |
I closed issue #12048 as a duplicate of this issue: yaoyu wants to uncompress a ZIP file having filenames encoded to GBK. |
|||
| msg165351 | Author: umedoblock (umedoblock) | Date: 2012-07-13 04:16 | |
I fixed this problem. I made a new method, _decode_filename(). |
|||
| msg165384 | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2012-07-13 14:02 | |
umedoblock: your patch is incorrect, as it produces mojibake. If there is a file name b'f\x94n', it will decode as sjis under your patch (to u'f\u99ac'), even though it was meant as cp437 (i.e. u'f\xf6n'). |
|||
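The ambiguity described in the message above can be reproduced with Python's standard codecs alone. The following is a minimal sketch, not part of any attached patch; the variable names are illustrative. It shows that the same three bytes decode without error under both encodings, so the bytes themselves give no hint about which one was intended.

```python
# The raw member name from the message above, as it would be stored in the archive.
raw_name = b"f\x94n"

# cp437 is what the ZIP specification prescribes when the UTF-8 flag is not set.
as_cp437 = raw_name.decode("cp437")

# Shift_JIS ("sjis" is an alias) reads 0x94 0x6E as a single double-byte character.
as_sjis = raw_name.decode("shift_jis")

print(as_cp437, len(as_cp437))
print(as_sjis, len(as_sjis))
```

Because neither decode raises, a reader cannot fall back from one encoding to the other on error; the intended encoding has to come from the caller.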
| msg165386 | Author: umedoblock (umedoblock) | Date: 2012-07-13 14:44 | |
Hi, Martin. I tried your test case with the attached file, and I got the result below.
p3 ./encodings.py
encoding: sjis, filename: f馬
encoding: cp437, filename: fön
sjis_filename = f馬
cp437_filename = fön
There are two success cases. So I think that the patch needs to change default_encoding before or in _decode_filename(). But I have no idea how to change a default_encoding. |
|||
| msg200187 | Author: Sergey Dorofeev (Sergey.Dorofeev) | Date: 2013-10-18 07:30 | |
I'd like to submit a patch to support zip archives created on systems that use a non-US codepage (e.g. Russian CP866).
The codepage would be specified in an additional parameter of the ZipFile constructor, named "codepage".
If it is not specified, the old behavior is preserved (use CP437).
--- zipfile.py-orig 2013-09-18 16:45:56.000000000 +0400
+++ zipfile.py 2013-10-15 00:24:06.105157572 +0400
@@ -885,7 +885,7 @@
     fp = None # Set here since __del__ checks it
     _windows_illegal_name_trans_table = None

-    def __init__(self, file, mode="r", compression=ZIP_STORED, allowZip64=False):
+    def __init__(self, file, mode="r", compression=ZIP_STORED, allowZip64=False, codepage='cp437'):
         """Open the ZIP file with mode read "r", write "w" or append "a"."""
         if mode not in ("r", "w", "a"):
             raise RuntimeError('ZipFile() requires mode "r", "w", or "a"')
@@ -901,6 +901,7 @@
         self.mode = key = mode.replace('b', '')[0]
         self.pwd = None
         self._comment = b''
+        self.codepage = codepage

         # Check if we were passed a file-like object
         if isinstance(file, str):
@@ -1002,7 +1003,7 @@
                 filename = filename.decode('utf-8')
             else:
                 # Historical ZIP filename encoding
-                filename = filename.decode('cp437')
+                filename = filename.decode(self.codepage)
             # Create ZipInfo instance to store file information
             x = ZipInfo(filename)
             x.extra = fp.read(centdir[_CD_EXTRA_FIELD_LENGTH])
@@ -1157,7 +1158,7 @@
                 # UTF-8 filename
                 fname_str = fname.decode("utf-8")
             else:
-                fname_str = fname.decode("cp437")
+                fname_str = fname.decode(self.codepage)

             if fname_str != zinfo.orig_filename:
                 raise BadZipFile(
|
|||
| msg200193 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2013-10-18 07:47 | |
Please rename codepage to encoding. By the way, 437 is a codepage, cp437 is a (Python) encoding. I don't think that ZIP is limited to Windows. I have uncompressed zip files many times on various OSes, and GitHub also produces zip files (and GitHub is probably not using Windows). And the term "codepage" is only used on Windows. Mac OS 9 users might produce Mac Roman filenames. |
|||
| msg200311 | Author: Sergey Dorofeev (Sergey.Dorofeev) | Date: 2013-10-18 22:08 | |
OK, here you are:
--- zipfile.py-orig 2013-09-18 16:45:56.000000000 +0400
+++ zipfile.py 2013-10-19 01:59:07.444346674 +0400
@@ -885,7 +885,7 @@
     fp = None # Set here since __del__ checks it
     _windows_illegal_name_trans_table = None

-    def __init__(self, file, mode="r", compression=ZIP_STORED, allowZip64=False):
+    def __init__(self, file, mode="r", compression=ZIP_STORED, allowZip64=False, encoding='cp437'):
         """Open the ZIP file with mode read "r", write "w" or append "a"."""
         if mode not in ("r", "w", "a"):
             raise RuntimeError('ZipFile() requires mode "r", "w", or "a"')
@@ -901,6 +901,7 @@
         self.mode = key = mode.replace('b', '')[0]
         self.pwd = None
         self._comment = b''
+        self.encoding = encoding

         # Check if we were passed a file-like object
         if isinstance(file, str):
@@ -1001,8 +1002,8 @@
                 # UTF-8 file names extension
                 filename = filename.decode('utf-8')
             else:
-                # Historical ZIP filename encoding
-                filename = filename.decode('cp437')
+                # Historical ZIP filename encoding, default is CP437
+                filename = filename.decode(self.encoding)
             # Create ZipInfo instance to store file information
             x = ZipInfo(filename)
             x.extra = fp.read(centdir[_CD_EXTRA_FIELD_LENGTH])
@@ -1157,7 +1158,7 @@
                 # UTF-8 filename
                 fname_str = fname.decode("utf-8")
             else:
-                fname_str = fname.decode("cp437")
+                fname_str = fname.decode(self.encoding)

             if fname_str != zinfo.orig_filename:
                 raise BadZipFile(
|
|||
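The decoding rule that the patches above modify can be isolated as a small standalone function. This is only a sketch under stated assumptions: `decode_member_name` and `UTF8_FLAG` are hypothetical names introduced here, and the `encoding` parameter stands in for the constructor argument the patch proposes. The 0x800 value is general purpose bit 11 of the ZIP header, which marks a name as UTF-8.

```python
UTF8_FLAG = 0x800  # general purpose bit 11: the stored name is UTF-8


def decode_member_name(raw_name: bytes, flag_bits: int, encoding: str = "cp437") -> str:
    """Decode a stored member name (hypothetical helper, not zipfile API).

    A name flagged as UTF-8 is always decoded as UTF-8; the caller-supplied
    encoding only replaces the historical cp437 fallback.
    """
    if flag_bits & UTF8_FLAG:
        return raw_name.decode("utf-8")
    return raw_name.decode(encoding)


# Unflagged name, default fallback: interpreted as cp437.
default_name = decode_member_name(b"f\x94n", 0)
# Unflagged name, caller says the archive came from a Shift_JIS system.
legacy_name = decode_member_name(b"f\x94n", 0, encoding="shift_jis")
# Flagged name: the override is ignored and UTF-8 wins.
flagged_name = decode_member_name("f\u99ac".encode("utf-8"), UTF8_FLAG, encoding="shift_jis")
```

Keeping the UTF-8 branch unconditional means the override can only affect archives that do not declare their encoding, which matches the concern raised earlier in the thread about not misreading spec-conforming files.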
| msg284025 | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2016-12-26 12:49 | |
See also issue28080. |
|||
| msg284026 | Author: Inada Naoki (methane) * (Python committer) | Date: 2016-12-26 13:06 | |
Thanks. Patch posted in issue28080 looks better than mine. |
|||
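For readers who need to open such archives without any of the patches discussed in this thread, one commonly used workaround is to undo the cp437 decoding after the fact. The sketch below is illustrative only: `build_legacy_archive` and `redecode` are hypothetical helpers, and the archive is built in memory by overwriting an ASCII placeholder name with raw bytes, which is an assumption made purely so the example runs without an external file.

```python
import io
import zipfile


def build_legacy_archive(raw_name: bytes) -> bytes:
    """Return ZIP bytes whose single member name is raw_name, with no UTF-8 flag."""
    placeholder = "X" * len(raw_name)  # ASCII name of the same byte length
    info = zipfile.ZipInfo(placeholder, date_time=(1980, 1, 1, 0, 0, 0))
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(info, b"data")
    # The name appears in the local header and the central directory; patch both.
    return buf.getvalue().replace(placeholder.encode("ascii"), raw_name)


def redecode(name: str, real_encoding: str) -> str:
    """Reverse zipfile's cp437 decoding, then decode with the intended encoding."""
    return name.encode("cp437").decode(real_encoding)


archive = build_legacy_archive("f\u99ac.txt".encode("shift_jis"))
with zipfile.ZipFile(io.BytesIO(archive)) as zf:
    as_read = zf.namelist()[0]   # what zipfile reports by default
    content = zf.read(as_read)   # lookups still use the name as zipfile decoded it
fixed = redecode(as_read, "shift_jis")
print(as_read, "->", fixed)
```

The round trip is lossless because Python's cp437 codec maps every byte value to a character. On Python 3.11 and later, `zipfile.ZipFile` accepts a `metadata_encoding` argument when reading, which makes this manual step unnecessary.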
| History | | | |
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:57:09 | admin | set | github: 54823 |
| 2017-06-28 01:37:05 | vstinner | link | issue10972 superseder |
| 2016-12-26 13:06:43 | methane | set | superseder: Allow reading member names with bogus encodings in zipfile; messages: + msg284026; stage: patch review -> |
| 2016-12-26 13:02:11 | methane | set | files: + 10614-zipfile-encoding.patch |
| 2016-12-26 12:49:30 | serhiy.storchaka | set | messages: + msg284025 |
| 2016-12-26 12:33:48 | methane | set | files: - 10614-zipfile-encoding.patch |
| 2016-12-26 12:32:41 | methane | set | files: + 10614-zipfile-encoding.patch; stage: patch review; components: + Library (Lib), - Extension Modules; versions: + Python 3.6, Python 3.7, - Python 3.2, Python 3.3 |
| 2015-09-12 05:54:43 | THRlWiTi | set | nosy: + THRlWiTi |
| 2015-07-21 08:05:19 | ethan.furman | set | nosy: - ethan.furman |
| 2014-04-20 10:29:12 | methane | set | nosy: + methane |
| 2014-01-21 14:34:46 | Laurent.Mazuel | set | nosy: + Laurent.Mazuel |
| 2013-10-18 22:08:51 | Sergey.Dorofeev | set | messages: + msg200311 |
| 2013-10-18 07:47:05 | vstinner | set | messages: + msg200193 |
| 2013-10-18 07:30:31 | Sergey.Dorofeev | set | nosy: + Sergey.Dorofeev; messages: + msg200187 |
| 2013-10-14 22:45:58 | ethan.furman | set | nosy: + ethan.furman |
| 2012-08-09 08:47:49 | loewis | link | issue15602 superseder |
| 2012-07-13 14:44:19 | umedoblock | set | files: + encodings.py; messages: + msg165386 |
| 2012-07-13 14:02:39 | loewis | set | messages: + msg165384 |
| 2012-07-13 04:16:20 | umedoblock | set | files: + zipfile.patch; nosy: + umedoblock; messages: + msg165351; keywords: + patch |
| 2012-04-07 19:21:35 | serhiy.storchaka | set | nosy: + serhiy.storchaka |
| 2011-05-18 12:02:34 | vstinner | set | messages: + msg136233 |
| 2011-02-01 00:03:16 | vstinner | set | nosy: loewis, amaury.forgeotdarc, vstinner, ocean-city; title: ZipFile and CP932 encoding -> ZipFile: add a filename_encoding argument |
| 2011-01-21 22:39:16 | vstinner | set | nosy: loewis, amaury.forgeotdarc, vstinner, ocean-city; messages: + msg126791 |
| 2010-12-04 10:45:06 | ocean-city | set | files: + non-ascii-cp932.zip; messages: + msg123332 |
| 2010-12-03 12:27:16 | vstinner | set | nosy: + vstinner; messages: + msg123229 |
| 2010-12-03 08:13:29 | loewis | set | nosy: + loewis; messages: + msg123202 |
| 2010-12-03 08:07:38 | amaury.forgeotdarc | set | nosy: + amaury.forgeotdarc; messages: + msg123201 |
| 2010-12-03 07:41:56 | ocean-city | create | |