Created on 2011-01-21 12:00 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.
| Files | | |
|---|---|---|
| File name | Uploaded | Description |
| zipfile_unicode.patch | vstinner, 2011-01-21 12:00 | |
| Messages (12) | |||
|---|---|---|---|
| msg126724 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2011-01-21 12:00 | |
ZipInfo._encodeFilename() tries the cp437 encoding and falls back to UTF-8. It is not possible to choose the encoding. To work around #10955 (bootstrap issue with python32.zip), it would be nice to be able to create a ZIP file using only UTF-8 filenames. The attached patch adds a unicode parameter to ZipFile.write(), ZipFile.writestr() and the ZipInfo constructor. |
|||
| msg126725 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2011-01-21 12:03 | |
Oh, this patch also fixes a bug: ZipFile._RealGetContents() doesn't keep the unicode flag, so opening a ZIP file and then writing it somewhere else may change the unicode flag if the flag was set but the filename is also encodable to cp437 (e.g. an ASCII filename). |
|||
| msg126727 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2011-01-21 12:07 | |
7zip and WinRAR use the same algorithm as ZipFile._encodeFilename(): try cp437, otherwise use UTF-8. E.g. if a filename contains ∞ (U+221E), it is encoded to UTF-8. WinZIP encodes all filenames to cp437: ∞ (U+221E) is replaced by 8 (U+0038), ☺ (U+263A) is replaced by... U+0001! 7zip, WinRAR and WinZIP are all able to decode UTF-8 filenames (they handle the unicode flag correctly). |
|||
| msg126731 | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2011-01-21 12:18 | |
What kind of problem are you trying to solve? |
|||
| msg126734 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2011-01-21 13:00 | |
> What kind of problem are you trying to solve? Support non-ASCII filenames in python32.zip (#10955): at bootstrap, Python 3.2 can only use the UTF-8 codec (not cp437). But I also suppose that forcing the encoding to UTF-8 gives better Unicode support (when you decompress the archive). |
|||
| msg126735 | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2011-01-21 13:03 | |
> Support non-ASCII filenames in python32.zip (#10955): at bootstrap, > Python 3.2 can only use UTF-8 codec (not cp437). > > But I suppose also that forcing the encoding to UTF-8 gives a better > Unicode support (when you decompress the archive). The question is, rather, why you need an external flag for that. |
|||
| msg126745 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2011-01-21 15:02 | |
> The question is, rather, why you need an external flag for that. Because I don't want to change the default encoding. I am not sure that all applications support the UTF-8 encoding. But if you control your environment, forcing the UTF-8 encoding should improve your Unicode support. |
|||
| msg126746 | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2011-01-21 15:12 | |
> > The question is, rather, why you need an external flag for that. > > Because I don't want to change the default encoding. I am not sure > that all applications support UTF-8 encodings. If this is a ZIP standard flag, why should we care about applications which don't support it? Should we add other flags to disable other features out of fear that other applications might not support them either? > But if you control your environment, force UTF-8 encoding should > improve your Unicode support. How is a random user supposed to know if their tools support UTF-8 encoding? It's not like everyone is an expert in ZIP files. This is the kind of situation where asking the user to make a choice is more confusing than helpful. When adding the flag, not only do you complicate the API, but you also have to support this flag for the rest of your life (well, almost :-)). We could instead use UTF-8 by default for all non-ASCII filenames (and *perhaps* have a separate force_cp437 flag, but default it to False). |
|||
| msg126759 | Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) | Date: 2011-01-21 17:59 | |
This looks similar to issue10614 |
|||
| msg276182 | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2016-09-13 06:18 | |
Now UTF-8 is used for non-ASCII names. Can this issue be closed as outdated? |
|||
| msg297125 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2017-06-28 01:37 | |
> This looks similar to issue10614 Right. Let's focus on that one, which has a better design. "unicode" means everything and nothing. It's more reliable to specify an encoding. |
|||
| msg297148 | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2017-06-28 03:58 | |
See also issue28080. |
|||
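
The encoding rule discussed in the messages above can be sketched as follows. This is a minimal illustration, not the attached patch and not the zipfile API: `encode_filename` and its `force_utf8` parameter are hypothetical names standing in for `ZipInfo._encodeFilename()` and the proposed `unicode` option, and `utf8_flag_after_roundtrip` is a hypothetical helper. The sketch assumes the "unicode flag" is bit 11 (0x800) of the ZIP general purpose flags. The second function only checks, on whatever Python runs it, the behaviour reported in msg276182 (UTF-8 is used for non-ASCII names).

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11: the filename is stored as UTF-8


def encode_filename(name, force_utf8=False):
    """Return (encoded_name, flag_bits) for a ZIP member name.

    Hypothetical helper: mirrors the "try cp437, otherwise UTF-8" rule
    described in the thread. force_utf8 stands in for the proposed
    "unicode" option and skips the cp437 attempt.
    """
    if not force_utf8:
        try:
            return name.encode("cp437"), 0
        except UnicodeEncodeError:
            pass  # not representable in cp437, fall back to UTF-8
    return name.encode("utf-8"), UTF8_FLAG


def utf8_flag_after_roundtrip(name):
    """Write one member to an in-memory archive with the real zipfile
    module, read it back, and report (filename, UTF-8 flag set?)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(name, b"data")
    with zipfile.ZipFile(buf) as zf:
        info = zf.infolist()[0]
    return info.filename, bool(info.flag_bits & UTF8_FLAG)


if __name__ == "__main__":
    for name in ("abc.txt", "\u00e9.txt", "\u20ac.txt"):
        print(name, encode_filename(name), utf8_flag_after_roundtrip(name))
```

With this sketch, an ASCII name passed with `force_utf8=True` keeps the same bytes but carries the UTF-8 flag, which is the case msg126725 says was lost when an archive was read and rewritten.
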
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:57:11 | admin | set | github: 55181 |
| 2017-06-28 03:58:24 | serhiy.storchaka | set | messages: + msg297148 |
| 2017-06-28 01:37:05 | vstinner | set | status: open -> closed superseder: ZipFile: add a filename_encoding argument messages: + msg297125 resolution: duplicate stage: resolved |
| 2016-09-13 06:18:50 | serhiy.storchaka | set | messages: + msg276182 |
| 2015-09-12 05:55:36 | THRlWiTi | set | nosy: + THRlWiTi |
| 2015-07-21 08:11:16 | ethan.furman | set | nosy: - ethan.furman |
| 2013-10-14 22:43:18 | ethan.furman | set | nosy: + ethan.furman |
| 2012-04-07 19:22:03 | serhiy.storchaka | set | nosy: + serhiy.storchaka |
| 2011-01-21 17:59:51 | amaury.forgeotdarc | set | messages: + msg126759 |
| 2011-01-21 15:16:13 | pitrou | set | nosy: + amaury.forgeotdarc |
| 2011-01-21 15:12:22 | pitrou | set | messages: + msg126746 |
| 2011-01-21 15:02:07 | vstinner | set | messages: + msg126745 |
| 2011-01-21 13:03:42 | pitrou | set | messages: + msg126735 |
| 2011-01-21 13:00:51 | vstinner | set | messages: + msg126734 |
| 2011-01-21 12:18:49 | pitrou | set | nosy: + pitrou messages: + msg126731 |
| 2011-01-21 12:07:38 | vstinner | set | title: zipfile: add unicode option to the choose filename encoding -> zipfile: add "unicode" option to the force the filename encoding to UTF-8 |
| 2011-01-21 12:07:08 | vstinner | set | nosy: + alanmcintyre messages: + msg126727 |
| 2011-01-21 12:03:06 | vstinner | set | messages: + msg126725 |
| 2011-01-21 12:00:43 | vstinner | create | |