
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: zipfile: add "unicode" option to the force the filename encoding to UTF-8
Type:
Stage: resolved
Components: Library (Lib), Unicode
Versions: Python 3.2, Python 3.3
process
Status: closed
Resolution: duplicate
Dependencies:
Superseder: ZipFile: add a filename_encoding argument (View: 10614)
Assigned To:
Nosy List: THRlWiTi, alanmcintyre, amaury.forgeotdarc, pitrou, serhiy.storchaka, vstinner
Priority: normal
Keywords: patch

Created on 2011-01-21 12:00 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name              Uploaded
zipfile_unicode.patch  vstinner, 2011-01-21 12:00
Messages (12)
msg126724 - Author: STINNER Victor (vstinner) (Python committer) Date: 2011-01-21 12:00
ZipInfo._encodeFilename() tries the cp437 encoding and otherwise uses UTF-8. It is not possible to choose the encoding.
To work around #10955 (bootstrap issue with python32.zip), it would be nice to be able to create a ZIP file using only UTF-8 filenames.
The attached patch adds a unicode parameter to ZipFile.write(), ZipFile.writestr() and the ZipInfo constructor.
msg126725 - Author: STINNER Victor (vstinner) (Python committer) Date: 2011-01-21 12:03
Oh, this patch also fixes a bug: ZipFile._RealGetContents() doesn't keep the unicode flag, so opening a ZIP file and then writing it somewhere else may change the unicode flag if the flag was set but the filename is also encodable to UTF-8 (e.g. an ASCII filename).
msg126727 - Author: STINNER Victor (vstinner) (Python committer) Date: 2011-01-21 12:07
7zip and WinRAR use the same algorithm as ZipFile._encodeFilename(): try cp437 or use UTF-8. E.g. if a filename contains ∞ (U+221E), it is encoded to UTF-8.
WinZIP encodes all filenames to cp437: ∞ (U+221E) is replaced by 8 (U+0038), ☺ (U+263A) is replaced by... U+0001!
7zip, WinRAR and WinZIP are all able to decode UTF-8 filenames (they handle the unicode flag correctly).
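For illustration, the "try cp437 or use UTF-8" fallback described in this message could be sketched as follows. This is not the actual zipfile code: `encode_filename` is a hypothetical helper, and it relies on Python's own cp437 codec, whose character mapping may differ from what the archivers above use.

```python
def encode_filename(name):
    """Return (encoded_bytes, needs_unicode_flag) for a ZIP member name.

    Hypothetical helper: try the legacy cp437 encoding first and fall
    back to UTF-8 only when the name cannot be represented in cp437.
    """
    try:
        return name.encode("cp437"), False
    except UnicodeEncodeError:
        # The name needs UTF-8, so the unicode flag would have to be set.
        return name.encode("utf-8"), True


legacy_bytes, legacy_flag = encode_filename("café.txt")
utf8_bytes, utf8_flag = encode_filename("☺.txt")
```

With Python's cp437 codec, "café.txt" stays in the legacy encoding, while "☺.txt" cannot be encoded and falls back to UTF-8.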
msg126731 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2011-01-21 12:18
What kind of problem are you trying to solve?
msg126734 - Author: STINNER Victor (vstinner) (Python committer) Date: 2011-01-21 13:00
> What kind of problem are you trying to solve?
Support non-ASCII filenames in python32.zip (#10955): at bootstrap, Python 3.2 can only use the UTF-8 codec (not cp437).
But I also suppose that forcing the encoding to UTF-8 gives better Unicode support (when you decompress the archive).
msg126735 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2011-01-21 13:03
> Support non-ASCII filenames in python32.zip (#10955): at bootstrap,
> Python 3.2 can only use the UTF-8 codec (not cp437).
>
> But I also suppose that forcing the encoding to UTF-8 gives better
> Unicode support (when you decompress the archive).
The question is, rather, why you need an external flag for that.
msg126745 - Author: STINNER Victor (vstinner) (Python committer) Date: 2011-01-21 15:02
> The question is, rather, why you need an external flag for that.
Because I don't want to change the default encoding. I am not sure that all applications support UTF-8 encodings.
But if you control your environment, forcing the UTF-8 encoding should improve your Unicode support.
msg126746 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2011-01-21 15:12
> > The question is, rather, why you need an external flag for that.
>
> Because I don't want to change the default encoding. I am not sure
> that all applications support UTF-8 encodings.
If this is a ZIP standard flag, why should we care about applications
which don't support it? Should we add other flags to disable other
features out of fear that other applications might not support them
either?
> But if you control your environment, forcing the UTF-8 encoding should
> improve your Unicode support.
How is a random user supposed to know if their tools support UTF-8
encoding? It's not like everyone is an expert in ZIP files. This is the
kind of situation where asking the user to make a choice is more
confusing than helpful. When adding the flag, not only do you complicate
the API, but you have to support this flag for the rest of your life
(well, almost :-)).
We could instead use utf-8 by default for all non-ascii filenames (and
*perhaps* have a separate force_cp437 flag, but default it to False).
msg126759 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) (Python committer) Date: 2011-01-21 17:59
This looks similar to issue10614
msg276182 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2016-09-13 06:18
Now UTF-8 is used for non-ASCII names. Can this issue be closed as outdated?
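A quick way to check this behaviour on a current Python 3 is to write an in-memory archive and inspect bit 11 (0x800) of each member's general purpose flags, which marks a UTF-8 encoded name. This is only a small sketch: `utf8_flags` and `UTF8_FLAG` are illustrative names, not part of zipfile.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11: the member name is UTF-8


def utf8_flags(names):
    """Write empty members with the given names, then report which
    ones come back with the UTF-8 flag set."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, b"")
    with zipfile.ZipFile(buf) as zf:
        return {
            info.filename: bool(info.flag_bits & UTF8_FLAG)
            for info in zf.infolist()
        }


flags = utf8_flags(["plain.txt", "☺.txt"])
```

The ASCII name is expected to come back without the flag and the non-ASCII name with it.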
msg297125 - Author: STINNER Victor (vstinner) (Python committer) Date: 2017-06-28 01:37
> This looks similar to issue10614
Right. Let's focus on that one which has a better design. "unicode" means everything and nothing. It's more reliable to specify an encoding.
msg297148 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2017-06-28 03:58
See also issue28080.
History
Date                 User                Action  Args
2022-04-11 14:57:11  admin               set     github: 55181
2017-06-28 03:58:24  serhiy.storchaka    set     messages: + msg297148
2017-06-28 01:37:05  vstinner            set     status: open -> closed; superseder: ZipFile: add a filename_encoding argument; messages: + msg297125; resolution: duplicate; stage: resolved
2016-09-13 06:18:50  serhiy.storchaka    set     messages: + msg276182
2015-09-12 05:55:36  THRlWiTi            set     nosy: + THRlWiTi
2015-07-21 08:11:16  ethan.furman        set     nosy: - ethan.furman
2013-10-14 22:43:18  ethan.furman        set     nosy: + ethan.furman
2012-04-07 19:22:03  serhiy.storchaka    set     nosy: + serhiy.storchaka
2011-01-21 17:59:51  amaury.forgeotdarc  set     messages: + msg126759
2011-01-21 15:16:13  pitrou              set     nosy: + amaury.forgeotdarc
2011-01-21 15:12:22  pitrou              set     messages: + msg126746
2011-01-21 15:02:07  vstinner            set     messages: + msg126745
2011-01-21 13:03:42  pitrou              set     messages: + msg126735
2011-01-21 13:00:51  vstinner            set     messages: + msg126734
2011-01-21 12:18:49  pitrou              set     nosy: + pitrou; messages: + msg126731
2011-01-21 12:07:38  vstinner            set     title: zipfile: add unicode option to the choose filename encoding -> zipfile: add "unicode" option to the force the filename encoding to UTF-8
2011-01-21 12:07:08  vstinner            set     nosy: + alanmcintyre; messages: + msg126727
2011-01-21 12:03:06  vstinner            set     messages: + msg126725
2011-01-21 12:00:43  vstinner            create
