This issue tracker has been migrated to GitHub ,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2009年08月22日 06:22 by ryles, last changed 2022年04月11日 14:56 by admin. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| issue6759_1.diff | agillesp, 2009年08月22日 22:37 | Patch for Lib/ziptest.py and Lib/test/test_zipfile.py | ||
| issue6759_2.diff | agillesp, 2009年08月26日 17:08 | |||
| issue6759_3.diff | agillesp, 2009年08月26日 22:43 | Patch 3 | ||
| issue6759_4.diff | agillesp, 2009年08月27日 03:57 | Patch 4 | ||
| Messages (19) | |||
|---|---|---|---|
| msg91854 - (view) | Author: Ryan Leslie (ryles) | Date: 2009年08月22日 06:22 | |
The zipfile.ZipFile.open() behavior with mode 'U' or 'rU' is not quite as advertised in http://docs.python.org/library/zipfile.html#zipfile.ZipFile.open Here is an example: $ echo -ne "This is an example\r\nWhich demonstrates a problem\r\nwith ZipFile.open(..., 'U')\r\n" > foo.txt $ cat -v foo.txt This is an example^M Which demonstrates a problem^M with ZipFile.open(..., 'U')^M $ zip foo.zip foo.txt adding: foo.txt (deflated 1%) $ python Python 2.6.2 (r262:71600, Aug 21 2009, 17:52:12) [GCC 4.1.2 20071124 (Red Hat 4.1.2-42)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> open("foo.txt", 'U').read() "This is an example\nWhich demonstrates a problem\nwith ZipFile.open(..., 'U')\n" >>> from zipfile import ZipFile >>> ZipFile("foo.zip").open("foo.txt", 'U').read() "This is an example\r\nWhich demonstrates a problem\r\nwith ZipFile.open(..., 'U')\r\n" >>> The open() method was added here: http://bugs.python.org/issue1121142 The cause is that the universal newline implementation is specific to readline(), which also implements readlines() and next() as well. Support was never added for read(), which is independent. Note that test_zipfile.UniversalNewlineTests.readTest() passes. This is suspect because it's explicitly coded to *not* expect translation of new line sequences. |
|||
| msg91880 - (view) | Author: Art Gillespie (agillesp) | Date: 2009年08月22日 22:37 | |
Patch for both zipfile.py and test_zipfile.py attached. * The universal newline logic is now in read instead of readline. * UniversalNewlineTests.read_test changed to check for \n rather than unchanged eol. |
|||
| msg91962 - (view) | Author: Ryan Leslie (ryles) | Date: 2009年08月26日 04:44 | |
Hi Art, Thanks for working on this. I've taken a look at the patch. The fix to read_test looks correct. Of course, I would consider a more descriptive variable name than 'b'. The changes to read() are an improvement, but I think we need to be careful when we replace "\r\n" with "\n". Basically, we've turned two characters into one and are now potentially one character short of 'size' bytes. This doesn't match the behavior of file.read(). Another thing to work out is the lack of the 'newlines' attribute, discussed in PEP 278. I've noted that bz2 seems to do a pretty good job with universal newline handling: python/trunk/Modules/bz2module.c. It's in C, however, and I don't think there's actually anything in the library doing UL in pure Python. |
|||
| msg91971 - (view) | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2009年08月26日 13:05 | |
The python version of the io module does (_pyio.py). I've set the stage to 'test needed' because the test needs to be turned into a unit test. |
|||
| msg91976 - (view) | Author: Art Gillespie (agillesp) | Date: 2009年08月26日 17:08 | |
Hi Ryan, Thanks for the feedback. I've attached a new patch that fixes the read(nbytes) behavior--It will now always return the requested number of bytes regardless of newline replacement. There's now a unit test for this as well. I also added the newlines attribute per PEP 278 and a corresponding test. I'm not sure I understood David's comment that read_test needed to be turned into a unit test: it's called by several of the unit tests (test_read_stored, test_read_deflated, et. al.) in that class. |
|||
| msg91977 - (view) | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2009年08月26日 17:32 | |
My apologies, I misread the patch. |
|||
| msg91981 - (view) | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2009年08月26日 18:55 | |
Thanks for working on this.
Comments on patch:
(1) I think you should retain the full "ugly check" comment explaining
how the \r\n spanning a buffer is handled. I think that's a helpful
explanation for a non-obvious piece of code.
(2) Your code for setting the newlines attributes and the tests for it
look wrong to me. The attribute is supposed to list all separators
seen. I think your code is going to set it to ('\r\n', '\r') when the
separators are just '\r\n', and your test doesn't check to make sure
newlines gets set to _only_ what was seen. (Nor does it check for the
case of actual mixed line endings...a test for which would require some
additional test hackery since the existing test code didn't try to test
mixed line end files either.)
(3) Sorry for being dense, but I don't understand your code to adjust
the number of bytes in the read method or the test that tests it. I
haven't tried to play with it, but it looks odd to me that you'd be
adding '2', since I would think that the read could cover more than one
line...or am I misunderstanding how 'read(n)' works (which is entirely
possible)?
|
|||
| msg91987 - (view) | Author: Art Gillespie (agillesp) | Date: 2009年08月26日 22:43 | |
Hi David, Thanks for the review. Patch attached. (1) I've moved that comment to the check's new location. (2) Fixed the bug and added tests for only one separator. Also added test data and tests for mixed eol files. (3) I changed this so that when the file is opened with universal newline support, read(size) makes multiple calls to _do_read until size bytes are read or EOF is reached. |
|||
| msg91990 - (view) | Author: Art Gillespie (agillesp) | Date: 2009年08月26日 23:33 | |
Just found another bug in the code that sets the newlines attribute. Please disregard issue6759_3.diff |
|||
| msg91994 - (view) | Author: Art Gillespie (agillesp) | Date: 2009年08月27日 03:57 | |
Latest patch attached. * Fixed the code that populates the newlines attribute. I think I've covered all the cases... * Found another deviation from file object behavior in this module: Calling read with a negative size parameter does not always return the remainder of the file as described in http://docs.python.org/library/stdtypes.html#file.read I went ahead and fixed this--please let me know if I should open a separate issue and submit a separate patch. * Added more tests for mixed eol files, calling read with a negative size parameter, reading a file with only crlfs |
|||
| msg111491 - (view) | Author: Mark Lawrence (BreamoreBoy) * | Date: 2010年07月24日 15:59 | |
@Art: @David: A lot of changes have been made to test_zipfile since the last patch was produced. I'm unsure as to whether the patch could be applied to 2.7 as is, or it would be simpler to rework the patch for 2.7 and the py3k branches. Could one of you take a look please, thanks. |
|||
| msg111504 - (view) | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2010年07月24日 20:26 | |
Well, you could try applying the patch and see if it applies cleanly and passes the tests. But it would be helpful to have it reworked for py3k, since we now commit first to py3k and then backport to 2.7. I have not reviewed the updated patch. |
|||
| msg159594 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2012年04月29日 09:41 | |
Support lines in zipfile looks contradictory and buggy. This complicates the code and makes the behavior of zipfile.ZipExtFile incompatible with the interface of other file-like objects. For example, the behavior of the read, read1 and peek differ from one described in io module. If we are working with binary data, conversion of newlines is meaningless (and how about newlines in comments?). If we are working with text, the bytes must be decoded to str. This will help io.TextIOWrapper. I suggest two alternatives: 1. Deprecate universal newline support in zipfile. zipfile.ZipExtFile should always work with non-modified bytes, and who need the text, let wraps zipfile.ZipExtFile with io.TextIOWrapper. 2. Automatically wrap a zipfile.ZipExtFile with io.TextIOWrapper if universal newlines mode is enabled. In this case, the data type will change from bytes to str. Add modes "t" and "b" to explicitly specify the data type. Add an encoding parameter (and other parameters if needed). |
|||
| msg159608 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2012年04月29日 13:54 | |
> Deprecate universal newline support in zipfile. zipfile.ZipExtFile > should always work with non-modified bytes, and who need the text, let > wraps zipfile.ZipExtFile with io.TextIOWrapper. This would be fine with me. |
|||
| msg159613 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2012年04月29日 14:54 | |
> This would be fine with me. It may be worth to deprecate PEP 278? Oh, only ten years have passed since 2.3, but it seems it was so long ago. |
|||
| msg159614 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2012年04月29日 15:05 | |
> It may be worth to deprecate PEP 278? Oh, only ten years have passed > since 2.3, but it seems it was so long ago. Well, I don't know if PEPs ever get deprecated. In this case, PEP 3116 is probably the superseder. |
|||
| msg159755 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2012年05月01日 12:41 | |
I know how to remove universal newline support, I know how after this correct these functions (with issue14371 they partially corrected), but I don't know how to deprecate universal newline support. What should be done? Can you initiate a discussion in Python-Ideas or Python-Dev? Now only zipfile trying to implement this PEP (and failed), io ignores 'U' mode, gzip checks and raises. With regard to this issue, I note that there are tests that check that read() returns unmodified data (UniversalNewlineTests.read_test in Lib/test/test_zipfile.py). It looks weird, but apparently, this behavior is expected and deliberate. |
|||
| msg225472 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2014年08月17日 21:09 | |
Mode 'U' is deprecated now (issue15204). |
|||
| msg289171 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2017年03月07日 16:36 | |
Support of the 'U' mode is removed in 3.6 (issue27029). |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022年04月11日 14:56:52 | admin | set | github: 51008 |
| 2017年03月07日 16:36:02 | serhiy.storchaka | set | status: open -> closed resolution: out of date messages: + msg289171 stage: needs patch -> resolved |
| 2014年08月17日 21:09:09 | serhiy.storchaka | set | messages: + msg225472 |
| 2014年02月03日 19:12:55 | BreamoreBoy | set | nosy:
- BreamoreBoy |
| 2012年05月01日 12:41:49 | serhiy.storchaka | set | messages: + msg159755 |
| 2012年04月29日 15:05:53 | pitrou | set | messages: + msg159614 |
| 2012年04月29日 14:54:42 | serhiy.storchaka | set | messages: + msg159613 |
| 2012年04月29日 13:54:45 | pitrou | set | versions:
+ Python 3.3, - Python 2.6, Python 3.1, Python 2.7, Python 3.2 nosy: + pitrou messages: + msg159608 type: behavior -> enhancement stage: patch review -> needs patch |
| 2012年04月29日 09:41:03 | serhiy.storchaka | set | nosy:
+ serhiy.storchaka messages: + msg159594 |
| 2010年07月24日 20:26:47 | r.david.murray | set | messages: + msg111504 |
| 2010年07月24日 15:59:58 | BreamoreBoy | set | nosy:
+ BreamoreBoy messages: + msg111491 |
| 2009年08月27日 03:57:28 | agillesp | set | files:
+ issue6759_4.diff messages: + msg91994 |
| 2009年08月26日 23:33:23 | agillesp | set | messages: + msg91990 |
| 2009年08月26日 22:44:00 | agillesp | set | files:
+ issue6759_3.diff messages: + msg91987 |
| 2009年08月26日 18:55:43 | r.david.murray | set | messages: + msg91981 |
| 2009年08月26日 17:32:03 | r.david.murray | set | messages:
+ msg91977 stage: test needed -> patch review |
| 2009年08月26日 17:08:15 | agillesp | set | files:
+ issue6759_2.diff messages: + msg91976 |
| 2009年08月26日 13:05:35 | r.david.murray | set | priority: normal versions: + Python 3.1, Python 2.7, Python 3.2 nosy: + r.david.murray messages: + msg91971 stage: test needed |
| 2009年08月26日 04:44:44 | ryles | set | messages: + msg91962 |
| 2009年08月22日 22:37:41 | agillesp | set | files:
+ issue6759_1.diff nosy: + agillesp messages: + msg91880 keywords: + patch type: behavior |
| 2009年08月22日 06:22:01 | ryles | create | |