Issue 8260: When I use codecs.open(...) and f.readline() follow up by f.read() return bad result


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/52507

classification

Title:	When I use codecs.open(...) and f.readline() follow up by f.read() return bad result
Type:	behavior	Stage:	resolved
Components:	Library (Lib)	Versions:	Python 3.3, Python 3.4, Python 2.7

process

Dependencies:	Superseder:
Status:	closed	Resolution:	fixed
Assigned To:	serhiy.storchaka	Nosy List:	ajaksu2, amaury.forgeotdarc, eric.araujo, harobed, lemburg, ncoghlan, python-dev, r.david.murray, serhiy.storchaka, vstinner
Priority:	normal	Keywords:	patch

Created on 2010年03月29日 15:09 by harobed, last changed 2022年04月11日 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description
codecs_read.patch	amaury.forgeotdarc, 2010年03月31日 10:00
codecs_read-2.patch	amaury.forgeotdarc, 2010年03月31日 11:11
codecs_read-3.patch	serhiy.storchaka, 2014年01月10日 19:40

Messages (15)
msg101892	Author: harobed (harobed)	Date: 2010年03月29日 15:09
This is an example; the last assert fails:

    f = open('data.txt', 'w')
    f.write("""line 1
    line 2
    line 3
    line 4
    line 5
    line 6
    line 7
    line 8
    line 9
    line 10
    line 11
    """)
    f.close()

    f = open('data.txt', 'r')
    assert f.readline() == 'line 1\n'
    assert f.read() == """line 2
    line 3
    line 4
    line 5
    line 6
    line 7
    line 8
    line 9
    line 10
    line 11
    """
    f.close()

    import codecs

    f = codecs.open('data.txt', 'r', 'utf8')
    assert f.read() == """line 1
    line 2
    line 3
    line 4
    line 5
    line 6
    line 7
    line 8
    line 9
    line 10
    line 11
    """
    f.close()

    f = codecs.open('data.txt', 'r', 'utf8')
    assert f.readline() == 'line 1\n'
    # this assert fails
    assert f.read() == """line 2
    line 3
    line 4
    line 5
    line 6
    line 7
    line 8
    line 9
    line 10
    line 11
    """
    f.close()

Regards,
Stephane
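A self-contained variant of the report, offered as a sketch: instead of writing data.txt, it wraps an in-memory io.BytesIO in the UTF-8 reader class returned by codecs.getreader(), which is the same StreamReader class that codecs.open() uses for reading (assumption: this exercises the same readline()/read() path as the file-based script). The helper name readline_then_read and the DATA constant are illustrative only. On a Python that includes the fix from this issue, read() returns everything after the first line.

```python
import codecs
import io

# The same eleven lines as in the report: "line 1\n" ... "line 11\n".
DATA = "".join("line %d\n" % i for i in range(1, 12))


def readline_then_read(text, encoding="utf-8"):
    """Hypothetical helper: one readline(), then read() with no arguments."""
    raw = io.BytesIO(text.encode(encoding))
    reader = codecs.getreader(encoding)(raw)  # StreamReader over the byte stream
    first = reader.readline()
    rest = reader.read()
    return first, rest


first, rest = readline_then_read(DATA)
# With the fix, rest is the whole remainder, not just the buffered part.
print(repr(first), rest == DATA[len(first):])
```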
msg101980	Author: Daniel Diniz (ajaksu2) * (Python triager)	Date: 2010年03月31日 06:12
Hi Stephane,

I think you're seeing different buffering behavior, which I suspect is correct according to the docs. codecs.open should default to line buffering [1], while open uses the system default [2]. The read() where the assert fails is returning the remaining buffer from the readline (which read 72 chars). Asserting e.g. "f.read(1024) == ..." will give you the expected result.

[1] http://docs.python.org/library/codecs.html#codecs.open
[2] http://docs.python.org/library/functions.html#open
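The "72 chars" above can be checked by watching the position of the underlying byte stream. This sketch assumes CPython's codecs.StreamReader.readline(), which in current sources reads ahead in chunks of 72 when no size is given (an implementation detail, not a documented guarantee). The helper name bytes_consumed_by_readline is hypothetical. After one readline() on the 79-byte sample, the stream has already advanced by 72 bytes: 7 characters are returned as "line 1\n", 65 stay in the reader's buffers, and 7 bytes remain unread in the stream.

```python
import codecs
import io

# The 79-byte sample from the report, as raw UTF-8 bytes.
DATA = "".join("line %d\n" % i for i in range(1, 12)).encode("utf-8")


def bytes_consumed_by_readline(data):
    """Hypothetical helper: return (first line, bytes pulled from the stream)."""
    raw = io.BytesIO(data)
    reader = codecs.getreader("utf-8")(raw)
    line = reader.readline()
    return line, raw.tell()  # tell() = how far the reader read ahead


line, consumed = bytes_consumed_by_readline(DATA)
print(len(DATA), repr(line), consumed)
```

So the 65 buffered characters are what the pre-fix read() handed back before stopping; the 7 unread bytes ("ine 11\n") were never requested from the file.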
msg101987	Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer)	Date: 2010年03月31日 10:00
Buffering applies when writing, not when reading a file. There is indeed a problem in codecs.py: after a readline(), read() will return the content of the internal buffer, and not more. The "size" parameter is a hint, and should not be used to decide whether the character buffer is enough to satisfy the read() request. Patch is attached, with test.
msg101988	Author: Marc-Andre Lemburg (lemburg) * (Python committer)	Date: 2010年03月31日 10:28
Amaury Forgeot d'Arc wrote:
>
> Amaury Forgeot d'Arc <amauryfa@gmail.com> added the comment:
>
> Buffering applies when writing, not when reading a file.
>
> There is indeed a problem in codecs.py: after a readline(), read() will return the content of the internal buffer, and not more.
>
> The "size" parameter is a hint, and should not be used to decide whether the character buffer is enough to satisfy the read() request.
> Patch is attached, with test.

Agreed. The patch looks good except the if-line should read:

    if chars >= 0 and len(self.charbuffer) >= chars:
        ...

Thanks,
--
Marc-Andre Lemburg
eGenix.com
________________________________________________________________________
::: Try our new mxODBC.Connect Python Database Interface for free ! ::::
eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
http://www.egenix.com/company/contact/
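To make the point about the stop condition concrete, here is a paraphrase of the read loop's "can the request be satisfied from the character buffer?" test. Both function names are hypothetical; the old variant is reconstructed from the description in msg101987 rather than copied from codecs.py, and the new variant follows the if-line suggested in msg101988. The size parameter is kept in can_stop_new only so the two signatures match.

```python
def can_stop_old(charbuffer, size=-1, chars=-1):
    """Paraphrase of the pre-fix check (hypothetical helper)."""
    if chars < 0:
        if size < 0:
            # No limits given: any buffered text ends the loop, so read()
            # returns only what an earlier readline() left behind.
            return bool(charbuffer)
        # The size hint (a byte count) is compared with a character count.
        return len(charbuffer) >= size
    return len(charbuffer) >= chars


def can_stop_new(charbuffer, size=-1, chars=-1):
    """Suggested check: only an explicit character count may be satisfied
    from the buffer; otherwise keep reading from the stream."""
    return chars >= 0 and len(charbuffer) >= chars


leftover = "line 2\nline 3\n"  # text buffered by an earlier readline()
print(can_stop_old(leftover), can_stop_new(leftover))
```

With the old check, read() with no arguments stops as soon as the buffer is non-empty; with the new one it keeps reading until the stream is exhausted.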
msg101990	Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer)	Date: 2010年03月31日 11:11
Updated patch. [I also tried to avoid reading the underlying file if len(self.bytebuffer) >= size, but it does not work with multibyte chars when size=1.]
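The bracketed remark is about multibyte encodings: holding at least size bytes does not guarantee even one decoded character. A small illustration with the standard incremental UTF-8 decoder from codecs.getincrementaldecoder(), which buffers an incomplete byte sequence instead of raising:

```python
import codecs

decoder = codecs.getincrementaldecoder("utf-8")()

# "é" is two bytes in UTF-8 (b"\xc3\xa9"). One byte alone decodes to
# nothing; the decoder keeps it until the rest of the sequence arrives.
first = decoder.decode(b"\xc3")
second = decoder.decode(b"\xa9")
print(repr(first), repr(second))
```

So with size=1, a one-byte bytebuffer may hold only half a character, and the reader still has to go back to the stream.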
msg122823	Author: Éric Araujo (eric.araujo) * (Python committer)	Date: 2010年11月29日 16:17
I applied the diff to test_codecs in py3k, removed the u prefixes and ran: failure. I applied the fix and the test passed.
msg138265	Author: harobed (harobed)	Date: 2011年06月13日 17:55
Bump: I think this patch hasn't been applied in Python 3.3a0.
msg138273	Author: R. David Murray (r.david.murray) * (Python committer)	Date: 2011年06月13日 19:43
According to this ticket it hasn't been applied anywhere yet (a message will be posted here when it is).
msg139465	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2011年06月30日 08:08
See also #12446.
msg177119	Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)	Date: 2012年12月07日 19:52
I think the patch is wrong, or at least not optimal, for the case when chars is -1 but size is not. If we want to read all data in any case, then we should call self.stream.read() without an argument if chars < 0 or size < 0. If we want to read no more than size bytes, then the whole loop should be rewritten. Perhaps I am wrong.
msg177123	Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)	Date: 2012年12月07日 20:04
As shown in issue12446, issue14475 and issue16636, there are different ways to reproduce this bug (read(size, chars) + readlines(), readline() + readlines()). All these cases should be tested.
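The two call sequences listed here can be checked with a small regression-style sketch. It is not the test code from the committed patch; the make_reader helper and DATA constant are illustrative, and the reader wraps an in-memory io.BytesIO rather than a file. With the fix, readlines() after a partial read returns all of the remaining text.

```python
import codecs
import io

DATA = "".join("line %d\n" % i for i in range(1, 12))


def make_reader(text):
    """Hypothetical helper: UTF-8 StreamReader over an in-memory byte stream."""
    return codecs.getreader("utf-8")(io.BytesIO(text.encode("utf-8")))


# Sequence 1: readline() followed by readlines().
r1 = make_reader(DATA)
head = r1.readline()
tail_lines = r1.readlines()

# Sequence 2: read(size, chars) followed by readlines().
# Here: read one character, passing a 10-byte size hint.
r2 = make_reader(DATA)
one_char = r2.read(10, 1)
rest_lines = r2.readlines()

# Nothing should be lost in either sequence.
print(head + "".join(tail_lines) == DATA)
print(one_char + "".join(rest_lines) == DATA)
```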
msg207875	Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)	Date: 2014年01月10日 19:40
Here is a revised patch.

* Behavior is changed less. read() is less greedy and uses characters from the buffer when read() is called with only one argument (size). It is now a little closer to the io streams' read() than with the previous patch.
* Added tests for the cases of issue12446 and issue16636.
* Fixed read() for the TransformCodecTest.test_read test added in 3.4. Actually, the uu_codec and zlib_codec are broken.
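The first bullet can be observed directly: after readline(), a read(size) call with only the size argument is served from the reader's buffer and returns exactly size characters, without pulling more bytes from the stream. A sketch over an in-memory io.BytesIO (an assumption standing in for a real file); raw.tell() is used only to show that the stream position does not move. The 72-byte read-ahead in the comment reflects current CPython sources.

```python
import codecs
import io

DATA = "".join("line %d\n" % i for i in range(1, 12))

raw = io.BytesIO(DATA.encode("utf-8"))
reader = codecs.getreader("utf-8")(raw)

reader.readline()              # returns "line 1\n"; reads ahead 72 bytes in CPython
pos_after_readline = raw.tell()
chunk = reader.read(3)         # size only: satisfied from the buffered text
pos_after_read = raw.tell()

print(repr(chunk), pos_after_readline == pos_after_read)
```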
msg209330	Author: Alyssa Coghlan (ncoghlan) * (Python committer)	Date: 2014年01月26日 15:40
Patch looks good to me, but if any specific features are needed to work around misbehaving codecs (as per issue 20132), a comment in the appropriate place referencing that issue would be helpful. And if that workaround means we can remove the special casing from the test_readlines test for the binary transform, cool :)
msg209335	Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)	Date: 2014年01月26日 16:23
Actually, this patch doesn't work around misbehaving codecs. It just makes specific tests (one readline, one read) pass. More complex tests that use multiple readline() or read() calls can still fail with these misbehaving codecs.
msg209337	Author: Roundup Robot (python-dev) (Python triager)	Date: 2014年01月26日 17:30
New changeset e24265eb2271 by Serhiy Storchaka in branch '2.7':
Issue #8260: The read(), readline() and readlines() methods of
http://hg.python.org/cpython/rev/e24265eb2271

New changeset 9c96c266896e by Serhiy Storchaka in branch '3.3':
Issue #8260: The read(), readline() and readlines() methods of
http://hg.python.org/cpython/rev/9c96c266896e

New changeset b72508a785de by Serhiy Storchaka in branch 'default':
Issue #8260: The read(), readline() and readlines() methods of
http://hg.python.org/cpython/rev/b72508a785de

History
Date	User	Action	Args
2022年04月11日 14:56:59	admin	set	github: 52507
2014年01月26日 17:34:32	serhiy.storchaka	set	status: open -> closed resolution: fixed stage: patch review -> resolved
2014年01月26日 17:30:36	python-dev	set	nosy: + python-dev messages: + msg209337
2014年01月26日 16:23:24	serhiy.storchaka	set	messages: + msg209335
2014年01月26日 15:40:05	ncoghlan	set	messages: + msg209330
2014年01月26日 10:23:08	serhiy.storchaka	set	assignee: serhiy.storchaka
2014年01月21日 20:34:58	serhiy.storchaka	set	nosy: + ncoghlan
2014年01月10日 19:40:38	serhiy.storchaka	set	files: + codecs_read-3.patch messages: + msg207875 versions: - Python 3.2
2012年12月07日 20:04:06	serhiy.storchaka	set	messages: + msg177123
2012年12月07日 20:03:53	serhiy.storchaka	link	issue16636 superseder
2012年12月07日 20:03:38	serhiy.storchaka	link	issue14475 superseder
2012年12月07日 20:03:22	serhiy.storchaka	link	issue12446 superseder
2012年12月07日 19:52:34	serhiy.storchaka	set	nosy: + serhiy.storchaka messages: + msg177119 versions: + Python 3.4
2011年06月30日 08:08:06	vstinner	set	messages: + msg139465
2011年06月13日 19:43:50	r.david.murray	set	nosy: + r.david.murray messages: + msg138273 versions: + Python 3.3, - Python 3.1
2011年06月13日 19:31:20	vstinner	set	nosy: + vstinner
2011年06月13日 17:55:55	harobed	set	messages: + msg138265
2010年11月29日 16:17:29	eric.araujo	set	nosy: + eric.araujo title: When I use codecs.open(...) and f.readline() follow up by f.read() return bad result -> When I use codecs.open(...) and f.readline() follow up by f.read() return bad result messages: + msg122823 versions: + Python 3.1, Python 2.7, Python 3.2, - Python 2.6
2010年03月31日 11:11:07	amaury.forgeotdarc	set	files: + codecs_read-2.patch messages: + msg101990
2010年03月31日 10:28:20	lemburg	set	nosy: + lemburg title: When I use codecs.open(...) and f.readline() follow up by f.read() return bad result -> When I use codecs.open(...) and f.readline() follow up by f.read() return bad result messages: + msg101988
2010年03月31日 10:00:19	amaury.forgeotdarc	set	files: + codecs_read.patch nosy: + amaury.forgeotdarc messages: + msg101987 keywords: + patch stage: test needed -> patch review
2010年03月31日 06:12:21	ajaksu2	set	priority: normal nosy: + ajaksu2 messages: + msg101980 stage: test needed
2010年03月29日 15:09:53	harobed	create
