Issue 14562: urllib2 maybe blocks too long with small chunks


This issue has been migrated to GitHub: https://github.com/python/cpython/issues/58767

classification

Title:	urllib2 maybe blocks too long with small chunks
Type:	behavior	Stage:	resolved
Components:	Library (Lib)	Versions:	Python 2.7

process

Dependencies:	Superseder:
Status:	closed	Resolution:	out of date
Assigned To:	orsenthil	Nosy List:	Anrs.Hu, Jim.Jewett, ZackerySpytz, hongqn, martin.panter, orsenthil
Priority:	normal	Keywords:

Created on 2012-04-12 12:23 by Anrs.Hu, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (7)
msg158124 - (view)	Author: Anrs Hu (Anrs.Hu)	Date: 2012-04-12 12:23
If an HTTP response's Transfer-Encoding is 'chunked', then urllib2.urlopen(URL).readline() will block until 8192 bytes are available, even though the first chunk is just a single line. Every chunk should be processed as soon as possible, so readline() should read a line and return immediately, rather than reading 8K of data into a buffer and then looking for a line in that buffer.
msg158125 - (view)	Author: Senthil Kumaran (orsenthil) * (Python committer)	Date: 2012-04-12 12:26
I am trying to test this to determine the fault.
msg158225 - (view)	Author: Jim Jewett (Jim.Jewett) * (Python triager)	Date: 2012-04-13 19:06
It would be helpful to have a testcase, so that it will stay fixed.
msg158246 - (view)	Author: Anrs Hu (Anrs.Hu)	Date: 2012-04-14 01:55
Okay, here is a test case using web.py. The server code is as follows:

    import time
    import web

    urls = ('/', 'index')  # route needed to actually serve the handler

    class index(object):
        def GET(self):
            yield 'hello\n'
            yield 'world\n'
            time.sleep(60)

    if __name__ == '__main__':
        web.application(urls, globals()).run()

The client is the Python interpreter:

    >>> resp = urllib.urlopen(URL)
    >>> resp.readline()  # will be 'hello'
    >>> resp.readline()  # will be 'world'
    >>> resp.readline()  # huh, it's blocked, and we agree with that.
    >>> # but urllib2 behaves differently.
    >>> urllib2.urlopen(URL).readline()  # huh, it's blocked even though 'hello' and 'world' have already been sent.

Because urllib2 uses an 8 KiB buffer on socket._fileobject within urllib2.py, it reads 8K of data into the buffer first.
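The buffering effect described here can be illustrated with a toy model. This is only a sketch written for Python 3, not the urllib2 or socket source: `SlowChunkedBody` and `readline_with_buffer` are made-up names, and the model assumes the underlying `read(amt)` keeps waiting until it has `amt` bytes or the stream ends. A buffer size of 1 stands in for the unbuffered setting.

```python
class SlowChunkedBody:
    """Hypothetical stand-in for a response body that arrives in small pieces."""

    def __init__(self, pieces):
        self._pieces = list(pieces)
        self._pending = b""
        self.pieces_consumed = 0  # number of arrivals we had to wait for

    def read(self, amt):
        # Assumed behavior: block until amt bytes are available or EOF.
        while len(self._pending) < amt and self._pieces:
            self._pending += self._pieces.pop(0)
            self.pieces_consumed += 1
        data, self._pending = self._pending[:amt], self._pending[amt:]
        return data


def readline_with_buffer(body, bufsize):
    """Request bufsize bytes at a time, then look for a newline in the buffer."""
    buf = b""
    while b"\n" not in buf:
        data = body.read(bufsize)
        if not data:
            break
        buf += data
    # The toy model drops whatever follows the first newline.
    head, sep, _ = buf.partition(b"\n")
    return head + sep


pieces = [b"hello\n", b"world\n", b"late\n"]

buffered = SlowChunkedBody(pieces)
first_buffered = readline_with_buffer(buffered, 8192)

unbuffered = SlowChunkedBody(pieces)
first_unbuffered = readline_with_buffer(unbuffered, 1)
```

Both calls return the same first line, but in this model the 8192-byte request has to wait for every piece before it returns, while the 1-byte request only waits for the first one.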
msg165927 - (view)	Author: Senthil Kumaran (orsenthil) * (Python committer)	Date: 2012-07-20 13:59
I had a discussion with Anrs on this, and it went along these lines - I had confused the buffering issue of urllib2 (encountered with streaming data) with chunked transfer encoding.

In this case the flow blocks at the socket level, waiting for 8192 bytes. This buffer size has been kept for buffered reading in normal read scenarios. However, for streaming data it may not be the best choice. It is explained best here - http://stackoverflow.com/questions/1598331/how-to-read-continous-http-streaming-data-in-python

The advice is to set the socket buffer size to 0:

    import socket
    socket._fileobject.default_bufsize = 0

Now, coming to chunked transfer encoding: it will behave as advertised, sending one chunk at a time, but still subject to the readline limit set by MAXLINE in httplib.py. For chunked transfer encoding to be recognized, the client has to get a "transfer-encoding: chunked" header from the server; when it receives that header, it follows the path of reading up to MAXLINE at a time and then returning.

For smaller chunks with a server that blocks (as you illustrated), we may still need to set default_bufsize to 0 to get quick responses instead of waiting to fill the buffer.

At this moment, I think the above could be documented in the urllib2 docs for the issue you raised. I am not sure whether any other approach would be suitable to handle this behavior.

Anrs (the original poster) also responded that the way he overcame this for very small chunks was to set the socket file buffer size to 0 locally:

    >> resp = opener.open(server, urllib.urlencode(data))
    >> resp = opener.open(server)
    >> resp.fp._rbufsize = 0
    >> for line in iter(resp.readline, ''):
    >>     yield line

I think this could be documented in some fashion (such as support for streaming without buffering, or transfers of small data sizes without buffering).
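For readers unfamiliar with how "one chunk at a time" looks on the wire, here is a minimal sketch of decoding chunked framing in Python 3. It is an illustration only, not the httplib implementation: `iter_chunks` is a hypothetical helper, it ignores trailers, and it assumes a well-formed stream.

```python
import io


def iter_chunks(stream):
    """Yield the payload of each chunk from a chunked-encoded byte stream."""
    while True:
        size_line = stream.readline()
        if not size_line:
            raise ValueError("stream ended before the terminating chunk")
        # The size is hexadecimal; anything after ';' is a chunk extension.
        size = int(size_line.split(b";", 1)[0].strip(), 16)
        if size == 0:
            break  # zero-length chunk marks the end of the body
        payload = stream.read(size)
        stream.read(2)  # consume the CRLF that closes the chunk
        yield payload


# The two lines from the test case, each sent as its own chunk.
wire = b"6\r\nhello\n\r\n6\r\nworld\n\r\n0\r\n\r\n"
chunks = list(iter_chunks(io.BytesIO(wire)))
```

Each chunk is self-delimiting, so a reader can hand a chunk to the caller as soon as it has arrived; whether it does so depends on the buffering layered on top, which is the subject of this issue.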
msg239099 - (view)	Author: Martin Panter (martin.panter) * (Python committer)	Date: 2015-03-24 08:52
I can reproduce this with Python 2, but not with current Python 3, nor with v3.3.3. Probably doesn’t affect 3.2 either, but I haven’t tried.
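One rough way to check the Python 3 behavior is a self-contained script with a local server that sends two one-line chunks and then stalls. This is a sketch under assumptions: `make_handler` and `time_first_lines` are hypothetical names, the stall is shortened from 60 seconds so the check finishes quickly, and the timing depends on the machine.

```python
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_handler(delay):
    class SlowChunkHandler(BaseHTTPRequestHandler):
        protocol_version = "HTTP/1.1"  # chunked encoding needs HTTP/1.1

        def do_GET(self):
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Transfer-Encoding", "chunked")
            self.send_header("Connection", "close")
            self.end_headers()
            for line in (b"hello\n", b"world\n"):
                # One chunk per line: hex length, CRLF, payload, CRLF.
                self.wfile.write(b"%x\r\n%s\r\n" % (len(line), line))
                self.wfile.flush()
            time.sleep(delay)  # stall, like time.sleep(60) in the test case
            self.wfile.write(b"0\r\n\r\n")  # terminating zero-length chunk

        def log_message(self, *args):
            pass  # keep the demo quiet

    return SlowChunkHandler


def time_first_lines(delay=1.0):
    """Return the first two lines and how long it took to read them."""
    server = HTTPServer(("127.0.0.1", 0), make_handler(delay))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    # Bypass any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        url = "http://127.0.0.1:%d/" % server.server_address[1]
        start = time.monotonic()
        with opener.open(url) as resp:
            lines = [resp.readline(), resp.readline()]
            elapsed = time.monotonic() - start
            resp.read()  # drain the rest so the handler finishes cleanly
        return lines, elapsed
    finally:
        server.shutdown()
        server.server_close()
```

If `readline()` waited for a full buffer, `elapsed` would be at least `delay`; a value well below `delay` suggests the lines are returned as the chunks arrive.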
msg370091 - (view)	Author: Zackery Spytz (ZackerySpytz) * (Python triager)	Date: 2020-05-27 15:10
Python 2 is EOL, so I think this issue should be closed.

History
Date	User	Action	Args
2022-04-11 14:57:29	admin	set	github: 58767
2020-05-28 02:27:09	benjamin.peterson	set	status: open -> closed resolution: out of date stage: needs patch -> resolved
2020-05-27 15:10:15	ZackerySpytz	set	nosy: + ZackerySpytz messages: + msg370091
2016-06-17 01:52:49	martin.panter	set	versions: - Python 3.2
2015-03-24 08:52:16	martin.panter	set	nosy: + martin.panter messages: + msg239099 versions: - Python 3.3
2012-07-20 13:59:41	orsenthil	set	messages: + msg165927 stage: needs patch
2012-04-16 06:10:57	hongqn	set	nosy: + hongqn
2012-04-14 01:55:38	Anrs.Hu	set	messages: + msg158246
2012-04-13 19:07:36	Jim.Jewett	set	title: urllib2 maybe blocks too long -> urllib2 maybe blocks too long with small chunks
2012-04-13 19:06:40	Jim.Jewett	set	nosy: + Jim.Jewett messages: + msg158225
2012-04-12 12:26:47	orsenthil	set	versions: + Python 3.2, Python 3.3 nosy: + orsenthil messages: + msg158125 assignee: orsenthil
2012-04-12 12:23:09	Anrs.Hu	create
