This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: httplib fails with HEAD requests to pages with "transfer-encoding: chunked"
Type: behavior
Stage: resolved
Components: Library (Lib)
Versions: Python 3.1, Python 3.2, Python 2.7, Python 2.6
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: orsenthil
Nosy List: Arfrever, chkneo, djc, ezio.melotti, mykhal, orsenthil, rcoup
Priority:
Keywords: patch

Created on 2009-06-19 13:53 by ezio.melotti, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name  Uploaded                  Description
6312.diff  chkneo, 2009-06-29 17:03  patch for Lib/http/client.py
Messages (11)
msg89521 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2009-06-19 13:53
Try this code (youtube.com uses "transfer-encoding: chunked"):
import httplib
url = 'www.youtube.com'
conn = httplib.HTTPConnection(url)
conn.request('HEAD', '/') # send a HEAD request
res = conn.getresponse()
print res.getheader('transfer-encoding')
So far it works fine, but when you try:
res.read()
it just hangs there, where "there" is:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Programs\Python26\lib\httplib.py", line 517, in read
    return self._read_chunked(amt)
  File "C:\Programs\Python26\lib\httplib.py", line 553, in _read_chunked
    line = self.fp.readline()
  File "C:\Programs\Python26\lib\socket.py", line 395, in readline
    data = recv(1)
KeyboardInterrupt
If we replace youtube.com with a site that
doesn't use "transfer-encoding: chunked" (e.g. url = 'dpaste.com'),
res.read() returns an empty string.
When a HEAD request is sent, the content of the page is not returned,
so there should be no point in calling .read(), but try this:
import urllib2
class HeadRequest(urllib2.Request):
    def get_method(self):
        return 'HEAD'
url = 'http://www.youtube.com/watch?v=tCVqx2b-c7U'
# Note: I had this problem with this URL, the video
# is not available in my country (Finland) and it
# may work fine for other countries
req = HeadRequest(url)
page = urllib2.urlopen(req)
This is what happens here with Python 2.5:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/urllib2.py", line 124, in urlopen
    return _opener.open(url, data)
  File "/usr/lib/python2.5/urllib2.py", line 387, in open
    response = meth(req, response)
  File "/usr/lib/python2.5/urllib2.py", line 498, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.5/urllib2.py", line 419, in error
    result = self._call_chain(*args)
  File "/usr/lib/python2.5/urllib2.py", line 360, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.5/urllib2.py", line 579, in http_error_302
    fp.read()
  File "/usr/lib/python2.5/socket.py", line 291, in read
    data = self._sock.recv(recv_size)
  File "/usr/lib/python2.5/httplib.py", line 509, in read
    return self._read_chunked(amt)
  File "/usr/lib/python2.5/httplib.py", line 548, in _read_chunked
    chunk_left = int(line, 16)
ValueError: invalid literal for int() with base 16: ''
With Python 2.6 the error is slightly different:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Programs\Python26\lib\urllib2.py", line 124, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Programs\Python26\lib\urllib2.py", line 389, in open
    response = meth(req, response)
  File "C:\Programs\Python26\lib\urllib2.py", line 502, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Programs\Python26\lib\urllib2.py", line 421, in error
    result = self._call_chain(*args)
  File "C:\Programs\Python26\lib\urllib2.py", line 361, in _call_chain
    result = func(*args)
  File "C:\Programs\Python26\lib\urllib2.py", line 594, in http_error_302
    fp.read()
  File "C:\Programs\Python26\lib\socket.py", line 327, in read
    data = self._sock.recv(rbufsize)
  File "C:\Programs\Python26\lib\httplib.py", line 517, in read
    return self._read_chunked(amt)
  File "C:\Programs\Python26\lib\httplib.py", line 563, in _read_chunked
    raise IncompleteRead(value)
httplib.IncompleteRead
With Py3.0 it is the same:
[...]
http.client.IncompleteRead: b''
In this case self.fp.readline() (and the data = recv(1) in socket.py)
returns, and the error happens a few lines later.
This seems to happen when there's a redirection in between (the video is
not available in my country, so the server sends back a 303 status code
and redirects me to the home page). The redirection is not handled by
httplib, so there might be something wrong in urllib2 too (why is it
trying to read the content if we sent a HEAD request and there is a
redirection in between?), but fixing httplib to return an empty string
or something similar could be enough to solve this problem too. If
there's actually a problem in urllib2, another issue should probably be created.
With the same code and the url of a working youtube video (no
redirections in between), "page = urllib2.urlopen(req)" works even if
there's the "transfer-encoding: chunked" but it fails later if we do
"page.read()": 
Traceback (most recent call last):
  File "C:\Programs\Python30\lib\http\client.py", line 520, in _read_chunked
    chunk_left = int(line, 16)
ValueError: invalid literal for int() with base 16: ''
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Programs\Python30\lib\http\client.py", line 479, in read
    return self._read_chunked(amt)
  File "C:\Programs\Python30\lib\http\client.py", line 525, in _read_chunked
    raise IncompleteRead(value)
http.client.IncompleteRead: b''
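The ValueError in the tracebacks above comes from parsing the chunk size: a chunked body starts with a hexadecimal size line, but a HEAD response carries no body, so the reader gets an empty line instead. A minimal Python 3 sketch of that single step follows; read_chunk_size is a hypothetical helper written for illustration and is not the httplib code.

```python
import io


def read_chunk_size(fp):
    """Parse one chunk-size line: hex digits, optionally followed by ';extension'.

    Hypothetical helper for illustration only; it models the parsing step,
    not the actual _read_chunked implementation.
    """
    line = fp.readline()
    size_field = line.split(b";", 1)[0].strip().decode("ascii")
    return int(size_field, 16)  # raises ValueError when size_field is ''


# A normal chunked body begins with its size in hex.
with_body = io.BytesIO(b"5\r\nhello\r\n0\r\n\r\n")
print(read_chunk_size(with_body))  # 5

# After a HEAD response there is no body, so readline() returns b''.
no_body = io.BytesIO(b"")
try:
    read_chunk_size(no_body)
except ValueError as exc:
    print("ValueError:", exc)
```

The second call fails for the same reason as int(line, 16) in the tracebacks: there is no size line to parse.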
msg89868 - Author: Chandru (chkneo) Date: 2009-06-29 17:03
A HEAD request won't return any data, so before calling _read_chunked we
have to check whether amt is None. If it is None, simply return b''.
I've attached a patch too, which was made against the py3k branch.
msg99796 - Author: Michal Božoň (mykhal) Date: 2010-02-22 17:52
I confirm.
In my case, the bug manifested when sending a HEAD request to a different server that uses chunked transfer encoding (http://obrazky.cz).
My workaround is to always call response.read(), except when method == 'HEAD' and resp.getheader('transfer-encoding') == 'chunked'.
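The workaround described in this message could be wrapped in a small helper along these lines. This is only a sketch: read_body is a hypothetical name, and the response object is assumed to offer getheader() and read() the way httplib/http.client responses do.

```python
def read_body(response, method):
    """Read the body, except for HEAD responses that announce chunked encoding.

    Hypothetical helper; `response` is assumed to provide getheader() and
    read() like an httplib / http.client response object.
    """
    encoding = (response.getheader('transfer-encoding') or '').strip().lower()
    if method.upper() == 'HEAD' and encoding == 'chunked':
        # No body follows a HEAD response, so skip read() entirely.
        return b''
    return response.read()
```

Skipping read() leaves the response unread, so closing it explicitly may be needed before the same connection is used again.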
msg104404 - Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-04-28 03:38
I can take this up. The response to a HEAD request does not contain any data, so when the data is None and the transfer encoding is chunked, we can return an empty value for the next step. There is no need to attempt to read the chunked amt. The patch is fine, and tests need to be added.
msg104443 - Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-04-28 17:48
Whenever a HEAD request is made, httplib now recognizes it in the read method and returns an empty string ('') as expected.
Fixed in r80583, release26-maint: r80584, py3k: r80587, and release31-maint: r80588.
msg106457 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2010-05-25 18:07
Thanks Senthil!
msg106520 - Author: Dirkjan Ochtman (djc) * (Python committer) Date: 2010-05-26 10:40
The fix in r80583 is bad. It fails to close() the response (which previously worked as expected), meaning that the connection can't be re-used.
(I ran into this because Gentoo has backported the 2.6-maint fixes to their 2.6.5 distribution.)
Shall I open a new issue, or re-open this one?
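The reuse problem described in this message can be illustrated without a network. The sketch below targets Python 3's http.client and uses FakeSocket, a hypothetical in-memory stand-in that replays a canned HEAD response; the host name is a placeholder that is never resolved.

```python
import io
import http.client


class FakeSocket:
    """Hypothetical in-memory socket: records sent bytes, replays a canned response."""

    def __init__(self, response_bytes):
        self.response_bytes = response_bytes
        self.sent = []

    def sendall(self, data):
        self.sent.append(data)

    def makefile(self, mode, *args, **kwargs):
        # Each response gets a fresh stream over the same canned bytes.
        return io.BytesIO(self.response_bytes)

    def close(self):
        pass


# Headers announce chunked encoding, but a HEAD response has no body.
HEAD_RESPONSE = (b"HTTP/1.1 200 OK\r\n"
                 b"Transfer-Encoding: chunked\r\n"
                 b"\r\n")

conn = http.client.HTTPConnection("example.invalid")  # placeholder, never connects
conn.sock = FakeSocket(HEAD_RESPONSE)

conn.request("HEAD", "/")
first = conn.getresponse()
first_body = first.read()      # returns b'' and closes the response

conn.request("HEAD", "/")      # reuse works because `first` is closed
second = conn.getresponse()
second_body = second.read()
conn.close()
```

If read() returned without closing the response, the second getresponse() would be expected to fail (in http.client, typically with ResponseNotReady), which matches the behaviour reported here.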
msg106521 - Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-05-26 11:12
I am just reopening this, as per djc's comment.
msg107076 - Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-06-04 16:46
Fixed in r81687, r81688, r81689 and r81690.
Yes, I see that before the original change was made, any chunked encoding went through _read_chunked, which closes the resp before returning. So now, for HEAD, the resp is closed, fixing the problem mentioned by djc.
msg107077 - Author: Dirkjan Ochtman (djc) * (Python committer) Date: 2010-06-04 17:06
Might be useful to have a test for this?
msg107080 - Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-06-04 17:33
I saw that the earlier test was closing it explicitly. I removed that and added a test which verifies that the resp object is closed. Thanks.
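A test of the kind described here might look roughly like the following. It is a sketch against Python 3's http.client, not the test that was committed; FakeSocket and HeadChunkedTest are hypothetical names.

```python
import io
import unittest
import http.client


class FakeSocket:
    """Hypothetical socket stand-in that serves canned bytes through makefile()."""

    def __init__(self, data):
        self.data = data

    def makefile(self, mode, *args, **kwargs):
        return io.BytesIO(self.data)


class HeadChunkedTest(unittest.TestCase):
    def test_read_returns_empty_and_closes(self):
        raw = (b"HTTP/1.1 200 OK\r\n"
               b"Transfer-Encoding: chunked\r\n"
               b"\r\n")
        resp = http.client.HTTPResponse(FakeSocket(raw), method="HEAD")
        resp.begin()
        # No explicit resp.close(): read() itself should leave it closed.
        self.assertEqual(resp.read(), b"")
        self.assertTrue(resp.isclosed())
```

The case can be run with python -m unittest, or loaded into a suite programmatically.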
History
Date                 User          Action  Args
2022-04-11 14:56:50  admin         set     github: 50561
2010-06-04 17:33:10  orsenthil     set     messages: + msg107080
2010-06-04 17:06:46  djc           set     messages: + msg107077
2010-06-04 16:46:07  orsenthil     set     status: open -> closed
                                           priority: release blocker ->
                                           resolution: accepted -> fixed
                                           messages: + msg107076
2010-06-04 14:42:02  djc           set     priority: normal -> release blocker
2010-05-26 13:58:59  Arfrever      set     nosy: + Arfrever
2010-05-26 11:12:47  orsenthil     set     status: closed -> open
                                           resolution: fixed -> accepted
                                           messages: + msg106521
2010-05-26 10:40:36  djc           set     nosy: + djc
                                           messages: + msg106520
2010-05-25 18:07:35  ezio.melotti  set     messages: + msg106457
                                           versions: + Python 3.1, Python 3.2, - Python 2.5, Python 3.0
2010-04-28 17:48:21  orsenthil     set     status: open -> closed
                                           resolution: accepted -> fixed
                                           messages: + msg104443
                                           stage: patch review -> resolved
2010-04-28 03:38:07  orsenthil     set     nosy: + orsenthil
                                           messages: + msg104404
                                           assignee: orsenthil
                                           resolution: accepted
2010-04-28 03:23:46  rcoup         set     nosy: + rcoup
2010-02-22 17:52:44  mykhal        set     nosy: + mykhal
                                           messages: + msg99796
2009-06-30 23:06:00  ezio.melotti  set     stage: patch review
2009-06-29 17:03:26  chkneo        set     files: + 6312.diff
                                           nosy: + chkneo
                                           messages: + msg89868
                                           keywords: + patch
2009-06-19 13:53:28  ezio.melotti  create
