This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Created on 2012-02-17 17:36 by Alex Quinn, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (10)

msg153581 - Author: Alex Quinn (Alex Quinn) - Date: 2012-02-17 17:36

When accessing this URL, both urllib2 (Py2) and urllib.request (Py3) raise an IncompleteRead error:

http://info.kingcounty.gov/health/ehs/foodsafety/inspections/XmlRest.aspx?Zip_Code=98199

Previous discussions about similar errors suggest that this may be due to a problem with the server and chunked data transfer (see links below). I don't fully understand what that means. However, the URL works fine with urllib (Py2), curl, wget, and all regular web browsers I've tried it with. Thus, I would have expected urllib2 (Py2) and urllib.request (Py3) to cope with it similarly.

Versions I've tested with:

- Fails with urllib2 + Python 2.5.4, 2.6.1, 2.7.2 (error messages vary)
- Fails with urllib.request + Python 3.1.2, 3.2.2
- Succeeds with urllib + Python 2.5.4, 2.6.1, 2.7.2
- Succeeds with wget 1.11.1
- Succeeds with curl 7.15.5

TEST CASES

# FAILS - Python 2.7, 2.6, 2.5
import urllib2
url = "http://info.kingcounty.gov/health/ehs/foodsafety/inspections/XmlRest.aspx?Zip_Code=98199"
xml_str = urllib2.urlopen(url).read()  # Raises httplib.IncompleteRead

# FAILS - Python 3.2, 3.1
import urllib.request
url = "http://info.kingcounty.gov/health/ehs/foodsafety/inspections/XmlRest.aspx?Zip_Code=98199"
xml_str = urllib.request.urlopen(url).read()  # Raises http.client.IncompleteRead

# SUCCEEDS - Python 2.7, 2.6, 2.5
import urllib
import xml.dom.minidom  # for parseString below
url = "http://info.kingcounty.gov/health/ehs/foodsafety/inspections/XmlRest.aspx?Zip_Code=98199"
xml_str = urllib.urlopen(url).read()
dom = xml.dom.minidom.parseString(xml_str)  # Verify XML is complete
print("urllib: %d bytes received and parsed successfully" % len(xml_str))

# SUCCEEDS - wget
wget -O- "http://info.kingcounty.gov/health/ehs/foodsafety/inspections/XmlRest.aspx?Zip_Code=98199" | wc

# SUCCEEDS - curl (prints an error, but returns the full data anyway)
curl "http://info.kingcounty.gov/health/ehs/foodsafety/inspections/XmlRest.aspx?Zip_Code=98199" | wc

RELATED DISCUSSIONS

http://www.gossamer-threads.com/lists/python/python/847985
http://bugs.python.org/issue11463 (closed)
http://bugs.python.org/issue6785 (closed)
http://bugs.python.org/issue6312 (closed)

msg171263 - Author: Antoine Pitrou (pitrou) * (Python committer) - Date: 2012-09-25 13:02

The example URL doesn't seem to work anymore. Do you have another example to test with?

msg191087 - Author: raylu (raylu) - Date: 2013-06-13 19:20

The URL works for me. While wget does download it successfully, I get the following output:

$ wget http://info.kingcounty.gov/health/ehs/foodsafety/inspections/XmlRest.aspx\?Zip_Code\=98199
--2013-06-13 12:15:21--  http://info.kingcounty.gov/health/ehs/foodsafety/inspections/XmlRest.aspx?Zip_Code=98199
Resolving info.kingcounty.gov (info.kingcounty.gov)... 146.129.240.75
Connecting to info.kingcounty.gov (info.kingcounty.gov)|146.129.240.75|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/xml]
Saving to: ‘XmlRest.aspx?Zip_Code=98199’

    [ <=> ] 515,315     448KB/s   in 1.1s

2013-06-13 12:15:23 (448 KB/s) - Read error at byte 515315 (Success). Retrying.

--2013-06-13 12:15:24--  (try: 2)  http://info.kingcounty.gov/health/ehs/foodsafety/inspections/XmlRest.aspx?Zip_Code=98199
Connecting to info.kingcounty.gov (info.kingcounty.gov)|146.129.240.75|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/xml]
Saving to: ‘XmlRest.aspx?Zip_Code=98199’

    [ <=> ] 0           --.-K/s   in 0s

Cannot write to ‘XmlRest.aspx?Zip_Code=98199’ (Success).

Similarly, curl gives:

$ curl http://info.kingcounty.gov/health/ehs/foodsafety/inspections/XmlRest.aspx\?Zip_Code\=98199 > /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  503k    0  503k    0     0   222k      0 --:--:--  0:00:02 --:--:--  229k
curl: (18) transfer closed with outstanding read data remaining

$ wget --version
GNU Wget 1.14 built on linux-gnu.

$ curl --version
curl 7.30.0 (x86_64-pc-linux-gnu) libcurl/7.30.0 OpenSSL/1.0.1e zlib/1.2.8 libidn/1.25 libssh2/1.4.2 librtmp/2.3

msg208169 - Author: Laurento Frittella (laurento.frittella) - Date: 2014-01-15 15:24

I had the same problem using urllib2, and the following trick worked for me:

import httplib
httplib.HTTPConnection._http_vsn = 10
httplib.HTTPConnection._http_vsn_str = 'HTTP/1.0'

Source: http://stackoverflow.com/a/20645845
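
For anyone on Python 3, a rough equivalent (not part of the original message) would patch http.client instead. _http_vsn and _http_vsn_str are private internals, so treat this as an unsupported sketch: forcing HTTP/1.0 simply keeps the server from using chunked transfer encoding.

# Hedged Python 3 sketch of the same workaround; these attributes are
# http.client internals and may change between Python versions.
import http.client
import urllib.request

http.client.HTTPConnection._http_vsn = 10
http.client.HTTPConnection._http_vsn_str = 'HTTP/1.0'

url = "http://info.kingcounty.gov/health/ehs/foodsafety/inspections/XmlRest.aspx?Zip_Code=98199"
xml_str = urllib.request.urlopen(url).read()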

msg210813 - Author: Martin Panter (martin.panter) * (Python committer) - Date: 2014-02-10 09:07

The server in question is sending a chunked response, but it seems to close the connection when it is done without sending a zero-length chunk (which, as I understand it, the HTTP protocol requires). My Firefox shows the XML without any indication of error. But then, if I manually truncate a chunked response, Firefox doesn't indicate an error either, which is something I would probably want to know about.
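
For illustration (not part of the original message), here is a minimal local sketch of that diagnosis: a toy server sends one chunk and then closes the connection without the terminating zero-length chunk, and urllib.request raises IncompleteRead while reading the body. The server, port, and payload are made up for the demo.

# Toy server that omits the final "0\r\n\r\n" chunk, mimicking the behaviour
# described above; reading the response then raises IncompleteRead.
import http.client
import socket
import threading
import urllib.request

def misbehaving_server(listener):
    conn, _ = listener.accept()
    conn.recv(1024)  # read and discard the request
    body = b"hello world"
    conn.sendall(b"HTTP/1.1 200 OK\r\n"
                 b"Transfer-Encoding: chunked\r\n"
                 b"\r\n" +
                 b"%x\r\n" % len(body) + body + b"\r\n")
    conn.close()  # a conforming server would first send b"0\r\n\r\n"

listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
threading.Thread(target=misbehaving_server, args=(listener,), daemon=True).start()

try:
    urllib.request.urlopen("http://127.0.0.1:%d/" % listener.getsockname()[1]).read()
except http.client.IncompleteRead as exc:
    print("read() raised:", repr(exc))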

msg231406 - Author: Martin Panter (martin.panter) * (Python committer) - Date: 2014-11-20 02:37

I suggest this is the same situation as Issue 6785, and is not a bug in Python. However, it might be reasonable to allow forcing an HTTP client connection to version 1.0, which could be used as a workaround.

msg231434 - Author: Laurento Frittella (laurento.frittella) - Date: 2014-11-20 14:00

Even though the forced HTTP/1.0 workaround works, it can lead to weird issues, especially when used in something larger than a small script, like the one I tried to describe in this issue report [1] for the "requests" Python library.

[1] https://github.com/kennethreitz/requests/issues/2341

msg255396 - Author: Martin Panter (martin.panter) * (Python committer) - Date: 2015-11-26 01:12

Closing this as a bug in the web server rather than in Python. If someone wants to add a way to force an HTTP 1.0 response, or a way to get all valid data before raising the exception, I suggest opening a new report.
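
As a side note (not part of the original message), the data received before the failure is already exposed on the exception itself: http.client.IncompleteRead stores it in its partial attribute, so a caller can salvage the truncated body along these lines.

# Hedged sketch: keep whatever arrived before the connection was cut,
# rather than forcing HTTP/1.0.
import http.client
import urllib.request

url = "http://info.kingcounty.gov/health/ehs/foodsafety/inspections/XmlRest.aspx?Zip_Code=98199"
try:
    body = urllib.request.urlopen(url).read()
except http.client.IncompleteRead as exc:
    body = exc.partial  # bytes read before the connection was closed
print("got %d bytes" % len(body))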

msg287644 - Author: CJ Kucera (apocalyptech) * - Date: 2017-02-12 18:42

I've just encountered this problem on Python 3.6, with a different URL. The difference is that it isn't triggered by EVERY page load, though I'd say it happens with at least half of them:

import urllib.request
html = urllib.request.urlopen('http://www.basicinstructions.net/').read()
print('Succeeded!')

I realize that the root problem here may be an HTTP server doing something improper, but I've got no way of fixing someone else's web server. It would be really nice if there were a reasonable way of handling this in Python itself. As mentioned in the original report, other methods of retrieving this URL work without fail (curl, wget, etc.). As it is, the only way for me to be sure of retrieving the entire page contents is to loop until I don't get an IncompleteRead (see the sketch below), which is hardly ideal.
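
That retry loop might look roughly like this (a hedged sketch; fetch_with_retries is a hypothetical helper written for this example, not a stdlib function):

# Re-request the page until a read completes without IncompleteRead,
# giving up after a few attempts, since the failure is intermittent here.
import http.client
import urllib.request

def fetch_with_retries(url, attempts=5):
    last_exc = None
    for _ in range(attempts):
        try:
            return urllib.request.urlopen(url).read()
        except http.client.IncompleteRead as exc:
            last_exc = exc
    raise last_exc

html = fetch_with_retries('http://www.basicinstructions.net/')
print('Succeeded!')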

msg287653 - Author: CJ Kucera (apocalyptech) * - Date: 2017-02-12 21:40

Ah, well, actually I'll rescind that a bit. Other pages about this bug around the internet had claimed that the 'requests' module uses urllib on the backend and is therefore subject to this bug as well, but after experimenting myself, it seems that if that IS the case, they're working around it somehow, because using requests makes this succeed 100% of the time. I probably should have tried that first! So anyway, there's a reasonable workaround, at least. Sorry for the bugspam!
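
For completeness (not part of the original message), the requests-based approach mentioned above would look roughly like this, assuming the third-party requests package is installed:

# Hedged sketch of the requests-based alternative (pip install requests).
import requests

html = requests.get('http://www.basicinstructions.net/').text
print('Succeeded!')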

History

| Date | User | Action | Args |
|---|---|---|---|
| 2022-04-11 14:57:26 | admin | set | github: 58252 |
| 2017-02-12 21:40:30 | apocalyptech | set | messages: + msg287653 |
| 2017-02-12 18:42:21 | apocalyptech | set | nosy: + apocalyptech; messages: + msg287644; versions: + Python 3.6 |
| 2015-11-26 01:12:52 | martin.panter | set | status: open -> closed; resolution: third party; messages: + msg255396 |
| 2015-02-13 01:25:29 | demian.brecht | set | nosy: - demian.brecht |
| 2014-11-20 14:00:28 | laurento.frittella | set | messages: + msg231434 |
| 2014-11-20 02:37:37 | martin.panter | set | messages: + msg231406 |
| 2014-07-24 00:32:11 | demian.brecht | set | nosy: + demian.brecht |
| 2014-02-10 09:07:16 | martin.panter | set | nosy: + martin.panter; messages: + msg210813 |
| 2014-01-15 15:28:34 | serhiy.storchaka | set | nosy: + serhiy.storchaka; type: behavior; versions: + Python 3.3, Python 3.4, - Python 2.6, Python 3.1, Python 3.2 |
| 2014-01-15 15:24:43 | laurento.frittella | set | nosy: + laurento.frittella; messages: + msg208169 |
| 2013-09-29 13:57:43 | msornay | set | nosy: + msornay |
| 2013-06-13 19:20:43 | raylu | set | nosy: + raylu; messages: + msg191087 |
| 2012-09-25 13:02:00 | pitrou | set | nosy: + pitrou; messages: + msg171263 |
| 2012-02-18 01:27:56 | pitrou | set | nosy: + orsenthil |
| 2012-02-17 17:36:16 | Alex Quinn | create | |