
This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: urllib.request fails to open some sites
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.2, Python 3.3, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: orsenthil Nosy List: angus, barry, daniel.ugra, davide.rizzo, eric.araujo, ezio.melotti, georg.brandl, nadeem.vawda, nneonneo, orsenthil, python-dev, santoso.wijaya, terry.reedy, vstinner
Priority: normal Keywords: patch

Created on 2011-07-17 00:07 by daniel.ugra, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name | Uploaded
issue12576.patch | orsenthil, 2011-07-25 00:04
Messages (17)
msg140512 - Author: Ugra Dániel (daniel.ugra) Date: 2011-07-17 00:07
Issue #12133 introduced a patch which seems to cause problems.
I'm using Python 3.2.1 on 64-bit Arch Linux (this version already incorporates the changes from #12133).
The following code:
import urllib.request
with urllib.request.urlopen(url) as page:
    pass
raises "ValueError: I/O operation on closed file." exception when url is "http://www.imdb.com/".
When I removed "h.close()" (added by the patch) from request.py everything worked as expected.
Other URLs work flawlessly with patched code ("http://www.google.com/" for example).
Maybe it is something to do with differences in HTTP responses or in server-side behavior.
For example IMDb's "Cneonction: close" (not a typo) feature.
But this could be totally unrelated, I am by no means an HTTP expert.
msg140513 - Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2011-07-17 01:29
On Sun, Jul 17, 2011 at 12:07:44AM +0000, Ugra Dániel wrote:
> For example IMDb's "Cneonction: close" (not a typo) feature. But
This is a mistake on the server side, and urllib relies on the
Connection: close header at some point in the process.
You could try a few other sites; you should see that the context
manager works.
msg140569 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-07-18 09:00
h.close() (HTTPConnection.close) in the finally block of AbstractHTTPHandler.do_open() indirectly calls r.close() (HTTPResponse.close). The problem is that the content of the response cannot be read once its close() method has been called.
The changelog of the fix (commit ad6bdfd7dd4b) is: "Issue #12133: AbstractHTTPHandler.do_open() of urllib.request closes the HTTP connection if its getresponse() method fails with a socket error. Patch written by Ezio Melotti."
The HTTP connection is not only closed in case of an error; it is always closed.
It's a bug because we cannot read the content of www.imdb.com, whereas it works without the commit. Test script:
---------------
import gc
import urllib.request
print("python.org")
with urllib.request.urlopen("http://www.python.org/") as page:
    content = page.read()
    print("content: %s..." % content[:40])
gc.collect()
print("imdb.com")
with urllib.request.urlopen("http://www.imdb.com/") as page:
    content = page.read()
    print("content: %s..." % content[:40])
gc.collect()
print("exit")
---------------
msg140570 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-07-18 09:01
The ValueError('I/O operation on closed file') error comes from HTTPResponse.__enter__(), which is implemented in IOBase:
    def __enter__(self):  # That's a forward reference
        self._checkClosed()
        return self
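The same check can be reproduced offline with any closed io stream. A minimal sketch, using io.BytesIO purely as a stand-in for an already-closed HTTPResponse (both inherit __enter__ from io.IOBase); the helper name enter_after_close is hypothetical:

```python
import io


def enter_after_close(stream):
    """Close a stream, then try to use it as a context manager."""
    stream.close()
    try:
        with stream:  # IOBase.__enter__ refuses streams that are already closed
            return "entered"
    except ValueError as exc:
        return str(exc)


# io.BytesIO stands in for an HTTPResponse that was closed before urlopen() returned.
print(enter_after_close(io.BytesIO(b"<html>...</html>")))
# I/O operation on closed file.
```

This is why the failure shows up at the `with` statement itself rather than at the first read() call.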
msg140571 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-07-18 10:13
imdb.com and python.org both use HTTP/1.1. The imdb.com server sends a "Transfer-encoding: chunked" header, whereas python.org doesn't. python.org sends a "Connection: close" header, whereas imdb.com doesn't.
The more relevant difference for this issue is the "Connection: close" header: HTTPResponse.will_close is True if the "Connection: close" header is present (see the _check_close() method), and False otherwise. HTTPConnection.getresponse() keeps a reference to the response if will_close is False; otherwise it calls its own close() method before remembering the response, so a later HTTPConnection.close() leaves the response readable.
The "Cneonction: close" header looks to be a quirk of Netscaler load balancers. It is sometimes "nnCoection" uses the same load balancer.
Buggy web servers exist, and Python should not raise an "I/O operation on closed file" error on such servers.
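A minimal offline sketch of how http.client decides will_close, assuming a hypothetical FakeSocket class (not part of http.client) that hands canned HTTP/1.1 replies to http.client.HTTPResponse. It reflects the behaviour of current CPython; details of the 3.2-era code may differ:

```python
import http.client
import io


class FakeSocket:
    """Hypothetical socket stand-in: HTTPResponse only calls makefile()."""

    def __init__(self, raw: bytes):
        self._raw = raw

    def makefile(self, mode, *args, **kwargs):
        return io.BytesIO(self._raw)


def will_close(header_lines):
    """Parse a canned HTTP/1.1 200 reply and report HTTPResponse.will_close."""
    raw = (b"HTTP/1.1 200 OK\r\n"
           + b"".join(line.encode("ascii") + b"\r\n" for line in header_lines)
           + b"Content-Length: 5\r\n\r\nhello")
    resp = http.client.HTTPResponse(FakeSocket(raw))
    resp.begin()  # parses the status line and headers only
    return resp.will_close


print(will_close(["Connection: close"]))       # True: header recognised
print(will_close(["Cneonction: close"]))       # False: garbled name is just an unknown header
print(will_close([]))                          # False: no Connection header (S3-style)
print(will_close(["Connection: Keep-Alive"]))  # False: explicit keep-alive
```

Every case except the first looks like a persistent connection, so the connection keeps a reference to the response and closing the connection closes the response too.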
msg140683 - Author: Robert Xiao (nneonneo) * Date: 2011-07-19 17:29
Seconded. #12133 inadvertently closes the response object if the server fails to indicate "Connection: close". In my case, Amazon S3 (s3.amazonaws.com) causes this problem:
(Python 3.2)
>>> conn = urllib.request.urlopen('http://s3.amazonaws.com/SurveyMonkeyFiles/VPAT_SurveyMonkey.pdf')
>>> len(conn.read())
27692
(Python 3.2.1)
>>> conn = urllib.request.urlopen('http://s3.amazonaws.com/SurveyMonkeyFiles/VPAT_SurveyMonkey.pdf')
>>> len(conn.read())
0
The problem is that S3 doesn't send back a "Connection: close" header, so when h.close() is called from request.py, the response object is also closed; consequently, conn.fp is None and so conn.read() returns an empty bytes object.
This is a clear regression due to the patch in #12133.
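A minimal sketch of the S3 symptom, assuming a hypothetical FakeSocket stand-in and a canned reply that has a Content-Length but no Connection header: once the response is closed, HTTPResponse.read() finds fp set to None and returns an empty bytes object.

```python
import http.client
import io


class FakeSocket:
    """Hypothetical socket stand-in: HTTPResponse only calls makefile()."""

    def __init__(self, raw: bytes):
        self._raw = raw

    def makefile(self, mode, *args, **kwargs):
        return io.BytesIO(self._raw)


# Canned S3-style reply: HTTP/1.1, Content-Length, no Connection header.
RAW = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Type: application/pdf\r\n"
       b"Content-Length: 5\r\n"
       b"\r\n"
       b"%PDF-")


def body_length(close_first: bool) -> int:
    resp = http.client.HTTPResponse(FakeSocket(RAW))
    resp.begin()
    if close_first:
        resp.close()  # what the regressed do_open() did indirectly via h.close()
    return len(resp.read())


print(body_length(close_first=False))  # 5
print(body_length(close_first=True))   # 0: fp is None, so read() returns b""
```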
msg140763 - Author: Barry A. Warsaw (barry) * (Python committer) Date: 2011-07-20 21:03
I think this is also a regression in Python 2.7, as reported here:
https://bugs.launchpad.net/ubuntu/+source/python2.7/+bug/813295 
msg140913 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-07-22 21:43
Could we look for 'tion: Closed' instead of "Connection: Closed", to accommodate servers that garble the header, even if it is a hack?
msg140935 - Author: angus (angus) Date: 2011-07-23 06:27
I'm experiencing a related problem:
---
from urllib.request import urlopen
print(urlopen('https://mtgox.com/').read())
---
prints b'' rather than the page content.
It looks like mtgox.com always sends 'Connection: Keep-Alive'. So some hack like recognising 'tion: close' wouldn't fix it.
msg140936 - Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2011-07-23 06:33
I am against hacks like tion: close. In the worst case, we shall revert
the change which caused this regression in the first place.
msg140937 - Author: Robert Xiao (nneonneo) * Date: 2011-07-23 06:43
S3 also doesn't send any kind of connection header at all.
x-amz-id-2: WWuo30Fk2inKVcC5dH4GOjvHxnqMa5Q2+AduPm2bMhL1h3GqzOR0EPwUv0biqv2V
x-amz-request-id: 3CCF6B6A000E6446
Date: Sat, 23 Jul 2011 06:42:45 GMT
x-amz-meta-s3fox-filesize: 27692
x-amz-meta-s3fox-modifiedtime: 1213292340000
Last-Modified: Thu, 12 Jun 2008 17:45:12 GMT
ETag: "c4db184c97f1d6b0b6e7ee17a73e785b"
Accept-Ranges: bytes
Content-Type: application/pdf
Content-Length: 27692
Server: AmazonS3
msg140947 - Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2011-07-23 08:24
Recognizing "ction: close" as "Connection: close" is exactly what those servers do *not* want you to do.
msg141065 - Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2011-07-25 00:04
I propose the attached patch as a fix for this issue. All it does is move the code that gets the HTTP response into the finally block of the HTTP request. It closes the socket if getting the response fails for some reason; otherwise it proceeds normally.
Please provide your critique, if any; otherwise, I shall go ahead and check this in.
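A minimal sketch contrasting the regressed control flow with the "close only on failure" flow described above. It assumes a hypothetical FakeConnection class in place of http.client.HTTPConnection so it runs offline; it illustrates the pattern only and is not the code of the committed patch.

```python
class FakeConnection:
    """Hypothetical stand-in for http.client.HTTPConnection (no network)."""

    def __init__(self, fail=False):
        self.fail = fail
        self.closed = False

    def request(self, method, selector):
        pass  # pretend the request went out

    def getresponse(self):
        if self.fail:
            raise OSError("simulated socket error")
        return "response"  # placeholder for an HTTPResponse

    def close(self):
        self.closed = True


def open_always_close(conn):
    """Regression pattern: close in 'finally', i.e. on success as well."""
    try:
        conn.request("GET", "/")
        return conn.getresponse()
    finally:
        conn.close()


def open_close_on_error(conn):
    """Fixed pattern: close only when sending or reading the response fails."""
    try:
        conn.request("GET", "/")
        response = conn.getresponse()
    except OSError:
        conn.close()  # release the socket, then let the error propagate
        raise
    return response  # success: connection and response stay open


ok = FakeConnection()
open_always_close(ok)
print(ok.closed)  # True: a real response could no longer be read

ok = FakeConnection()
open_close_on_error(ok)
print(ok.closed)  # False

bad = FakeConnection(fail=True)
try:
    open_close_on_error(bad)
except OSError:
    pass
print(bad.closed)  # True
```

The second function keeps the #12133 goal (no leaked socket on error) without touching the response on the success path.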
msg141199 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-07-27 00:25
New changeset 9eac48fbe21d by Senthil Kumaran in branch '3.2':
Fix closes Issue12576 - fix urlopen behavior on sites which do not send (or obsfuscates) Connection: Close header.
http://hg.python.org/cpython/rev/9eac48fbe21d
New changeset a45c8ce67c7d by Senthil Kumaran in branch 'default':
merge from 3.2 - Fix closes Issue12576 - fix urlopen behavior on sites which do not send (or obsfuscates) Connection: Close header.
http://hg.python.org/cpython/rev/a45c8ce67c7d 
msg141202 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-07-27 01:32
New changeset d58b43fb9208 by Senthil Kumaran in branch '3.2':
Correcting issue 12576 fix, which resulted in buildbot failures.
http://hg.python.org/cpython/rev/d58b43fb9208
New changeset dcfce522723d by Senthil Kumaran in branch 'default':
merge from 3.2 - Correcting issue 12576 fix, which resulted in buildbot failures.
http://hg.python.org/cpython/rev/dcfce522723d 
msg141249 - Author: Barry A. Warsaw (barry) * (Python committer) Date: 2011-07-27 17:34
Re-opening, as I think this needs to still be applied to Python 2.7.
msg141254 - Author: Barry A. Warsaw (barry) * (Python committer) Date: 2011-07-27 17:57
Never mind. This changeset got applied to 2.7 (thanks!) but didn't get linked in the tracker.
changeset: 71523:b66bbbdc7abd
branch: 2.7
parent: 71518:73ae3729b8fe
user: Senthil Kumaran <senthil@uthcode.com>
date: Wed Jul 27 09:37:17 2011 +0800
summary: merge from 3.2 - fix urlopen behavior on sites which do not send (or obsfuscates) Connection: Close header.
History
Date | User | Action | Args
2022-04-11 14:57:19 | admin | set | github: 56785
2011-07-27 17:57:23 | barry | set | status: open -> closed; messages: + msg141254
2011-07-27 17:34:01 | barry | set | status: closed -> open; messages: + msg141249
2011-07-27 01:32:20 | python-dev | set | messages: + msg141202
2011-07-27 00:25:57 | python-dev | set | status: open -> closed; nosy: + python-dev; messages: + msg141199; resolution: fixed; stage: resolved
2011-07-25 00:04:34 | orsenthil | set | files: + issue12576.patch; keywords: + patch; messages: + msg141065
2011-07-24 05:17:41 | ezio.melotti | link | issue12628 superseder
2011-07-23 08:24:47 | georg.brandl | set | nosy: + georg.brandl; messages: + msg140947
2011-07-23 06:43:34 | nneonneo | set | messages: + msg140937
2011-07-23 06:33:07 | orsenthil | set | messages: + msg140936
2011-07-23 06:33:04 | orsenthil | set | assignee: orsenthil
2011-07-23 06:27:42 | angus | set | nosy: + angus; messages: + msg140935
2011-07-22 21:43:59 | terry.reedy | set | nosy: + terry.reedy; messages: + msg140913
2011-07-21 10:21:51 | vstinner | set | nosy: + ezio.melotti
2011-07-20 21:03:44 | barry | set | nosy: + barry; messages: + msg140763; versions: + Python 2.7
2011-07-19 17:29:37 | nneonneo | set | nosy: + nneonneo; messages: + msg140683
2011-07-18 14:20:29 | eric.araujo | set | nosy: + eric.araujo
2011-07-18 10:13:39 | vstinner | set | messages: + msg140571
2011-07-18 09:07:13 | davide.rizzo | set | nosy: + davide.rizzo
2011-07-18 09:01:51 | vstinner | set | messages: + msg140570
2011-07-18 09:00:39 | vstinner | set | nosy: + vstinner; messages: + msg140569
2011-07-18 02:20:05 | santoso.wijaya | set | nosy: + santoso.wijaya; versions: + Python 3.3
2011-07-17 06:50:55 | nadeem.vawda | set | nosy: + nadeem.vawda
2011-07-17 01:29:25 | orsenthil | set | nosy: + orsenthil; messages: + msg140513
2011-07-17 00:07:43 | daniel.ugra | create |
