
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: urllib2 doesn't always supply / where URI path component is empty
Type: behavior Stage: resolved
Components: Versions: Python 3.1, Python 3.2, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: 2464 Superseder:
Assigned To: orsenthil Nosy List: dstanek, flox, jjlee, orsenthil, weschow
Priority: normal Keywords: easy, patch

Created on 2008-12-02 20:46 by jjlee, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
empty-path-4493.patch weschow, 2010-11-20 22:21
Messages (5)
msg76777 - Author: John J Lee (jjlee) Date: 2008-12-02 20:46
As required by RFC 2616 section 3.2.2, for all HTTP requests sent by
urllib2, the path component of the URI should be normalized to "/"
before the Request-URI derived from it gets passed to httplib (or
something functionally equivalent to that). This was fixed in one case
in #2464, but the fix is in the wrong place, since it's a general
problem not specific to redirects. See the longer discussion here:
http://bugs.python.org/msg76736
(hmm, let's see if I can just say msg76736 and get a hyperlink)
Example:
import urllib2
urllib2.urlopen("http://python.org?spam")
Expect: sends "/?spam" in request line.
Got: sends "?spam" in request line.
Probably should be fixed by making Request.get_selector() return the
normalized URI reference (with the slash always present). When fixing,
remember that the Request-URI of RFC 2616 (returned by .get_selector())
is sometimes a relative reference, and sometimes a URI (in RFC 3986's
terminology).
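The normalization described above can be sketched as follows. This is only an illustration, written against Python 3's urllib.parse rather than urllib2, and it is not the fix that was eventually committed. The helper names normalize_empty_path and request_target are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_empty_path(url):
    """Return url with an empty path replaced by "/" (hypothetical helper)."""
    parts = urlsplit(url)
    # Only touch URLs that name a host but carry no path at all.
    if parts.netloc and not parts.path:
        parts = parts._replace(path="/")
    return urlunsplit(parts)


def request_target(url):
    """Build the path-plus-query part that would go on the request line."""
    parts = urlsplit(normalize_empty_path(url))
    target = parts.path
    if parts.query:
        target += "?" + parts.query
    return target


print(request_target("http://python.org?spam"))  # prints "/?spam"
```

With this sketch, a URL that already has a path passes through unchanged, so only the empty-path case from the example is affected.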
msg121797 - Author: Wes Chow (weschow) Date: 2010-11-20 22:21
Attached is a patch against 3.2 that replaces empty paths with '/' in HTTPConnection. I do not totally understand the ; syntax in URIs, and so this implementation may break that, as it splits urls and unsplits them if needed. The Python docs seem to indicate there might be some obscure cases where this is problematic.
And yes, I do realize that this patch fixes the problem in yet another place. Hopefully HTTPConnection is the lowest common denominator.
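For readers without the attachment, the split-and-unsplit idea can be sketched roughly as below. This is not the contents of empty-path-4493.patch; the function name fix_selector is hypothetical, and the sketch assumes the value being fixed may be either a relative reference such as "?spam" or a full URI. Returning the input untouched whenever a path is present is one way to avoid re-serializing URLs that use the ; syntax.

```python
from urllib.parse import urlsplit, urlunsplit


def fix_selector(selector):
    """Insert "/" when the path is empty; otherwise return the input as-is.

    Hypothetical helper, not the code from the attached patch.
    """
    scheme, netloc, path, query, fragment = urlsplit(selector)
    if path:
        # A non-empty path is left alone, so nothing is split and rejoined.
        return selector
    return urlunsplit((scheme, netloc, "/", query, fragment))


print(fix_selector("?spam"))                   # prints "/?spam"
print(fix_selector("http://python.org?spam"))  # prints "http://python.org/?spam"
```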
msg122094 - Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-11-22 05:06
Fixed it in r86676 (py3k), r86677 (release31-maint) and r86678 (release27-maint).
Wes: I fixed it at a much higher level, in urlparse itself, so that the fixed URL is sent to httplib.
In issue2464, John had pointed out that, according to STD 66, the path component can legally be empty, so when it is empty this adding of '/' does not take place.
Also added tests and NEWS.
msg122121 - Author: Wes Chow (weschow) Date: 2010-11-22 13:18
This same bug also exists in HTTPClient, and my patch addresses that. Addressing it in HTTPClient has a side effect of taking care of it for urllib2 as well (and all future libraries that use HTTPClient).
Even if the urllib2 patch is preferable, shouldn't we fix the problem in HTTPClient as well?
msg124185 - Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-12-17 05:32
Wes, I forgot to address your last comment.
HTTPClient follows the HTTP spec for requests and responses. When it is used, the request is made on the path, and the code there checks whether the path exists; if it does not, it does a request on '/'. It is not appropriate to pass invalid URLs to httpclient; invalid URL handling and the corrections for it are done at a much higher level. That's why I made those changes in urllib.
History
Date User Action Args
2022-04-11 14:56:42 admin set github: 48743
2010-12-17 05:32:46 orsenthil set nosy: jjlee, orsenthil, dstanek, flox, weschow; messages: + msg124185
2010-11-22 13:18:42 weschow set messages: + msg122121
2010-11-22 05:06:55 orsenthil set status: open -> closed; resolution: fixed; messages: + msg122094; stage: test needed -> resolved
2010-11-20 22:21:38 weschow set files: + empty-path-4493.patch; nosy: + weschow; messages: + msg121797; keywords: + patch
2010-08-04 07:49:23 flox set nosy: + flox
2010-08-01 19:05:27 dstanek set nosy: + dstanek
2010-07-11 05:37:05 orsenthil set assignee: orsenthil
2010-07-10 16:55:02 BreamoreBoy set versions: + Python 3.1, Python 2.7, Python 3.2, - Python 2.6
2009-04-22 18:47:49 ajaksu2 set priority: normal; keywords: + easy
2009-02-12 19:14:38 ajaksu2 set nosy: + orsenthil; dependencies: + urllib2 can't handle http://www.wikispaces.com; type: behavior; stage: test needed; versions: + Python 2.6
2008-12-02 20:46:11 jjlee create
