This issue tracker has been migrated to GitHub ,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2003年06月13日 15:15 by kruegi, last changed 2022年04月10日 16:09 by admin. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| issue754016-py26.patch | orsenthil, 2008年06月27日 18:20 | |||
| 754016.patch | dstanek, 2010年08月03日 23:58 | |||
| Pull Requests | |||
|---|---|---|---|
| URL | Status | Linked | Edit |
| PR 661 | merged | Tim.Graham, 2017年03月13日 17:39 | |
| PR 16837 | merged | miss-islington, 2019年10月18日 13:07 | |
| PR 16839 | merged | orsenthil, 2019年10月18日 13:51 | |
| Messages (19) | |||
|---|---|---|---|
| msg16377 - (view) | Author: Thomas Krüger (kruegi) | Date: 2003年06月13日 15:15 | |
urlparse doesnt work if IP and port are given without
scheme:
>>> urlparse.urlparse('1.2.3.4:80','http')
('1.2.3.4', '', '80', '', '', '')
should be:
>>> urlparse.urlparse('1.2.3.4:80','http')
('http', '1.2.3.4', '80', '', '', '')
|
|||
| msg16378 - (view) | Author: Shannon Jones (sjones) | Date: 2003年06月14日 02:26 | |
Logged In: YES user_id=589306 urlparse.urlparse takes a url of the format: <scheme>://<netloc>/<path>;<params>?<query>#<fragment> And returns a 6-tuple of the format: (scheme, netloc, path, params, query, fragment). An example from the library refrence takes: |
|||
| msg16379 - (view) | Author: Shannon Jones (sjones) | Date: 2003年06月14日 02:39 | |
Logged In: YES
user_id=589306
Sorry, previous comment got cut off...
urlparse.urlparse takes a url of the format:
<scheme>://<netloc>/<path>;<params>?<query>#<fragment>
And returns a 6-tuple of the format:
(scheme, netloc, path, params, query, fragment).
An example from the library refrence takes:
urlparse('http://www.cwi.nl:80/%7Eguido/Python.html')
And produces:
('http', 'www.cwi.nl:80', '/%7Eguido/Python.html', '',
'', '')
--------------------------------
Note that there isn't a field for the port number in the
6-tuple. Instead, it is included in the netloc. Urlparse
should handle your example as:
>>> urlparse.urlparse('1.2.3.4:80','http')
('http', '1.2.3.4:80', '', '', '', '')
Instead, it gives the incorrect output as you indicated.
|
|||
| msg16380 - (view) | Author: Shannon Jones (sjones) | Date: 2003年06月14日 04:18 | |
Logged In: YES user_id=589306 Ok, I researched this a bit, and the situation isn't as simple as it first appears. The RFC that urlparse tries to follow is at http://www.faqs.org/rfcs/rfc1808.html and notice specifically sections 2.1 and 2.2. It seems to me that the source code follows rfc1808 religiously, and in that sense it does the correct thing. According to the RFC, the netloc should begin with a '//', and since your example didn't include one then it technical was an invalid URL. Here is another example where it seems Python fails to do the right thing: >>> urlparse.urlparse('python.org') ('', '', 'python.org', '', '', '') >>> urlparse.urlparse('python.org', 'http') ('http', '', 'python.org', '', '', '') Note that it is putting 'python.org' as the path and not the netloc. So the problem isn't limited to just when you use a scheme parameter and/or a port number. Now if we put '//' at the beginning, we get: >>> urlparse.urlparse('//python.org') ('', 'python.org', '', '', '', '') >>> urlparse.urlparse('//python.org', 'http') ('http', 'python.org', '', '', '', '') So here it does the correct thing. There are two problems though. First, it is common for browsers and other software to just take a URL without a scheme and '://' and assume it is http for the user. While the URL is technically not correct, it is still common usage. Also, urlparse does take a scheme parameter. It seems as though this parameter should be used with a scheme-less URL to give it a default one like web browsers do. So somebody needs to make a decision. Should urlparse follow the RFC religiously and require '//' in front of netlocs? If so, I think the documentation should give an example showing this and showing how to use the 'scheme' parameter. Or should urlparse use the more commonly used form of a URL where '//' is omitted when the scheme is omitted? If so, urlparse.py will need to be changed. Or maybe another fuction should be added to cover whichever behaviour urlparse doesn't cover. In any case, you can temporarily solve your problem by making sure that URL's without a scheme have '//' at the front. So your example becomes: >>> urlparse.urlparse('//1.2.3.4:80', 'http') ('http', '1.2.3.4:80', '', '', '', '') |
|||
| msg16381 - (view) | Author: Facundo Batista (facundobatista) * (Python committer) | Date: 2004年12月26日 21:55 | |
Logged In: YES user_id=752496 The problem is still present in Py2.3.4. IMO, it should support dirs without the "http://" or raise an error if it's not present (never fail silently!). |
|||
| msg16382 - (view) | Author: Georg Brandl (georg.brandl) * (Python committer) | Date: 2005年08月25日 13:22 | |
Logged In: YES user_id=1188172 Will look into it. Should be easy to fix. |
|||
| msg67890 - (view) | Author: Senthil Kumaran (orsenthil) * (Python committer) | Date: 2008年06月10日 02:54 | |
Attaching the patch to fix this issue. I deliberated upon this for a
while and came up with the approach to:
1) fix the port issue, wherein urlparse should technically recognize the
':' separator for port from ':' after scheme.
2) And Doc fix wherein, it is advised that in the absence of a scheme,
use the net_loc as //net_loc (following RCF 1808).
If we go for any other fix, like internally pre-pending // when user has
not specified the scheme (like in many pratical purpose), then we stand
at chance of breaking a number of tests ( cases where url is 'g'(path
only),';x' (path with params) and cases where relative url is g:h)
Let me know your thoughts on this.
>>> urlparse('1.2.3.4:80')
ParseResult(scheme='', netloc='', path='1.2.3.4:80', params='',
query='', fragment='')
>>> urlparse('http://www.python.org:80/~guido/foo?query#fun')
ParseResult(scheme='http', netloc='www.python.org:80',
path='/~guido/foo', params='', query='query', fragment='fun')
>>>
|
|||
| msg68526 - (view) | Author: Facundo Batista (facundobatista) * (Python committer) | Date: 2008年06月21日 18:03 | |
Senthil, your patch is wrong, see:
>>> import urlparse
>>> urlparse.urlparse('1.2.3.4:80','http')
ParseResult(scheme='http', netloc='', path='1.2.3.4:80', params='',
query='', fragment='')
The netloc should be "1.2.3.4:80", note the composition of an URL:
<scheme>://<netloc>/<path>;<params>?<query>#<fragment>
Please fix it and test it applying the patch to the test I'm submitting
here...
|
|||
| msg68533 - (view) | Author: Anthony Lenton (elachuni) * | Date: 2008年06月21日 18:45 | |
I agree with facundobatista that the patch is bad, but for a different
reason: it now breaks with:
>>> import urlparse
>>> urlparse.urlparse ('http:')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/anthony/svn/python26/Lib/urlparse.py", line 108, in urlparse
tuple = urlsplit(url, scheme, allow_fragments)
File "/home/anthony/svn/python26/Lib/urlparse.py", line 148, in urlsplit
if i > 0 and not url[i+1].isdigit():
IndexError: string index out of range
I'm afraid that it it's not evident that the expected behavior isn't
evident.
Take for example:
>>> import urlparse
>>> urlparse.urlparse('some.com', 'http')
ParseResult(scheme='http', netloc='', path='some.com', params='',
query='', fragment='')
Is the url referring to the some.com domain or to a windows executable file?
If you're using urlparse to parse only absolute urls then probably you
want the first component to be considered a net_loc, but not if you're
thinking of accepting also relative urls.
It would probably be better to be explicit and raise an exception if the
url is invalid, so that the user can prepend a '//' and resubmit if
needed. Also we'd probably stop seeing bugreports about this issue :)
|
|||
| msg68535 - (view) | Author: Facundo Batista (facundobatista) * (Python committer) | Date: 2008年06月21日 19:10 | |
I agree with Anthony here, because if you let people write without the
"//" at the beginning, you'll never know if they're entering a net
location or a relative path.
So, the better behaviour to be as explicit as possible should be:
>>> urlparse.urlparse('1.2.3.4:80','http')
Traceback!!! ValueError(<nice message here>)
>>> urlparse.urlparse('//1.2.3.4:80','http')
('http', '1.2.3.4:80', '', '', '', '')
So, to close this issue, we should fix the code to behave like indicated
in the first case.
What do you think?
|
|||
| msg68841 - (view) | Author: Senthil Kumaran (orsenthil) * (Python committer) | Date: 2008年06月27日 18:20 | |
I am attaching the modified patch, which addresses the port issue properly and handles 'http:', 'https:' only URLS. Also included the tests for it. Facundo, I gave sufficient thought on raising an Exception for URLS not staring with '//', and I am -1 for it. As urlparse module is used for handling both absolute URLs as well as relative URLS, this suggestion IMHO, would break the urlparse handling of all relative urls. For e.g, cases which are mentioned in the RFC 1808 (Section 5.1 Normal Examples). The way to inform the users to use '//net_loc' when they want net_loc, would be Docs/Help message (included in the patch) and otherwise urlparse following RFC1808, will treat it as the path. This case may seem absurd when 'www.python.org' is treated as path but perfect for parsing relative urls like just 'a'. More over this makes sense when we have relative urls with parameters and query, for e.g.'g:h','?x' Another way to handle this would be split urlparse into two methods: urlparse.absparse() urlparse.relparse() and let the user decide what he wants to do. I am passing a message to Web-SIG to discuss this further. Irrespective of this, if the patch looks okay for "handling the port issue" for 2.6 with the "Doc/Help message", then we should close this bug and take the discussion further in Web-SIG. (I shall provide the patch for 3.0 as well) Comments Please. |
|||
| msg69095 - (view) | Author: Facundo Batista (facundobatista) * (Python committer) | Date: 2008年07月02日 14:40 | |
I think this last patch is ok, but the third case that was raised in the
web-sig should be addressed:
"""
There's even a 3rd case: HTTP's Request-URI. For example, '//path' must
be treated as an abs_path consisting of two path_segments ['', 'path'],
not a net_loc, since the Request_URI must be one of ("*" | absoluteURI |
abs_path | authority).
"""
Please, address this new detail, and I'd commit this. Thanks!
|
|||
| msg85877 - (view) | Author: Senthil Kumaran (orsenthil) * (Python committer) | Date: 2009年04月11日 21:35 | |
Facundo, I re-looked at this issue (after a long time; sorry for that) and I think that the patch is good to be applied as it is for this issue. The Web-SIG discussion, which you pointed to in the comment (http://mail.python.org/pipermail/web-sig/2008-June/003458.html) suggests a more "general case" of thinking and parsing of the urls. The suggestion is specifically for "//path" kind of urls, which is will be parsed into authority (as per RFC2396) or into netloc as in the patch. My suggestion is to fix this issue with patch and if the corner case like "//path" comes up, then I shall look the modifications required as there is No RFC Specifications and Tests defining those cases. Your comments? |
|||
| msg110334 - (view) | Author: Mark Lawrence (BreamoreBoy) * | Date: 2010年07月14日 22:15 | |
The patch will need to be reworked for the 2.7, 3.1 and 3.2 branches. |
|||
| msg112725 - (view) | Author: David Stanek (dstanek) | Date: 2010年08月03日 23:58 | |
I've reworked the patch so that it applied against the py3k branch. It's been attached to this issue and is also available here: http://codereview.appspot.com/1910044. |
|||
| msg112758 - (view) | Author: Senthil Kumaran (orsenthil) * (Python committer) | Date: 2010年08月04日 04:56 | |
Fixed in revision 83700 (release27-maint). r83701(py3k) and r83702(release31-maint). David, thanks for reworking on the patch. Couple of comments - I made change to the original patch where I checked 'https:' and 'http:' kind of url with url.endswith(':') instead of len(url) == i+1 // I had a hard time to figure out why I had this way in the first place. - In py3k, the urlparse.urlparse is changed to urllib.parse.urlparse. So that changes were required in the tests. |
|||
| msg354892 - (view) | Author: Senthil Kumaran (orsenthil) * (Python committer) | Date: 2019年10月18日 13:07 | |
New changeset 5a88d50ff013a64fbdb25b877c87644a9034c969 by Senthil Kumaran (Tim Graham) in branch 'master': bpo-27657: Fix urlparse() with numeric paths (#661) https://github.com/python/cpython/commit/5a88d50ff013a64fbdb25b877c87644a9034c969 |
|||
| msg354897 - (view) | Author: miss-islington (miss-islington) | Date: 2019年10月18日 13:24 | |
New changeset 82b5f6b16e051f8a2ac6e87ba86b082fa1c4a77f by Miss Islington (bot) in branch '3.7': bpo-27657: Fix urlparse() with numeric paths (GH-661) https://github.com/python/cpython/commit/82b5f6b16e051f8a2ac6e87ba86b082fa1c4a77f |
|||
| msg354906 - (view) | Author: Senthil Kumaran (orsenthil) * (Python committer) | Date: 2019年10月18日 15:23 | |
New changeset 0f3187c1ce3b3ace60f6c1691dfa3d4e744f0384 by Senthil Kumaran in branch '3.8': [3.8] bpo-27657: Fix urlparse() with numeric paths (GH-661) (#16839) https://github.com/python/cpython/commit/0f3187c1ce3b3ace60f6c1691dfa3d4e744f0384 |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022年04月10日 16:09:11 | admin | set | github: 38644 |
| 2019年10月18日 15:23:21 | orsenthil | set | messages: + msg354906 |
| 2019年10月18日 13:51:49 | orsenthil | set | pull_requests: + pull_request16389 |
| 2019年10月18日 13:24:32 | miss-islington | set | nosy:
+ miss-islington messages: + msg354897 |
| 2019年10月18日 13:07:37 | miss-islington | set | pull_requests: + pull_request16383 |
| 2019年10月18日 13:07:37 | orsenthil | set | messages: + msg354892 |
| 2017年03月13日 19:39:47 | BreamoreBoy | set | nosy:
- BreamoreBoy |
| 2017年03月13日 17:39:32 | Tim.Graham | set | pull_requests: + pull_request544 |
| 2010年08月04日 04:56:30 | orsenthil | set | status: open -> closed messages: + msg112758 assignee: facundobatista -> orsenthil resolution: accepted -> fixed stage: patch review -> resolved |
| 2010年08月03日 23:58:04 | dstanek | set | files:
+ 754016.patch nosy: + dstanek messages: + msg112725 |
| 2010年07月14日 22:15:59 | BreamoreBoy | set | nosy:
+ BreamoreBoy messages: + msg110334 versions: + Python 3.1, Python 2.7, Python 3.2, - Python 2.6 |
| 2009年04月11日 21:35:45 | orsenthil | set | resolution: accepted messages: + msg85877 |
| 2009年02月13日 01:37:56 | ajaksu2 | set | nosy:
+ jjlee stage: patch review type: behavior versions: + Python 2.6, - Python 2.3 |
| 2008年07月02日 14:40:56 | facundobatista | set | messages: + msg69095 |
| 2008年06月27日 18:21:20 | orsenthil | set | files: - issue754016.patch |
| 2008年06月27日 18:20:58 | orsenthil | set | files:
+ issue754016-py26.patch messages: + msg68841 |
| 2008年06月21日 19:10:08 | facundobatista | set | messages: + msg68535 |
| 2008年06月21日 19:02:54 | facundobatista | set | files: - test_urlparse.diff |
| 2008年06月21日 18:45:44 | elachuni | set | nosy:
+ elachuni messages: + msg68533 |
| 2008年06月21日 18:03:19 | facundobatista | set | files:
+ test_urlparse.diff messages: + msg68526 |
| 2008年06月20日 01:03:37 | facundobatista | set | assignee: georg.brandl -> facundobatista |
| 2008年06月10日 02:54:28 | orsenthil | set | files:
+ issue754016.patch nosy: + orsenthil messages: + msg67890 keywords: + patch |
| 2003年06月13日 15:15:34 | kruegi | create | |