
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: RFC2732 support for urlparse (IPv6 addresses)
Type: enhancement
Stage: resolved
Components: Library (Lib)
Versions: Python 3.2, Python 2.7
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: orsenthil
Nosy List: Keegan.Carruthers-Smith, anacrolix, benjamin.peterson, eric.araujo, jjlee, ndim, orsenthil, pitrou, r.david.murray, sergiomb2, tlocke
Priority: normal
Keywords: easy, patch

Created on 2008-05-27 20:40 by ndim, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name (uploaded by, date): description
python-urlparse-rfc2732-fix.patch (ndim, 2008-05-27 20:40): preliminary urlparse fix, requires more thought
python-urlparse-rfc2732-rfc-list.patch (ndim, 2008-05-27 20:41): update RFC list on top of urlparse.py
python-urlparse-rfc2732-test.patch (ndim, 2008-05-27 20:42): test cases with RFC2732 urls to parse
parse.py.patch (tlocke, 2010-04-11 23:09): Patch to parse.py and test_urlparse.py
urlparse-module-header.diff (orsenthil, 2010-04-12 06:42)
issue2987-final.patch (orsenthil, 2010-04-15 15:58)
issue2987-bad_url_checks.diff (orsenthil, 2010-04-17 18:14)
Messages (25)
msg67430 - Author: Hans Ulrich Niedermann (ndim) Date: 2008-05-27 20:40
The way the urlparse module splits the location into hostname and
port breaks with RFC2732 style URIs with IPv6 addresses in them:
>>> import urlparse
>>> urlparse.urlparse('http://[::1]:80/').hostname
'['
>>> urlparse.urlparse('http://[::1]:80/').port
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
 File "/usr/lib/python2.5/urlparse.py", line 116, in port
 return int(port, 10)
ValueError: invalid literal for int() with base 10: ':1]:80'
>>> 
A simple fix is attached, but probably requires a little more thought.
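For readers wondering where the '[' and ':1]:80' above come from, here is a minimal sketch. It assumes the old accessors simply cut the netloc at the first colon; the helper names naive_hostname and naive_port are hypothetical and this is not the actual urlparse.py source.

```python
def naive_hostname(netloc):
    # Assumption: the host is everything before the first ':'.
    if ":" in netloc:
        netloc = netloc.split(":", 1)[0]
    return netloc.lower() or None


def naive_port(netloc):
    # Assumption: the port is everything after the first ':'.
    if ":" in netloc:
        return int(netloc.split(":", 1)[1], 10)
    return None


# An IPv6 literal contains colons itself, so the first ':' is inside the brackets.
print(repr(naive_hostname("[::1]:80")))
try:
    naive_port("[::1]:80")
except ValueError as exc:
    print("ValueError:", exc)
```

Splitting at the first colon works for host:port pairs such as example.com:8080, which is probably why the problem only shows up with bracketed IPv6 literals.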
msg67431 - Author: Hans Ulrich Niedermann (ndim) Date: 2008-05-27 20:47
I have written this patch because urlparse could not retrieve the
hostname or port components of URIs such as
http://[::ffff:192.168.13.37]/ or http://[dead:beef::1]:8888/
This problem happens with Python 2.5.1 in Fedora 9, and I have also
found it in Python's SVN trunk/ and release25-maint/ source code.
It still needs some polishing and thinking: See the places marked
FIXME, but probably also others. One would not want an inconsistent
API feel with respect to IPv6 address handling.
Might require some more comprehensive thought about how Python wants
to handle networks other-than-IPv4, exceeding the scope of a simple
patch to urlparse.py.
On a not-totally-unrelated note, someone should examine whether IRIs[1]
can fit into urlparse.py, or whether they need e.g. a separate
iriparse.py with an adapted API.
[1] RFC 3987 - Internationalized Resource Identifiers (IRIs)
 M. Duerst, M. Suignard, January 2005
msg98314 - Author: Sérgio (sergiomb2) Date: 2010-01-26 04:17
Hi, with python-2.6.2-2.fc12.i686
In: x ="http://www.somesite.com/images/rubricas/"
In: urlparse.urljoin(x, '07.11.2009-9:54:12-1.jpg')
Out: '07.11.2009-9:54:12-1.jpg' !?
In: urlparse.urljoin(x, './07.11.2009-9:54:12-1.jpg')
Out: 'http://www.somesite.com/images/rubricas/07.11.2009-9:54:12-1.jpg'
urlparse.urlparse('07.11.2009-9:54:12-1.jpg')
is wrong, but
urlparse.urlparse('./07.11.2009-9:54:12-1.jpg')
isn't.
Please think about that.
msg98315 - Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-01-26 04:22
okay, this should be easy to address. But the more important part is RFC compliance so that this simple change does not break many other things in the wild.
msg102874 - Author: Tony Locke (tlocke) Date: 2010-04-11 19:32
I've created a patch for parse.py against the py3k branch, and I've also included ndim's test cases in that patch file.
When returning the host name of an IPv6 literal, I don't include the surrounding '[' and ']'. For example, parsing http://[::1]:5432/foo/ gives the host name '::1'.
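The behaviour described here can be checked against Python 3's urllib.parse, the successor of the urlparse module. This is only an illustrative sketch; the URLs are taken from this thread.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://[::1]:5432/foo/")

# The netloc keeps the brackets; hostname and port are the parsed components.
print(parts.netloc)    # [::1]:5432
print(parts.hostname)  # ::1
print(parts.port)      # 5432

# Another URL from the original report.
print(urlsplit("http://[dead:beef::1]:8888/").hostname)  # dead:beef::1
```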
msg102876 - Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-04-11 19:38
Seems sensible: Delimiters are not part of components.
msg102881 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-04-11 20:32
I think parsing should be a bit more careful. For example, what happens when you give 'http://dead:beef::]/foo/' as input (note the missing opening bracket)?
msg102882 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-04-11 20:34
By the way, updating the RFC list as done in python-urlparse-rfc2732-rfc-list.patch is also a good idea.
msg102884 - Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-04-11 20:44
Isn’t "http://dead:beef::]/foo/" an invalid URI?
Regarding doc, see also #5650.
msg102886 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-04-11 20:54
> Isn’t "http://dead:beef::]/foo/" an invalid URI?
That's the point, it shouldn't parse as a valid one IMO.
msg102911 - Author: Tony Locke (tlocke) Date: 2010-04-11 23:09
Regarding the RFC list issue, I've posted a new patch with a new RFC list that combines ndim's list and the comments from #5650.
Pitrou argues that http://dead:beef::]/foo/ should fail because it's a malformed URL. My response would be that the parse() function has historically assumed that a URL is well formed, and so this change to accommodate IPv6 should continue to assume the URL is well formed.
I'd say that a separate bug should be raised if it's thought that parse() should be changed to check that any URL is well-formed.
msg102915 - Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-04-12 05:10
The problem in msg98314 (http://bugs.python.org/msg98314), referenced in this bug, which I thought would be easy to handle, does not appear to be so. It is a bit tricky.
The problem is that the relative URL is given in the format '07.11.2009-9:54:12-1.jpg' and urlparse wrongly assumes that it is a VALID URL with the scheme 07.11.2009-9 (surprisingly, these all fall under the valid characters for a URL scheme, but we know that there is no URL scheme like that).
But when you give ./07.11.2009-9, ./ is identified as a relative path and urljoin happens properly.
My inclination for this specific msg98314 is to let the user give the proper path like ./07.11.2009-9 or use urljoin from a different directory, images/07.11.2009-9, and this should handle it.
This date-time relative URL is not a typical scenario, but for typical scenarios, urlparse behaves as expected.
>>> x = 'http://a.b.c'
>>> urlparse.urljoin(x,'foo')
'http://a.b.c/foo'
>>> urlparse.urljoin(x,'./foo')
'http://a.b.c/foo'
>>> 
I shall provide my comments on the IPv6 parsing in the next message.
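The "./" workaround discussed in this message can be sketched with Python 3's urllib.parse.urljoin, using the URL from msg98314. The result for the bare file name (without "./") is not shown, since it may differ between Python versions.

```python
from urllib.parse import urljoin

base = "http://www.somesite.com/images/rubricas/"
name = "07.11.2009-9:54:12-1.jpg"

# Prefixing "./" makes the reference start with a path segment, so the text
# before the first ':' can no longer be mistaken for a URL scheme.
joined = urljoin(base, "./" + name)
print(joined)
# http://www.somesite.com/images/rubricas/07.11.2009-9:54:12-1.jpg
```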
msg102920 - Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-04-12 06:42
After spending a sufficient amount of time looking at the patches and RFC 2732, I tend to agree with the patch provided by tlocke. It does cover the behavior for parsing an IPv6 URL with '[' hostname ']'. RFC 2732 is very short and just says that the hostname in the IPv6 URL should not have the '[' and ']' characters. The patch does just that, which is fine.
If hard pressed on detecting invalid IPv6 URLs, I would add an extra check:
+ if "[" in netloc and "]" in netloc:
+ return netloc.split("]")[0][1:].lower()
+ elif "[" in netloc or "]" in netloc:
+ raise ValueError("Invalid IPv6 URL")
This should take care of the invalid IPv6 URLs discussed in this bug.
- Any comments on this?
Also, regarding the urlparse header docs (this was long pending on me, sorry), here is a patch for the current one for review. When we address this bug, I shall include RFC 2732 in the list as well.
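The four-line check proposed in this message can be read as a standalone function. The sketch below is an illustration only: the name hostname_from_netloc, the userinfo handling, and the fallback for non-bracketed hosts are assumptions, not the committed patch.

```python
def hostname_from_netloc(netloc):
    # Assumption: drop any userinfo ("user:password@") before looking at the host.
    hostport = netloc.rpartition("@")[2]
    if "[" in hostport and "]" in hostport:
        # Bracketed IPv6 literal: keep what sits between '[' and ']'.
        return hostport.split("]")[0][1:].lower()
    elif "[" in hostport or "]" in hostport:
        # Only one of the two brackets is present: refuse to guess.
        raise ValueError("Invalid IPv6 URL")
    # Assumption: for everything else the host ends at the first ':'.
    return hostport.split(":", 1)[0].lower() or None


print(hostname_from_netloc("[::1]:80"))        # ::1
print(hostname_from_netloc("Example.COM:80"))  # example.com
try:
    hostname_from_netloc("dead:beef::]")
except ValueError as exc:
    print("ValueError:", exc)
```

Note that this only checks that both brackets occur somewhere; it does not check where they occur, which is the gap discussed later in the thread.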
msg103065 - Author: Keegan Carruthers-Smith (Keegan.Carruthers-Smith) Date: 2010-04-13 17:11
Just thought I'd point out that RFC2732 was obsoleted by RFC3986 http://www.rfc-editor.org/rfc/rfc3986.txt 
msg103066 - Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-04-13 17:18
Hello
Thanks for the clarification. This particular topic is discussed on #5650, feel free to help there!
Better update the code before the doc, though.
Regards
msg103067 - Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-04-13 17:24
Actually, this bug is just for parsing the IPv6 url. We are having the
right set of patches in the bug. I shall commit it soon.
The RFC part is separate and we will slowly achieve a good compliance
with STD 66.
msg103226 - Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-04-15 15:58
Final patch, which includes detection of invalid URLs at the netloc and hostname level, tests, and a NEWS entry.
msg103255 - Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2010-04-15 20:59
This is ok with me.
msg103285 - Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-04-16 02:47
Committed into trunk in revision 80101 
msg103288 - Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-04-16 03:07
Merged into py3k in revision 80102 and release31-maint in revision 80103.
Thanks for the patches, Tony and Hans. I have acknowledged them in the NEWS file too.
msg103312 - Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-04-16 11:29
Reverted the check-in made to 3.1 maint (in r80104). Features should not go in there.
msg103410 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-04-17 16:47
I posted this to the checkins list, but for reference, the following invalid URL should be added to the test cases:
 http://[::1/foo/bar]/bad
msg103419 - Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-04-17 18:14
Moving the bad URL check to a higher level detects bad URLs much better. Once the netloc is parsed and obtained, it can be checked for an invalid URL. I am attaching an update with the new test included.
If you have any comments, please let me know.
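As an illustration of a netloc-level check, the sketch below feeds two of the malformed URLs from this thread to Python 3's urllib.parse.urlsplit, which rejects a netloc containing only one of the two brackets. It shows the behaviour of present-day Python 3 and is not a reproduction of the attached diff.

```python
from urllib.parse import urlsplit

bad_urls = [
    "http://dead:beef::]/foo/",   # closing bracket without an opening one
    "http://[::1/foo/bar]/bad",   # netloc stops at the first '/', leaving '[::1'
]

for url in bad_urls:
    try:
        urlsplit(url)
    except ValueError as exc:
        print(url, "->", "ValueError:", exc)
    else:
        print(url, "-> parsed without error")
```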
msg103430 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-04-17 20:44
I don't know how deep you want to get into detecting invalid URIs, but with the new patch this one causes a parsing error that is probably worth dealing with:
 http://abc[xyz]jkl
Maybe a reasonable set of checks would be (in hostname) that if the part of the netloc after the @ contains a ']' or a '[', then it must start with a [ and either end with a ] or contain a ']:'.
I can also mess up your new checks with something like this:
 http://foo[bar@baz]
or even:
 http://foo[bar@baz:33]
although those don't fail, they just faithfully produce the nonsensical results implicit in the invalid urls. I think the above check logic in hostname would catch them, but it wouldn't catch this one:
 http://foo[bar@[bar]:33]
That may be OK, though, since as you noted earlier we aren't doing full URI validation.
Oh, and I notice that your test only covers the 'fast' path code, it doesn't exercise the general URI logic.
(Sorry I didn't review this issue earlier.)
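The rule suggested in this message (if the part of the netloc after the '@' contains a bracket, it must start with '[' and either end with ']' or contain ']:') can be sketched as a small predicate. The function name brackets_look_sane is hypothetical, and this was not necessarily what got committed.

```python
def brackets_look_sane(netloc):
    # Only the part after the last '@' (the host and optional port) is inspected.
    hostport = netloc.rpartition("@")[2]
    if "[" not in hostport and "]" not in hostport:
        # No brackets at all: nothing to check here.
        return True
    # Brackets present: require '[...]' or '[...]:port'.
    return hostport.startswith("[") and (
        hostport.endswith("]") or "]:" in hostport
    )


# The netlocs of the URLs quoted in this message, plus two well-formed ones.
for netloc in ["[::1]:80", "[::1]", "abc[xyz]jkl", "foo[bar@baz]",
               "foo[bar@baz:33]", "foo[bar@[bar]:33]"]:
    print(netloc, brackets_look_sane(netloc))
```

As the message predicts, the last netloc still passes, because the part after the '@' starts with '[' and ends with ']'.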
msg103753 - Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-04-20 20:51
I added the additional invalid-URL test which David pointed out and made changes to the invalid URL checking code. I moved it to a higher level.
- The reason for doing this is that the invalid URL check (which is very specific to '[' ... ']' enclosed IPv6 URLs) is now concentrated in a single place. We can deal with parsing separately from the check.
Now, other forms of invalid URLs are possible, as David points out (and possibly more too), but leaving them alone is better, as handling them would add syntax checks in various places (instead of a single place) without adding much value. Dealing with valid URLs and checking the parse logic should be fine.
commits: trunk - r80277 and py3k - r80278 
History
Date  User  Action  Args
2022-04-11 14:56:35  admin  set  github: 47236
2011-01-20 09:49:49  anacrolix  set  nosy: + anacrolix
2010-04-20 20:51:37  orsenthil  set  messages: + msg103753
2010-04-17 20:44:51  r.david.murray  set  messages: + msg103430
2010-04-17 18:14:03  orsenthil  set  files: + issue2987-bad_url_checks.diff
    messages: + msg103419
2010-04-17 16:47:44  r.david.murray  set  nosy: + r.david.murray
    messages: + msg103410
2010-04-16 11:29:31  orsenthil  set  messages: + msg103312
2010-04-16 03:07:58  orsenthil  set  status: open -> closed
    resolution: accepted -> fixed
    messages: + msg103288
    stage: patch review -> resolved
2010-04-16 02:47:31  orsenthil  set  messages: + msg103285
2010-04-15 20:59:23  benjamin.peterson  set  messages: + msg103255
2010-04-15 15:58:30  orsenthil  set  files: + issue2987-final.patch
    messages: + msg103226
2010-04-13 17:24:08  orsenthil  set  messages: + msg103067
2010-04-13 17:18:17  eric.araujo  set  messages: + msg103066
2010-04-13 17:11:41  Keegan.Carruthers-Smith  set  nosy: + Keegan.Carruthers-Smith
    messages: + msg103065
2010-04-12 06:42:07  orsenthil  set  files: + urlparse-module-header.diff
    resolution: accepted
    messages: + msg102920
2010-04-12 05:11:00  orsenthil  set  messages: + msg102915
2010-04-11 23:15:29  pitrou  set  nosy: + benjamin.peterson
2010-04-11 23:09:10  tlocke  set  files: + parse.py.patch
    messages: + msg102911
2010-04-11 22:50:37  tlocke  set  files: - parse.py.patch
2010-04-11 20:54:23  pitrou  set  messages: + msg102886
    title: RFC2732 support for urlparse (e.g. http:// -> RFC2732 support for urlparse (IPv6 addresses)
2010-04-11 20:44:48  eric.araujo  set  messages: + msg102884
    title: RFC2732 support for urlparse (e.g. http://[::1]:80/) -> RFC2732 support for urlparse (e.g. http://
2010-04-11 20:34:20  pitrou  set  messages: + msg102882
2010-04-11 20:32:46  pitrou  set  nosy: + pitrou
    messages: + msg102881
2010-04-11 19:38:39  eric.araujo  set  nosy: + eric.araujo
    messages: + msg102876
2010-04-11 19:32:49  tlocke  set  files: + parse.py.patch
    versions: + Python 3.2
    nosy: + tlocke
    messages: + msg102874
2010-01-26 04:22:37  orsenthil  set  assignee: orsenthil
    messages: + msg98315
    nosy: + orsenthil
2010-01-26 04:17:30  sergiomb2  set  nosy: + sergiomb2
    messages: + msg98314
2009-04-22 17:24:41  ajaksu2  set  priority: normal
    keywords: + easy
2009-02-13 01:44:17  ajaksu2  set  nosy: + jjlee
    stage: patch review
    versions: + Python 2.7, - Python 2.6, Python 2.5
2008-05-27 20:47:33  ndim  set  messages: + msg67431
2008-05-27 20:42:08  ndim  set  files: + python-urlparse-rfc2732-test.patch
2008-05-27 20:41:32  ndim  set  files: + python-urlparse-rfc2732-rfc-list.patch
2008-05-27 20:40:36  ndim  create
