Issue 27089: I think this is a small bug in urlparse.py


This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/71276

classification

Title:	I think this is a small bug in urlparse.py
Type:	behavior	Stage:	resolved
Components:	Library (Lib)	Versions:	Python 3.4, Python 2.7

process

Dependencies:	Superseder:
Status:	closed	Resolution:	not a bug
Assigned To:	Nosy List:	Feng A, berker.peksag, xiang.zhang, ztane
Priority:	normal	Keywords:

Created on 2016-05-23 08:06 by Feng A, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (7)
msg266125	Author: Feng A (Feng A)	Date: 2016-05-23 08:06
=====================================================
BUG:
run:  urlparse.urlparse('http://google.com]')
then: raise ValueError("Invalid IPv6 URL")
=====================================================
SOURCE:
    if url[:2] == '//':
        netloc, url = _splitnetloc(url, 2)
        if (('[' in netloc and ']' not in netloc) or
                (']' in netloc and '[' not in netloc)):
            raise ValueError("Invalid IPv6 URL")
=====================================================
SOLUTION:
I THINK IT IS BETTER TO JUST REMOVE THE LAST 3 LINES ABOVE
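For readers on Python 3, where the module lives at urllib.parse rather than urlparse, the report above can be checked with a short sketch. The helper name raises_value_error is made up for illustration and is not part of the standard library; the sketch only assumes that urlparse still rejects an unbalanced bracket in the network location with ValueError, as the quoted source does.

```python
from urllib.parse import urlparse  # Python 3 location of the old urlparse module


def raises_value_error(url):
    """Return True if urlparse rejects the URL with ValueError.

    Hypothetical helper for illustration only.
    """
    try:
        urlparse(url)
    except ValueError:
        return True
    return False


# The URL from the report has a closing bracket with no opening one.
print(raises_value_error('http://google.com]'))
# The same URL without the stray bracket parses normally.
print(raises_value_error('http://google.com'))
```

The first call should report that the URL is rejected and the second that it is accepted, which matches the behaviour described in the report.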
msg266126	Author: Xiang Zhang (xiang.zhang) * (Python committer)	Date: 2016-05-23 08:17
Why? I think this is the right behaviour. According to the RFC [1], square brackets are used, and only used, to refer to IPv6 addresses in a URI. Square brackets are reserved characters, and the URI you give is not correct.

[1] http://tools.ietf.org/html/rfc3986#section-3
msg266127	Author: Feng A (Feng A)	Date: 2016-05-23 08:51
I wish you would think twice if you haven't used urlparse.py in a practical project.
1. Do you like the module to raise an exception?
2. Is the href in a webpage always in standard format?
3. Should the parse module verify the IPv6 URL format? If so, does the module really do it?
4. Personally, given a wrongly formatted URL, is it the responsibility of the module to correct it?
msg266128	Author: Xiang Zhang (xiang.zhang) * (Python committer)	Date: 2016-05-23 09:03
As a general-purpose library for URL parsing, I think conforming to the existing standard is a good choice. 'http://google.com]' is a malformed URI according to the standard, so I think raising an exception is quite suitable. Of course there are always malformed links in webpages, but how to correct them is quite objective. I think catching the exception in your application and correcting the links with your own logic is what you should do.
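The suggestion to catch the exception in the application could look roughly like the sketch below, written for Python 3 (urllib.parse). Both function names are hypothetical, and the bracket-stripping retry is only one possible application policy, chosen here for illustration; it is not something the library or this thread prescribes.

```python
from urllib.parse import urlparse


def parse_or_none(url):
    """Return the parse result, or None if urlparse rejects the URL."""
    try:
        return urlparse(url)
    except ValueError:
        return None


def parse_lenient(url):
    """Hypothetical policy: on failure, drop all square brackets and retry once.

    This is an application-level guess, not standard behaviour; a second
    ValueError is deliberately allowed to propagate to the caller.
    """
    try:
        return urlparse(url)
    except ValueError:
        cleaned = url.replace('[', '').replace(']', '')
        return urlparse(cleaned)


broken = 'http://google.com]'
print(parse_or_none(broken))         # rejected URLs come back as None
print(parse_lenient(broken).netloc)  # netloc after the stray bracket is dropped
```

Keeping the correction in application code means each project can decide for itself whether a malformed link should be skipped, logged, or repaired.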
msg266129	Author: Xiang Zhang (xiang.zhang) * (Python committer)	Date: 2016-05-23 09:04
Ohh, not objective, but subjective. Sorry.
msg266130	Author: Antti Haapala (ztane) *	Date: 2016-05-23 09:20
This behaviour exists exactly because the return value also contains the `.hostname`, which for IPv6 addresses is without brackets:

    >>> urlparse('http://[::1]:80/').hostname
    '::1'

There is no way to get a proper parsing result from such a broken URI.
msg266131	Author: Berker Peksag (berker.peksag) * (Python committer)	Date: 2016-05-23 09:30
> 4. Personally, given a wrongly formatted URL, is it the responsibility of the module to correct it?

It's not the responsibility of the library to correct (or make a guess on) user input.

History
Date	User	Action	Args
2022-04-11 14:58:31	admin	set	github: 71276
2016-05-23 09:30:40	berker.peksag	set	status: open -> closed nosy: + berker.peksag messages: + msg266131 resolution: not a bug stage: resolved
2016-05-23 09:20:01	ztane	set	nosy: + ztane messages: + msg266130
2016-05-23 09:04:27	xiang.zhang	set	messages: + msg266129
2016-05-23 09:03:35	xiang.zhang	set	messages: + msg266128
2016-05-23 08:51:45	Feng A	set	messages: + msg266127
2016-05-23 08:17:52	xiang.zhang	set	nosy: + xiang.zhang messages: + msg266126
2016-05-23 08:06:30	Feng A	create
