Created on 2012-06-05 22:28 by Buck.Golemon, last changed 2022-04-11 14:57 by admin. This issue is now closed.
| Files | | |
|---|---|---|
| File name | Uploaded | Description |
| parse.py | ankitoshniwal, 2012-06-07 23:18 | |
| Messages (7) | |||
|---|---|---|---|
| msg162378 | Author: Buck Golemon (Buck.Golemon) | Date: 2012-06-05 22:28 | |
1) As long as x is valid, I expect that urlunsplit(urlsplit(x)) == x
2) yelp:///foo is a well-formed (albeit odd) url. It is similar to file:///tmp: it specifies the /foo resource, on the "current" host, using the yelp protocol (defined on mobile devices).
>>> from urlparse import urlsplit, urlunsplit
>>> urlunsplit(urlsplit('yelp:///foo'))
'yelp:/foo'
urlparse / urlunparse has the same bug:
>>> from urlparse import urlparse, urlunparse
>>> urlunparse(urlparse('yelp:///foo'))
'yelp:/foo'
The file: protocol seems to be special-cased, in an inappropriate manner:
>>> urlunsplit(urlsplit('file:///tmp'))
'file:///tmp'
|
|||
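The round-trip expectation in point 1 can be checked mechanically. The following is a minimal sketch, assuming Python 3, where these functions live in `urllib.parse` rather than the `urlparse` module used above. The helper name `round_trips` is hypothetical, and the result for `yelp:///foo` may differ between Python versions, so no output is asserted for it here.

```python
from urllib.parse import urlsplit, urlunsplit


def round_trips(url):
    """Return True if splitting and re-joining the URL reproduces it exactly."""
    return urlunsplit(urlsplit(url)) == url


# Print the result for a scheme in uses_netloc ("file", "http") and one that is not ("yelp").
for candidate in ("file:///tmp", "http://example.com/a?b#c", "yelp:///foo"):
    print(candidate, round_trips(candidate))
```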
| msg162507 | Author: Ankit Toshniwal (ankitoshniwal) | Date: 2012-06-07 23:18 | |
Hello,
I did some initial investigation. As per the code in parse.py, the function urlunsplit takes the 5-tuple returned by urlsplit. In the case of yelp:///foo we get:
SplitResult(scheme='yelp', netloc='', path='/foo', query='', fragment='')
Now this tuple is passed to urlunsplit. There is an if statement in the urlunsplit function:
if netloc or (scheme and scheme in uses_netloc and url[:2] != '//'):
This checks whether the url has a netloc (in our case it does not). If not, it checks whether the scheme is in the uses_netloc list (a predefined list in parse.py of common schemes such as http, ftp, file, and rsync). Since yelp is not in that list, the condition fails and the url is returned without the '//' being restored. What we need is an else branch for the case where the condition fails, something like this:
    if netloc or (scheme and scheme in uses_netloc and url[:2] != '//'):
        if url and url[:1] != '/':
            url = '/' + url
        url = '//' + (netloc or '') + url
    else:
        if url and url[:1] != '/':
            url = '/' + url
        url = '//' + (netloc or '') + url
In that case we get the right url back.
After changing the code, here is what I get on my local dev machine:
>>> urlunparse(urlparse('yelp:///foo'))
'yelp:///foo'
>>> urlunsplit(urlsplit('file:///tmp'))
'file:///tmp'
>>> urlunsplit(urlsplit('yelp:///foo'))
'yelp:///foo'
Thanks,
Ankit.
P.S.: I am new to Python and trying to learn it while working on small projects. Let me know whether you think this is the right approach.
|
|||
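To make the analysis above concrete, here is a simplified stand-alone model of the condition, not the actual parse.py source. `USES_NETLOC` is an abbreviated stand-in for the real list, and both function names are hypothetical. The second function models the proposed change, where both branches add the '//'. Under that assumption, every URL gets the '//' prefix, including ones that never had it.

```python
# Abbreviated stand-in for the uses_netloc list; the real list is longer.
USES_NETLOC = {"http", "https", "ftp", "file"}


def unsplit_as_described(scheme, netloc, path):
    """Model of the existing check: '//' is restored only for a non-empty
    netloc or a scheme found in USES_NETLOC."""
    url = path
    if netloc or (scheme and scheme in USES_NETLOC and url[:2] != "//"):
        if url and url[:1] != "/":
            url = "/" + url
        url = "//" + (netloc or "") + url
    if scheme:
        url = scheme + ":" + url
    return url


def unsplit_always_slashes(scheme, netloc, path):
    """Model of the proposed change: both branches are identical, so the
    '//' is added unconditionally."""
    url = path
    if url and url[:1] != "/":
        url = "/" + url
    url = "//" + (netloc or "") + url
    if scheme:
        url = scheme + ":" + url
    return url


print(unsplit_as_described("yelp", "", "/foo"))
print(unsplit_as_described("file", "", "/tmp"))
print(unsplit_always_slashes("yelp", "", "/foo"))
print(unsplit_always_slashes("mailto", "", "user@example.com"))
```

In this model the unconditional version restores `yelp:///foo`, but it also rewrites a path-only URL such as `mailto:user@example.com`, which suggests the condition cannot simply be dropped without some other way to tell an empty netloc from a missing one.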
| msg162509 | Author: Buck Golemon (Buck.Golemon) | Date: 2012-06-07 23:55 | |
Well, I think the real issue is that you can't enumerate the protocols that "use netloc". All protocols are allowed to have a netloc. The smb: protocol certainly does, but it's not in the list.
The core issue is that smb:/foo and smb:///foo are different urls, and should be represented differently when split. The /// form has a netloc, it's just the empty-string. The single-slash form has no netloc, so I propose that urlsplit('smb:/foo') return SplitResult(scheme='smb', netloc=None, path='/foo', query='', fragment='')
|
|||
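The distinction proposed above (a missing netloc is None, an empty one is '') can be sketched without any scheme list. The example below is an illustration, not the urlparse implementation: it uses the generic regular expression from RFC 3986 Appendix B and recomposes the parts as in RFC 3986 section 5.3. The names `Parts`, `split_uri`, and `unsplit_uri` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Generic URI regex from RFC 3986 Appendix B, with non-capturing groups
# around the delimiters so only the five components are captured.
_URI_RE = re.compile(
    r"(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?",
    re.DOTALL,
)


class Parts(NamedTuple):
    scheme: Optional[str]
    authority: Optional[str]  # None: no '//' present; '': '//' with empty authority
    path: str
    query: Optional[str]
    fragment: Optional[str]


def split_uri(uri):
    """Split a URI, keeping 'absent' (None) distinct from 'empty' ('')."""
    return Parts(*_URI_RE.fullmatch(uri).groups())


def unsplit_uri(parts):
    """Recompose the components; a delimiter is emitted only if its part is present."""
    out = ""
    if parts.scheme is not None:
        out += parts.scheme + ":"
    if parts.authority is not None:
        out += "//" + parts.authority
    out += parts.path
    if parts.query is not None:
        out += "?" + parts.query
    if parts.fragment is not None:
        out += "#" + parts.fragment
    return out


for uri in ("smb:/foo", "smb:///foo", "yelp:///foo", "x://"):
    print(uri, split_uri(uri), unsplit_uri(split_uri(uri)) == uri)
```

Because absence is recorded explicitly, the round trip holds for any scheme in this sketch, at the cost of a result type that differs from the existing SplitResult, whose netloc is always a string.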
| msg164320 | Author: Senthil Kumaran (orsenthil) (Python committer) | Date: 2012-06-29 09:36 | |
Let me address this one thing at a time. The point about smb really confused me, and I wondered why, with smb being more common, the issue had not been raised before. It looks like an smb url will always start with smb:// (the // is required to identify a netloc, empty or otherwise), so the smb cases are fine: http://tools.ietf.org/html/draft-crhertel-smb-url-00 That said, the dependency on uses_netloc has come up many times, and I am still looking for a way to remove it without affecting the previous parsing behavior and, of course, the tests. |
|||
| msg164322 | Author: Senthil Kumaran (orsenthil) (Python committer) | Date: 2012-06-29 11:07 | |
Look at the following two bugs, which dwelt on similar issues: Issue8339 and Issue7904. In one message in particular, msg102737, I seem to have come to the conclusion that " I don't see that 'x://' and 'x:///y' qualifies as valid URLS as per RFC 3986", and it applies to this bug too, where the requested url is yelp:///x.
Would yelp://localhost/x be a way to access the resource in your case? That would be consistent with the specification. Or, in your code, you can add 'yelp' to the uses_netloc list and then expect the desired behavior:
from urlparse import uses_netloc
uses_netloc.append('yelp')
I understand that relying on uses_netloc is a limitation, but given the requirements of both absolute and relative parsing, that list has served a useful purpose. I would like to close this one for the above-mentioned points and open a feature request (or convert this to a feature request) that asks to remove the dependency on uses_netloc in urlparse. Does this resolution sound okay?
|
|||
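The workaround suggested above can be sketched for Python 3, where the list is exposed as `urllib.parse.uses_netloc`. The helper name `register_netloc_scheme` is hypothetical. Note that this mutates module-level state, so it affects every caller in the process, and the exact joining behavior may vary between Python versions.

```python
from urllib.parse import urlsplit, urlunsplit, uses_netloc


def register_netloc_scheme(scheme):
    """Add a scheme to urllib.parse.uses_netloc once (process-wide side effect)."""
    if scheme not in uses_netloc:
        uses_netloc.append(scheme)


register_netloc_scheme("yelp")

# With 'yelp' registered, the empty netloc is written back as '//'.
print(urlunsplit(urlsplit("yelp:///foo")))
```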
| msg164712 | Author: Buck Evan (bukzor) | Date: 2012-07-06 02:44 | |
Let's examine x://
absolute-URI = scheme ":" hier-part [ "?" query ]
hier-part = "//" authority path-abempty
So this is okay if authority and path-abempty can both be empty strings.
authority = [ userinfo "@" ] host [ ":" port ]
host = IP-literal / IPv4address / reg-name
reg-name = *( unreserved / pct-encoded / sub-delims )
path-abempty = *( "/" segment )
Yep. And the same applies for x:///y, except that path-abempty matches /y instead of nothing.
This means these are in fact valid urls per RFC3986, counter to your claim.
|
|||
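The grammar walk-through above can be approximated with a regular expression. This is a deliberately reduced sketch, not a full RFC 3986 validator: it covers only the `"//" authority path-abempty` alternative of hier-part, treats the authority as a bare reg-name (no userinfo, port, or IP literals), and ignores query and fragment. The function name `is_double_slash_uri` is hypothetical.

```python
import re

# Character sets taken from the RFC 3986 ABNF.
_UNRESERVED = r"A-Za-z0-9\-._~"
_SUB_DELIMS = r"!$&'()*+,;="
_PCT_ENCODED = r"%[0-9A-Fa-f]{2}"

# reg-name = *( unreserved / pct-encoded / sub-delims ), so it may be empty.
_REG_NAME = rf"(?:[{_UNRESERVED}{_SUB_DELIMS}]|{_PCT_ENCODED})*"
# pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
_PCHAR = rf"(?:[{_UNRESERVED}{_SUB_DELIMS}:@]|{_PCT_ENCODED})"
# path-abempty = *( "/" segment ), so it may be empty as well.
_PATH_ABEMPTY = rf"(?:/{_PCHAR}*)*"
_SCHEME = r"[A-Za-z][A-Za-z0-9+\-.]*"

_DOUBLE_SLASH_URI = re.compile(rf"{_SCHEME}://{_REG_NAME}{_PATH_ABEMPTY}")


def is_double_slash_uri(uri):
    """True if uri matches scheme ":" "//" reg-name path-abempty in this reduced grammar."""
    return _DOUBLE_SLASH_URI.fullmatch(uri) is not None


for uri in ("x://", "x:///y", "yelp:///foo", "yelp://localhost/x", "x:/y"):
    print(uri, is_double_slash_uri(uri))
```

The last case, `x:/y`, is rejected only because this sketch omits the other hier-part alternatives, not because the RFC forbids it.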
| msg235585 | Author: Martin Panter (martin.panter) (Python committer) | Date: 2015-02-09 04:13 | |
Fixing Issue 22852 or Issue 5843 should help fix this. |
|||
| History | | | |
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:57:31 | admin | set | github: 59214 |
| 2015-05-31 04:28:09 | martin.panter | set | status: open -> closed; superseder: urllib.parse wrongly strips empty #fragment, ?query, //netloc; resolution: duplicate; stage: needs patch -> resolved |
| 2015-02-09 04:13:13 | martin.panter | set | messages: + msg235585 |
| 2013-11-24 03:28:14 | martin.panter | set | nosy: + martin.panter |
| 2012-07-06 02:44:36 | bukzor | set | nosy: + bukzor; messages: + msg164712 |
| 2012-06-29 11:07:32 | orsenthil | set | messages: + msg164322 |
| 2012-06-29 09:36:43 | orsenthil | set | messages: + msg164320 |
| 2012-06-15 08:07:34 | ezio.melotti | set | nosy: + ezio.melotti; stage: needs patch; type: behavior; versions: + Python 3.3, - Python 2.6 |
| 2012-06-08 03:27:20 | orsenthil | set | assignee: orsenthil; nosy: + orsenthil |
| 2012-06-07 23:55:12 | Buck.Golemon | set | messages: + msg162509 |
| 2012-06-07 23:18:35 | ankitoshniwal | set | files: + parse.py; nosy: + ankitoshniwal; messages: + msg162507 |
| 2012-06-05 22:28:27 | Buck.Golemon | create | |