This issue tracker has been migrated to GitHub ,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2013年06月05日 09:45 by anh.le, last changed 2022年04月11日 14:57 by admin. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| password_delimiters.patch | madison.may, 2013年06月19日 21:25 | review | ||
| Messages (9) | |||
|---|---|---|---|
| msg190646 - (view) | Author: anh le (anh.le) | Date: 2013年06月05日 09:45 | |
>>> u = 'http://auser:secr#et@192.168.0.1:8080/a/b/c.html' >>> urlparse.urlsplit(u) SplitResult(scheme='http', netloc='auser:secr', path='', query='', fragment='et@192.168.0.1:8080/a/b/c.html') |
|||
| msg190677 - (view) | Author: Madison May (madison.may) * | Date: 2013年06月05日 16:33 | |
urllib.parse.urlsplit() in Python3.3 behaves the same way. Since urlsplit takes an optional param "allow_fragments", I don't think it should be a high priority issue.
The relevant code from Python3.3 is below, however:
if allow_fragments and '#' in url:
url, fragment = url.split('#', 1)
if '?' in url:
url, query = url.split('?', 1)
Note that passwords containing '?' would produce a similar result, which is perhaps mildly more concerning, as there is no flag to ignore the query portion of the url.
That being said, I'm against making any changes to urlsplit at this point, since that would also require modifying urlunsplit and may in fact make it much more difficult (or impossible) to rejoin a url. The strength of the very simple implementation we have currently is that it's always reversible.
|
|||
| msg190690 - (view) | Author: Dmi Baranov (dmi.baranov) * | Date: 2013年06月05日 20:19 | |
anh, nice catch.
Madison, I'm not see any terrible consequences, only urlsplit is affected now:
>>> urlsplit('http://user:password@domain.com:80/path?query#fragment')
SplitResult(scheme='http', netloc='user:password@domain.com:80', path='/path', query='query', fragment='fragment')
>>> urlunsplit(_)
'http://user:password@domain.com:80/path?query#fragment'
>>> urlunsplit(('http', 'user:pass#ord@domain.com:80', '/path', 'query', 'fragment'))
'http://user:pass#ord@domain.com:80/path?query#fragment'
|
|||
| msg190692 - (view) | Author: Madison May (madison.may) * | Date: 2013年06月05日 20:42 | |
My apologies -- that was an oversight on my part. Now that I look at the issue again, it's plain that it most likely won't be an issue. Problems only arise when you allow '#' to occur before '?' and then treat portion of the url between the '#' and '?' as a fragment. In this scenario, when the url is rejoined, the order of the fragment and query is switched by urlunsplit(). However, since the portion of the url with a hashtag would be treated as a the netloc, this isn't any cause for concern. We only need to ensure that the order of the components is maintained. Thanks for the heads up, Dmi. |
|||
| msg191497 - (view) | Author: Madison May (madison.may) * | Date: 2013年06月19日 21:25 | |
Here's a potential patch for the issue, should we decide it's worth fixing. All current tests pass with the update version of _splitnetloc(), and I've added a test to test_urlparse to check that urls with '#' or '?' in the password field are treated properly. |
|||
| msg235586 - (view) | Author: Martin Panter (martin.panter) * (Python committer) | Date: 2015年02月09日 04:24 | |
Possibly related: Issue 23328. I would have thought any special characters in the password etc field should be encoded. Where do such unencoded URLs come from? It seems that this would change the behaviour of parsing a URL such as http://host/path/with/unencoded/@/character Is one of these scenarios more valid than the other? |
|||
| msg334041 - (view) | Author: Martin Panter (martin.panter) * (Python committer) | Date: 2019年01月19日 06:52 | |
Today I read RFC 3986, and I think the URLs in the bug reports are valid, and are already parsed correctly. The path is allowed to have a literal "at" symbol: path-abempty = *( "/" segment ) segment = *pchar pchar = unreserved / pct-encoded / sub-delims / ":" / "@" The query and fragment are allowed to have "at" and question marks: query = *( pchar / "/" / "?" ) fragment = *( pchar / "/" / "?" ) So I think this could be closed because the parsing is working correctly. |
|||
| msg375109 - (view) | Author: David Six (david.six) | Date: 2020年08月10日 13:17 | |
tl;dr: '#', '?' and a few other characters should be URL-encoded/%-encoded when they appear in userinfo which will already parse correctly.
---
Following up on what Martin said, RFC 3986 has the specifications for how these examples should be parsed.
userinfo = *( unreserved / pct-encoded / sub-delims / ":" )
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
pct-encoded = "%" HEXDIG HEXDIG
sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
/ "*" / "+" / "," / ";" / "="
Notably, gen-delims are _not_ included in the allowed characters, nor are non-ASCII characters.
gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
These and other characters not mentioned should be URL-encoded/%-encoded if they appear in the password.
Taking the first example:
>>> from urllib.parse import urlparse
>>> u = 'http://auser:secr%23et@192.168.0.1:8080/a/b/c.html'
>>> urlparse(u)
ParseResult(scheme='http', netloc='auser:secr%23et@192.168.0.1:8080', path='/a/b/c.html', params='', query='', fragment='')
>>> unquote(urlparse(u).password)
'secr#et'
|
|||
| msg383658 - (view) | Author: Senthil Kumaran (orsenthil) * (Python committer) | Date: 2020年12月23日 19:55 | |
Not a bug. The message #msg375109 explains how to quote and unquote the '#' in the password field, and demonstrates how urllib parses it correctly. I guess, it was set to open as a mistake. Closing it again. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022年04月11日 14:57:46 | admin | set | github: 62340 |
| 2020年12月23日 19:55:46 | orsenthil | set | status: open -> closed messages: + msg383658 stage: resolved |
| 2020年12月19日 02:38:15 | orsenthil | set | versions: + Python 3.10, - Python 2.7, Python 3.5, Python 3.6 |
| 2020年08月10日 13:17:32 | david.six | set | status: pending -> open nosy: + david.six messages: + msg375109 |
| 2019年01月19日 06:52:28 | martin.panter | set | status: open -> pending resolution: not a bug messages: + msg334041 |
| 2017年06月17日 01:57:01 | martin.panter | set | dependencies: + [security] urllib connects to a wrong host |
| 2016年03月16日 21:23:17 | martin.panter | link | issue26572 superseder |
| 2016年03月16日 21:20:03 | martin.panter | set | title: urlparse.urlsplit confused to fragment when password include # -> urlparse, urlsplit confused when password includes fragment (#), query (?) versions: + Python 3.5, Python 3.6 |
| 2015年02月09日 04:24:32 | martin.panter | set | nosy:
+ martin.panter messages: + msg235586 |
| 2013年06月19日 21:25:09 | madison.may | set | files:
+ password_delimiters.patch keywords: + patch messages: + msg191497 |
| 2013年06月05日 20:42:30 | madison.may | set | messages: + msg190692 |
| 2013年06月05日 20:19:08 | dmi.baranov | set | nosy:
+ dmi.baranov messages: + msg190690 |
| 2013年06月05日 16:33:45 | madison.may | set | nosy:
+ madison.may messages: + msg190677 |
| 2013年06月05日 11:57:36 | orsenthil | set | assignee: orsenthil nosy: + orsenthil |
| 2013年06月05日 09:45:03 | anh.le | create | |