Created on 2020-06-10 11:18 by op368, last changed 2022-04-11 14:59 by admin.
| Messages (7) | |||
|---|---|---|---|
| msg371179 - (view) | Author: Open Close (op368) * | Date: 2020-06-10 11:18 | |
The path 'g' in 'http:g' becomes '/g' after a urlsplit/urlunsplit round trip:
>>> urlsplit('http:g')
SplitResult(scheme='http', netloc='', path='g', query='', fragment='')
>>> urlunsplit(urlsplit('http:g'))
'http:///g'
>>> urlsplit('http:///g')
SplitResult(scheme='http', netloc='', path='/g', query='', fragment='')
>>> urljoin('http://a/b/c/d', 'http:g')
'http://a/b/c/g'
>>> urljoin('http://a/b/c/d', 'http:///g')
'http://a/g'
The problematic part of the code is:
def urlunsplit(components):
    [...]
    if netloc or (scheme and scheme in uses_netloc and url[:2] != '//'):
--->    if url and url[:1] != '/': url = '/' + url
        url = '//' + (netloc or '') + url
Note also that urllib has already settled on an interpretation of 'http:g' (in its tests):
def test_RFC3986(self):
    [...]
    #self.checkJoin(RFC3986_BASE, 'http:g','http:g') # strict parser
    self.checkJoin(RFC3986_BASE, 'http:g','http://a/b/c/g') #relaxed parser
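The round trip described above can be checked with a small script. This is only a sketch: `roundtrip_changes_path` is a hypothetical helper name, not part of urllib, and the printed results may differ between Python versions.

```python
from urllib.parse import urlsplit, urlunsplit


def roundtrip_changes_path(url):
    """Return True if urlunsplit(urlsplit(url)) re-parses to a different path.

    Hypothetical helper for illustration; it is not part of urllib.
    """
    before = urlsplit(url)
    after = urlsplit(urlunsplit(before))
    return before.path != after.path


if __name__ == "__main__":
    # 'http:g' is the case reported in this issue; the others are controls.
    for url in ("http:g", "http:///g", "http://a/b/c/d"):
        rebuilt = urlunsplit(urlsplit(url))
        print(f"{url!r} -> {rebuilt!r}, path changed: {roundtrip_changes_path(url)}")
```

If the helper returns True for 'http:g', the relative path has been turned into an absolute one during the round trip.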
|
|||
| msg393544 - (view) | Author: Jonathan Schweder (jaswdr) * | Date: 2021-05-12 19:09 | |
@op368 I don't think this is a bug. RFC 3986 [1] uses this exact example and shows the expected behaviour. [1] https://datatracker.ietf.org/doc/html/rfc3986#section-5.4.2 |
|||
| msg393574 - (view) | Author: Open Close (op368) * | Date: 2021-05-13 11:57 | |
Hello @jaswdr, I can't see what's wrong with my point. What is 'the expected behaviour'? |
|||
| msg393576 - (view) | Author: Jonathan Schweder (jaswdr) * | Date: 2021-05-13 12:37 | |
@op368 As far as I can see, regardless of any misinterpretation, yes, the RFC has this section: "http:g" = "http:g" ; for strict parsers / "http://a/b/c/g" ; for backward compatibility. My understanding is that "http:g" is translated to "http:///g" for backward compatibility. This seems to be an edge case for the parser, since the RFC text also mentions that this should be avoided. |
|||
| msg393577 - (view) | Author: Open Close (op368) * | Date: 2021-05-13 13:08 | |
'http:///g' has absolute path '/g',
and as urljoin shows:
>>> urljoin('http://a/b/c/d', 'http:///g')
'http://a/g' # 'a' is netloc
So you are proposing a third interpretation.
"http:g" = "http:g" ; for strict parsers
/ "http://a/b/c/g" ; for backward compatibility
/ "http://a/g" ; (yours)
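To compare the interpretations side by side, the references can be resolved against the same base with urljoin. This is a minimal sketch: `BASE` and `resolve` are names chosen here for illustration, and the output may vary with the Python version.

```python
from urllib.parse import urljoin

# Base URL used in the examples in this thread.
BASE = "http://a/b/c/d"


def resolve(ref, base=BASE):
    """Resolve a reference against the base (thin wrapper around urljoin)."""
    return urljoin(base, ref)


if __name__ == "__main__":
    # 'g' and '/g' are plain relative and absolute-path references,
    # included to show which of them each 'http:' form behaves like.
    for ref in ("http:g", "http:///g", "g", "/g"):
        print(f"{ref!r} -> {resolve(ref)!r}")
```

In the results quoted above, 'http:g' resolves like the relative reference 'g', while 'http:///g' resolves like the absolute path '/g'.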
|
|||
| msg393578 - (view) | Author: Jonathan Schweder (jaswdr) * | Date: 2021-05-13 13:10 | |
Not exactly. In the RFC example they use a/b/c for the path, but when using http:g there is no nested path, so it should be http:///g, no? |
|||
| msg393583 - (view) | Author: Open Close (op368) * | Date: 2021-05-13 14:11 | |
I tried hard (I even read RFC 1630), but I think not. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:59:32 | admin | set | github: 85110 |
| 2021-05-13 14:11:56 | op368 | set | messages: + msg393583 |
| 2021-05-13 13:10:32 | jaswdr | set | messages: + msg393578 |
| 2021-05-13 13:08:12 | op368 | set | messages: + msg393577 |
| 2021-05-13 12:37:56 | jaswdr | set | messages: + msg393576 |
| 2021-05-13 11:57:17 | op368 | set | messages: + msg393574 |
| 2021-05-13 05:21:41 | shihai1991 | set | nosy: + orsenthil |
| 2021-05-12 19:09:30 | jaswdr | set | nosy: + jaswdr; messages: + msg393544 |
| 2020-08-11 09:12:22 | wyz23x2 | set | versions: + Python 3.7, Python 3.9 |
| 2020-06-10 11:28:16 | op368 | set | components: + Library (Lib) |
| 2020-06-10 11:18:48 | op368 | create | |