This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: urllib.parse.urlunsplit makes relative path to absolute (http:g -> http:///g)
Type: behavior
Stage:
Components: Library (Lib)
Versions: Python 3.9, Python 3.8, Python 3.7
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: jaswdr, op368, orsenthil
Priority: normal
Keywords:

Created on 2020-06-10 11:18 by op368, last changed 2022-04-11 14:59 by admin.

Messages (7)
msg371179 - Author: Open Close (op368) Date: 2020-06-10 11:18
The path 'g' in 'http:g' becomes '/g':
 >>> from urllib.parse import urlsplit, urlunsplit, urljoin
 >>> urlsplit('http:g')
 SplitResult(scheme='http', netloc='', path='g', query='', fragment='')
 >>> urlunsplit(urlsplit('http:g'))
 'http:///g'
 >>> urlsplit('http:///g')
 SplitResult(scheme='http', netloc='', path='/g', query='', fragment='')
 >>> urljoin('http://a/b/c/d', 'http:g')
 'http://a/b/c/g'
 >>> urljoin('http://a/b/c/d', 'http:///g')
 'http://a/g'
The problematic part of the code is:
 def urlunsplit(components):
     [...]
     if netloc or (scheme and scheme in uses_netloc and url[:2] != '//'):
--->     if url and url[:1] != '/': url = '/' + url
         url = '//' + (netloc or '') + url
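To see the effect of that branch in isolation, here is a minimal sketch that copies only the quoted condition into a standalone function. It is not the real urllib source: the name add_authority and the USES_NETLOC set are hypothetical stand-ins, and the set is an assumed subset of urllib.parse.uses_netloc.

```python
# Simplified stand-in for the branch quoted above; not the real urllib code.
# USES_NETLOC is an assumed subset of urllib.parse.uses_netloc.
USES_NETLOC = {"http", "https", "ftp"}


def add_authority(scheme, netloc, url):
    """Mimic the quoted branch: prefix '//' + netloc, forcing a leading '/'."""
    if netloc or (scheme and scheme in USES_NETLOC and url[:2] != "//"):
        if url and url[:1] != "/":
            url = "/" + url  # the relative path 'g' turns into '/g' here
        url = "//" + (netloc or "") + url
    return url


# Empty netloc plus a netloc-using scheme still takes the branch.
print(add_authority("http", "", "g"))
# A scheme outside the set leaves the relative path untouched.
print(add_authority("mailto", "", "g"))
```

With an empty netloc the condition is still true for 'http', so the relative path gets a slash prepended before the empty authority is written out.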
Note also that urllib has already settled on an interpretation of 'http:g' (in its tests):
 def test_RFC3986(self):
     [...]
     #self.checkJoin(RFC3986_BASE, 'http:g','http:g') # strict parser
     self.checkJoin(RFC3986_BASE, 'http:g','http://a/b/c/g') #relaxed parser
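The report boils down to a round-trip property: splitting a URL and unsplitting it again should not change its path. A small check along those lines might look like the sketch below. The helper name roundtrip_keeps_path is made up for illustration, and the result for 'http:g' depends on the Python version in use (the report above was made against 3.7 to 3.9), so no expected output is shown for it.

```python
from urllib.parse import urlsplit, urlunsplit


def roundtrip_keeps_path(url):
    """Return True if split -> unsplit -> split leaves the path unchanged."""
    original = urlsplit(url)
    rebuilt = urlsplit(urlunsplit(original))
    return original.path == rebuilt.path


# 'http:g' is the case from this report; the second URL is an ordinary
# absolute URL included for comparison.
for candidate in ("http:g", "http://a/b/c/g"):
    print(candidate, roundtrip_keeps_path(candidate))
```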
msg393544 - Author: Jonathan Schweder (jaswdr) Date: 2021-05-12 19:09
@op368 I don't think this is a bug: [1] uses this exact example and shows the expected behaviour.
[1] https://datatracker.ietf.org/doc/html/rfc3986#section-5.4.2
msg393574 - Author: Open Close (op368) Date: 2021-05-13 11:57
Hello @jaswdr, I can't see what is wrong with my point.
What is 'the expected behaviour'?
msg393576 - Author: Jonathan Schweder (jaswdr) Date: 2021-05-13 12:37
@op368 As far as I can see, regardless of any misinterpretation, the RFC does have this section:
 "http:g" = "http:g" ; for strict parsers
 / "http://a/b/c/g" ; for backward compatibility
My understanding is that "http:g" is translated to "http:///g" for backward compatibility. This seems to be an edge case for the parser, since the RFC text also mentions that this should be avoided.
msg393577 - Author: Open Close (op368) Date: 2021-05-13 13:08
'http:///g' has the absolute path '/g',
as urljoin shows:
 >>> urljoin('http://a/b/c/d', 'http:///g')
 'http://a/g' # 'a' is netloc
So you are proposing a third interpretation:
 "http:g" = "http:g" ; for strict parsers
 / "http://a/b/c/g" ; for backward compatibility
 / "http://a/g" ; (yours)
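The difference between the relative and the absolute reading can be checked directly with urljoin. The sketch below reuses the base URL from the messages in this thread; the name BASE is introduced here only for illustration, and the values in the comments are the ones from the session quoted earlier in this issue.

```python
from urllib.parse import urljoin

BASE = "http://a/b/c/d"  # base URL used in the messages of this thread

relative_form = urljoin(BASE, "http:g")     # path 'g' resolved against /b/c/
absolute_form = urljoin(BASE, "http:///g")  # path '/g' resolved against the root

print(relative_form)  # 'http://a/b/c/g' in the session quoted in this issue
print(absolute_form)  # 'http://a/g' in the same session
print(relative_form == absolute_form)
```

The two references resolve to different targets, which is why rewriting 'http:g' as 'http:///g' is not a neutral change.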
msg393578 - Author: Jonathan Schweder (jaswdr) Date: 2021-05-13 13:10
Not exactly. In the RFC example they use a/b/c for the path, but with http:g there is no nested path, so it should be http:///g, no?
msg393583 - Author: Open Close (op368) Date: 2021-05-13 14:11
I tried hard (I even read RFC 1630),
but I think not.
History
Date                 User       Action  Args
2022-04-11 14:59:32  admin      set     github: 85110
2021-05-13 14:11:56  op368      set     messages: + msg393583
2021-05-13 13:10:32  jaswdr     set     messages: + msg393578
2021-05-13 13:08:12  op368      set     messages: + msg393577
2021-05-13 12:37:56  jaswdr     set     messages: + msg393576
2021-05-13 11:57:17  op368      set     messages: + msg393574
2021-05-13 05:21:41  shihai1991 set     nosy: + orsenthil
2021-05-12 19:09:30  jaswdr     set     nosy: + jaswdr
                                        messages: + msg393544
2020-08-11 09:12:22  wyz23x2    set     versions: + Python 3.7, Python 3.9
2020-06-10 11:28:16  op368      set     components: + Library (Lib)
2020-06-10 11:18:48  op368      create
