This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: Normalization error in urlunparse
Type: behavior
Stage: resolved
Components: Library (Lib)
Versions: Python 3.4, Python 3.5, Python 2.7
process
Status: closed
Resolution: duplicate
Dependencies:
Superseder: urllib.parse wrongly strips empty #fragment, ?query, //netloc (View: 22852)
Assigned To: orsenthil
Nosy List: Aaron1011, BreamoreBoy, dstanek, eric.araujo, martin.panter, orsenthil
Priority: normal
Keywords:

Created on 2009-04-25 19:12 by eric.araujo, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (6)
msg86538 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2009-04-25 19:12
Docstring for urlunparse says:
 """Put a parsed URI back together again. This may result in a
 slightly different, but equivalent URI, if the URI that was parsed
 originally had redundant delimiters, e.g. a ? with an empty query
 (the draft states that these are equivalent)."""
"Draft" here refers to RFC 1808, superseded by 3986. However, RFC 3986
(section 6.2.3) states:
"Normalization should not remove delimiters when their associated
component is empty unless licensed to do so by the scheme
specification. For example, the URI "http://example.com/?" cannot be
assumed to be equivalent to any of the examples above. Likewise, the
presence or absence of delimiters within a userinfo subcomponent is
usually significant to its interpretation. The fragment component is
not subject to any scheme-based normalization; thus, two URIs that
differ only by the suffix "#" are considered different regardless of
the scheme."
I guess we need some tests here to check compliance.
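A compliance check along these lines could start as a small round-trip report. This is only a sketch: the helper names roundtrip and report and the CANDIDATES list are made up for illustration, and it uses the Python 3 module name urllib.parse. It deliberately asserts nothing about the "?" and "#" cases, since the result depends on the Python version in use.

```python
from urllib.parse import urlparse, urlunparse

# Illustrative URIs in the spirit of RFC 3986 section 6.2.3, where a
# trailing "?" or "#" is considered significant.
CANDIDATES = [
    "http://example.com/",
    "http://example.com/?",
    "http://example.com/#",
]


def roundtrip(url):
    """Parse a URL and put it back together again."""
    return urlunparse(urlparse(url))


def report(urls):
    """Map each URL to True if parse/unparse reproduces it exactly."""
    return {url: roundtrip(url) == url for url in urls}


if __name__ == "__main__":
    for url, preserved in report(CANDIDATES).items():
        print(f"{url!r}: {'preserved' if preserved else 'changed'}")
```

Any entry reported as "changed" is a URI whose empty delimiter was dropped during the round trip.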
msg86541 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2009-04-25 19:45
This is indeed a bug. urlunparse should special-case "#" so as not to
discard it.
msg110314 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-07-14 19:09
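Currently the RFC's requirement is not met; the empty query and fragment delimiters are dropped:
>>> import urlparse  # Python 2 module name; urllib.parse on Python 3
>>> obj = urlparse.urlparse('http://a/b/c?')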
>>> urlparse.urlunparse(obj)
'http://a/b/c'
>>> obj = urlparse.urlparse('http://a/b/c#')
>>> urlparse.urlunparse(obj)
'http://a/b/c'
If we move away from the current behavior, there will surely be some test failures for urljoin. We will have to consider those cases too while fixing this.
msg228009 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2014-09-30 21:45
Slipped under the radar, guys?
msg228853 - (view) Author: Aaron Hill (Aaron1011) * Date: 2014-10-09 10:21
In order to fix this, I think ParseResult needs to have two additional fields, indicating whether an empty fragment or query string is present.
Both ParseResult.fragment and ParseResult.query omit the leading '#' or '?' from their value. This makes it impossible to determine if the fragment/query string is entirely absent, or has no value.
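The idea of carrying two extra flags can be sketched outside the standard library. This is not the patch from any issue, only an illustration: ParsedWithFlags, parse_with_flags, and rebuild are hypothetical names, and the wrapper leaves ParseResult itself unchanged. It assumes the first "#" starts the fragment and that a "?" counts as the query delimiter only when it appears before that "#".

```python
from typing import NamedTuple
from urllib.parse import ParseResult, urlparse, urlunparse


class ParsedWithFlags(NamedTuple):
    """A ParseResult plus two flags recording which delimiters were present."""
    result: ParseResult
    has_query: bool      # True if the URL contained a "?" before any "#"
    has_fragment: bool   # True if the URL contained a "#"


def parse_with_flags(url):
    # The first "#" starts the fragment; a "?" only delimits the query
    # if it appears before that "#".
    before_fragment, hash_sep, _ = url.partition("#")
    return ParsedWithFlags(
        result=urlparse(url),
        has_query="?" in before_fragment,
        has_fragment=bool(hash_sep),
    )


def rebuild(parsed):
    scheme, netloc, path, params, query, fragment = parsed.result
    # Rebuild without query and fragment, then re-attach them by hand so
    # that an empty-but-present component keeps its delimiter.
    url = urlunparse((scheme, netloc, path, params, "", ""))
    if parsed.has_query:
        url += "?" + query
    if parsed.has_fragment:
        url += "#" + fragment
    return url


print(rebuild(parse_with_flags("http://a/b/c?")))
print(rebuild(parse_with_flags("http://a/b/c#")))
```

With the flags available, an absent query and an empty query are no longer indistinguishable when the URL is put back together.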
msg235579 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-02-09 00:17
Looks like this duplicates Issue 22852, which has a patch, although its author had second thoughts on the implementation.
History
Date                 User           Action  Args
2022-04-11 14:56:48  admin          set     github: 50093
2015-05-31 04:25:46  martin.panter  set     status: open -> closed
                                            superseder: urllib.parse wrongly strips empty #fragment, ?query, //netloc
                                            resolution: duplicate
                                            stage: resolved
2015-02-09 00:17:34  martin.panter  set     nosy: + martin.panter
                                            messages: + msg235579
2014-10-09 10:21:51  Aaron1011      set     nosy: + Aaron1011
                                            messages: + msg228853
2014-09-30 21:45:28  BreamoreBoy    set     nosy: + BreamoreBoy
                                            messages: + msg228009
                                            versions: + Python 3.4, Python 3.5, - Python 3.1, Python 3.2
2010-11-02 19:36:38  eric.araujo    set     nosy: orsenthil, dstanek, eric.araujo
                                            title: Possible normalization error in urlparse.urlunparse -> Normalization error in urlunparse
                                            components: + Library (Lib)
                                            versions: + Python 3.1, Python 2.7, Python 3.2
2010-08-18 00:15:17  dstanek        set     nosy: + dstanek
2010-07-14 19:09:30  orsenthil      set     messages: + msg110314
2010-07-11 14:28:57  eric.araujo    set     assignee: orsenthil
                                            type: behavior
                                            nosy: + orsenthil
2009-04-25 19:45:10  eric.araujo    set     messages: + msg86541
2009-04-25 19:12:38  eric.araujo    create
