This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: urlparse should parse query and fragment for arbitrary schemes
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.2, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: orsenthil Nosy List: Arfrever, Nick.Welch, benjamin.peterson, doko, eric.araujo, ezio.melotti, georg.brandl, martin.panter, orsenthil, pitrou, python-dev
Priority: critical Keywords:

Created on 2010-07-24 22:58 by Nick.Welch, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (16)
msg111511 - Author: Nick Welch (Nick.Welch) Date: 2010-07-24 22:58
While the netloc/path parts of URLs are scheme-specific, and urlparse can be forgiven for refusing to parse them for unknown schemes, the query and fragment parts are standardized, and should be parsed for unrecognized schemes.
According to Wikipedia:
------------------
Internet standard STD 66 (also RFC 3986) defines the generic syntax to be used in all URI schemes. Every URI is defined as consisting of four parts, as follows:
<scheme name> : <hierarchical part> [ ? <query> ] [ # <fragment> ]
------------------
http://en.wikipedia.org/wiki/URI_scheme#Generic_syntax
Here is a demonstration of what urlparse currently does:
>>> urlparse.urlsplit('myscheme://netloc/path?a=b#frag')
SplitResult(scheme='myscheme', netloc='', path='//netloc/path?a=b#frag', query='', fragment='')
>>> urlparse.urlsplit('http://netloc/path?a=b#frag')
SplitResult(scheme='http', netloc='netloc', path='/path', query='a=b', fragment='frag')
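For readers on Python 3, where the module lives at `urllib.parse`, the same comparison can be reproduced with the short sketch below. It assumes Python 3.3 or later, where the change from this issue is present; the URLs are the made-up ones from the report.

```python
from urllib.parse import urlsplit

# 'myscheme' is the unregistered scheme used in the report above.
custom = urlsplit('myscheme://netloc/path?a=b#frag')
known = urlsplit('http://netloc/path?a=b#frag')

# With the fix, query and fragment are split out regardless of scheme.
print(custom.query, custom.fragment)  # a=b frag

# Apart from the scheme itself, both results now carry the same components.
print(custom[1:] == known[1:])        # True
```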
msg161087 - Author: Roundup Robot (python-dev) (Python triager) Date: 2012-05-19 00:13
New changeset 79e6ff3d9afd by Senthil Kumaran in branch '2.7':
Issue9374 - Generic parsing of query and fragment portion of urls for any scheme
http://hg.python.org/cpython/rev/79e6ff3d9afd
New changeset a9d43e21f7d8 by Senthil Kumaran in branch '3.2':
Issue9374 - Generic parsing of query and fragment portion of urls for any scheme
http://hg.python.org/cpython/rev/a9d43e21f7d8
New changeset 152c78b94e41 by Senthil Kumaran in branch 'default':
Issue9374 - Generic parsing of query and fragment portion of urls for any scheme
http://hg.python.org/cpython/rev/152c78b94e41 
msg161088 - Author: Senthil Kumaran (orsenthil) (Python committer) Date: 2012-05-19 00:16
Thanks for raising this issue, Nick. Yes, I checked both RFC 3986 and RFC 2396 and confirmed that we can safely adopt generic parsing of the query and fragment portions of URLs for any scheme. Since it was supported in earlier versions too, I felt it was a good move to backport as well.
Fixed in all versions.
Thanks!
msg165546 - Author: Éric Araujo (eric.araujo) (Python committer) Date: 2012-07-15 20:06
Removing the module attributes causes third-party code to break. See one example here: http://lists.idyll.org/pipermail/testing-in-python/2012-July/005082.html 
msg165547 - Author: Éric Araujo (eric.araujo) (Python committer) Date: 2012-07-15 20:07
Better link: https://github.com/pypa/pip/issues/552 
msg168899 - Author: Matthias Klose (doko) (Python committer) Date: 2012-08-22 17:29
This breaks the following upstream builds:
createrepo, linkchecker, gwibber, pegasus-wm
There is no need to remove is_hierarchical on the branches; it's not used by urlparse at all.
Is it safe to just keep the uses_query and uses_fragment lists on the branches as well?
Raising to release blocker; I consider this a regression for the 2.7 and 3.2 release series.
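The breakage described above comes from third-party code that reads or mutates these module-level lists. A defensive pattern for such code might look like the sketch below. It is written against Python 3's `urllib.parse` (in 2.7 the module is `urlparse`); the helper name `register_scheme` and the scheme `myvcs` are hypothetical and do not come from any of the affected projects.

```python
import urllib.parse

def register_scheme(scheme):
    """Add scheme to whichever legacy registry lists the module still exposes."""
    for name in ("uses_netloc", "uses_query", "uses_fragment"):
        # Tolerate a missing attribute instead of raising AttributeError.
        registry = getattr(urllib.parse, name, None)
        if registry is not None and scheme not in registry:
            registry.append(scheme)

register_scheme("myvcs")  # hypothetical scheme name
parts = urllib.parse.urlsplit("myvcs://host/repo?rev=1#egg=pkg")
print(parts.netloc, parts.path, parts.query, parts.fragment)
```

Looking the lists up with `getattr` keeps the caller working whether or not a given interpreter version still ships the attribute.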
msg169039 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2012-08-24 16:12
Senthil, either the module globals should be re-added for compatibility, or the commits should be reverted, IMO.
msg169040 - Author: Roundup Robot (python-dev) (Python triager) Date: 2012-08-24 16:17
New changeset a0b3cb52816e by Georg Brandl in branch '3.2':
Closes #9374: add back now-unused module attributes; removing them is a backward compatibility issue, since they have a public-seeming name.
http://hg.python.org/cpython/rev/a0b3cb52816e
New changeset c93fbc2caba5 by Georg Brandl in branch 'default':
Closes #9374: merge with 3.2
http://hg.python.org/cpython/rev/c93fbc2caba5
New changeset a43481210964 by Georg Brandl in branch '2.7':
Closes #9374: add back now-unused module attributes; removing them is a backward compatibility issue, since they have a public-seeming name.
http://hg.python.org/cpython/rev/a43481210964 
msg169052 - Author: Senthil Kumaran (orsenthil) (Python committer) Date: 2012-08-24 17:23
Oops. I had not seen Éric's and Matthias's comments on this issue, which
pointed out the problem. Sorry for not acting on this.
Thanks, Georg, for adding those module attributes back.
msg171448 - Author: Georg Brandl (georg.brandl) (Python committer) Date: 2012-09-28 12:28
After encountering an instance of people relying on fragment not being parsed for "irc://" URLs, with resulting breakage, I don't think we should change this in point releases. IOW, it's fine for 3.3.0, but not for 2.7.x or 3.2.x.
It may be fixing a bug, but the bug is not obvious and the fix is not backward compatible. I therefore suggest to roll back the commits to 3.2 and 2.7.
msg171452 - Author: Ezio Melotti (ezio.melotti) (Python committer) Date: 2012-09-28 12:47
If there is a list of known protocols that don't use the fragment, can't we include it in urlparse as we already do in Lib/urlparse.py:34?
If #channel in irc://example.com/#channel should not be parsed as a fragment, then this can be considered a regression. This doesn't necessarily mean that the whole change is a regression though.
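To make the suggestion concrete, a per-scheme exception list could be layered on top of the generic parser roughly as follows. This is only an illustration of the idea, not something the thread adopted: `NO_FRAGMENT_SCHEMES` and `split_with_exceptions` are hypothetical names, and the sketch assumes Python 3.3 or later with `urllib.parse`.

```python
from urllib.parse import urlsplit

# Hypothetical exception list; 'irc' is the only scheme named in this thread.
NO_FRAGMENT_SCHEMES = frozenset({"irc"})

def split_with_exceptions(url):
    """Split url, leaving '#' in the path for schemes on the exception list."""
    scheme = url.partition(":")[0].lower()
    return urlsplit(url, allow_fragments=scheme not in NO_FRAGMENT_SCHEMES)

irc = split_with_exceptions("irc://example.com/#channel")
web = split_with_exceptions("http://example.com/#top")
print(irc.path, repr(irc.fragment))  # /#channel ''
print(web.path, repr(web.fragment))  # / 'top'
```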
msg171465 - Author: Georg Brandl (georg.brandl) (Python committer) Date: 2012-09-28 13:40
People make up URL schemes all the time, irc:// is not a special case. This change will mean breakage for them, unwarranted.
msg171469 - Author: Éric Araujo (eric.araujo) (Python committer) Date: 2012-09-28 14:05
One would hope that people making up URI schemes would follow the generic syntax (and thus irc would be an exception), but as the risk exists I agree we should not break code in bugfix releases.
msg171557 - Author: Roundup Robot (python-dev) (Python triager) Date: 2012-09-29 07:27
New changeset 950320c70fb4 by Georg Brandl in branch 'default':
Add a versionchanged note for #9374 changes.
http://hg.python.org/cpython/rev/950320c70fb4 
msg179270 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2013-01-07 16:46
> It may be fixing a bug, but the bug is not obvious and the fix is not
> backward compatible. I therefore suggest to roll back the commits to
> 3.2 and 2.7.
Well, the bug is quite obvious to me :-) (just hit it here)
The fix for those who want the old behaviour is obvious: just pass `allow_fragments=False` to urlparse(). OTOH, if you revert the fix, patching things manually is quite cumbersome.
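The workaround mentioned here can be sketched as below, using the `irc://` URL from earlier in the thread. The sketch assumes Python 3.3 or later, where fragments are parsed for every scheme by default.

```python
from urllib.parse import urlparse

url = "irc://example.com/#channel"

default = urlparse(url)                        # fragment is split off
legacy = urlparse(url, allow_fragments=False)  # '#...' stays in the path

print(default.path, default.fragment)      # / channel
print(legacy.path, repr(legacy.fragment))  # /#channel ''
```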
msg216244 - Author: Senthil Kumaran (orsenthil) (Python committer) Date: 2014-04-14 22:48
I reviewed the issue; the correct rollbacks and commits were applied.
This ticket should be closed. Thanks!
History
Date | User | Action | Args
2022-04-11 14:57:04 | admin | set | github: 53620
2014-04-14 22:48:03 | orsenthil | set | status: open -> closed; messages: + msg216244
2013-11-24 01:49:08 | martin.panter | set | nosy: + martin.panter
2013-01-07 16:46:02 | pitrou | set | messages: + msg179270
2012-09-29 07:27:40 | python-dev | set | messages: + msg171557
2012-09-29 06:59:41 | georg.brandl | set | versions: - Python 3.3
2012-09-28 14:05:18 | eric.araujo | set | messages: + msg171469
2012-09-28 13:40:04 | georg.brandl | set | messages: + msg171465
2012-09-28 12:47:22 | ezio.melotti | set | messages: + msg171452
2012-09-28 12:28:58 | georg.brandl | set | priority: release blocker -> critical; status: closed -> open; messages: + msg171448
2012-08-24 17:23:21 | orsenthil | set | messages: + msg169052
2012-08-24 16:17:37 | python-dev | set | status: open -> closed; resolution: remind -> fixed; messages: + msg169040
2012-08-24 16:12:59 | pitrou | set | nosy: + pitrou; messages: + msg169039
2012-08-22 17:29:38 | doko | set | priority: normal -> release blocker; nosy: + benjamin.peterson, georg.brandl, doko; messages: + msg168899; resolution: fixed -> remind
2012-07-15 20:07:21 | eric.araujo | set | messages: + msg165547
2012-07-15 20:06:39 | eric.araujo | set | status: closed -> open; messages: + msg165546
2012-06-14 16:10:11 | Arfrever | set | nosy: + Arfrever
2012-05-19 00:16:23 | orsenthil | set | status: open -> closed; messages: + msg161088; assignee: orsenthil; resolution: fixed; stage: needs patch -> resolved
2012-05-19 00:13:01 | python-dev | set | nosy: + python-dev; messages: + msg161087
2012-05-08 03:29:53 | eric.araujo | set | nosy: + orsenthil, eric.araujo
2012-05-06 22:50:02 | ezio.melotti | set | nosy: + ezio.melotti; stage: needs patch; versions: + Python 2.7, Python 3.2, Python 3.3, - Python 2.6
2010-07-24 22:58:39 | Nick.Welch | create |