This issue tracker has been migrated to GitHub
and is currently read-only.
For more information,
see the GitHub FAQs in the Python Developer's Guide.
Created on 2022-01-10 21:50 by lincolnauster, last changed 2022-04-11 14:59 by admin.
| Pull Requests | |||
|---|---|---|---|
| URL | Status | Linked | Edit |
| PR 30520 | open | lincolnauster, 2022-01-10 21:55 | |
| Messages (11) | |||
|---|---|---|---|
| msg410259 | Author: Lincoln Auster (lincolnauster) * | Date: 2022-01-10 21:50 | |
It looks like this was discussed in 2013-2015 here: https://bugs.python.org/issue18828

Basically, with all the URL schemes that exist in the world (and the possibility of a custom scheme), the current strategy of enumerating which schemes do what in a hard-coded variable is a bit ... weird.

Among the proposed solutions in 18828, some were:
+ Have a global registry of what schemes do what (criticized for being overkill, and I can't say I disagree).
+ Get rid of the scheme lists altogether, and assume every scheme supports everything (isn't backwards compatible; might break with intended behavior, too).
+ Switch the uses_relative whitelist to a blacklist (maybe fine in practice, maybe not; either way it doesn't really fix the underlying issue).
+ Work around it with global state (modify the uses_* lists; this is what I'm doing in my code, and I can't say I like it much).

An alternative I've implemented in my fork (https://github.com/lincolnauster/cpython/tree/urllib-custom-schemes) is to have an Enum with all the weird scheme-based behaviors that may occur (urllib.parse.SchemeClass in my fork), allow passing a set of those Enums to functions relying on scheme-specific behavior, and add all the elements of that set to what's been determined by the scheme. (See the test case for a concrete example; this explanation is not great.)

Some things I like about this:
+ Backwards compatibility.
+ It makes the functions using it as a general strategy a bit more pure.
+ It makes client code deal with edge cases.

Some things that could be changed:
+ There's no way to remove behaviors you *don't* want.
+ It makes client code deal with edge cases.

As a side thought: if the above could be adopted, the uses_* lists could be enforced as immutable, which, while breaking compatibility, could make client code a bit cleaner. |
|||
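The global-state workaround mentioned in the first message (modifying the uses_* lists) can be sketched against the real `urllib.parse.uses_relative` and `urllib.parse.uses_netloc` lists. This is a minimal sketch, not the author's actual code: `registered_scheme` and the scheme name `myscheme` are hypothetical, chosen only for illustration.

```python
import contextlib
import urllib.parse


@contextlib.contextmanager
def registered_scheme(scheme):
    """Temporarily make urllib.parse treat `scheme` as relative- and netloc-capable.

    Hypothetical helper: it appends `scheme` to the module-level
    urllib.parse.uses_relative and urllib.parse.uses_netloc lists (shared,
    process-wide state) and removes it again on exit.
    """
    targets = [lst for lst in (urllib.parse.uses_relative, urllib.parse.uses_netloc)
               if scheme not in lst]
    for lst in targets:
        lst.append(scheme)
    try:
        yield
    finally:
        for lst in targets:
            lst.remove(scheme)


base = "myscheme://host/a/b"
# Unknown scheme: urljoin() does not resolve and returns the reference unchanged.
print(urllib.parse.urljoin(base, "c"))
with registered_scheme("myscheme"):
    # Registered in both lists: resolves like an http URL would.
    print(urllib.parse.urljoin(base, "c"))
```

Both lists matter here: without `uses_relative`, `urljoin` returns `"c"` as-is; without `uses_netloc`, the base's host is not carried over. Because the lists are shared module state, every caller in the process sees the change while the `with` block is active, which is why this approach is unappealing as a general strategy.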
| msg413066 | Author: Éric Araujo (eric.araujo) * (Python committer) | Date: 2022-02-11 12:56 | |
I remember a discussion about this years ago. urllib is a module that pre-dates the idea of universal parsing for URIs, where the delimiters (like ://) are enough to determine the parts of a URI and give them meaning (host, port, user, path, etc). Backward compat for urllib is always a concern; someone said at the time that it could be good to have a new module for modern, generic parsing, but that hasn’t happened. Maybe a new parse function, or new parameter to the existing one, could be easier to add. |
|||
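To make the "pre-dates universal parsing" point concrete, here is a small demonstration using only the documented `urllib.parse.urlparse`; `myscheme` is a made-up scheme name. In current CPython, the netloc parts are split for any scheme once `//` is present, but the `;params` segment is split off only for schemes listed in `uses_params`.

```python
from urllib.parse import urlparse

for url in ("http://user@host:8080/p;x?q#f",
            "myscheme://user@host:8080/p;x?q#f"):
    parts = urlparse(url)
    # Netloc attributes are filled in for both schemes...
    print(parts.scheme, parts.username, parts.hostname, parts.port)
    # ...but ";x" becomes params only for schemes in uses_params.
    print(repr(parts.path), repr(parts.params))
```

For the `http` URL the path is `'/p'` with params `'x'`; for the `myscheme` URL the path stays `'/p;x'` and params is empty, even though the delimiters are identical.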
| msg413084 | Author: Lincoln Auster (lincolnauster) * | Date: 2022-02-11 16:24 | |
> Maybe a new parse function, or new parameter to the existing one,
> could be easier to add.

If I'm understanding you right, that's what this (and the PR) is: an extra optional parameter to the urllib.parse functions to supplement the existing (legacy?) hard-coded lists. |
|||
| msg413123 | Author: Éric Araujo (eric.araujo) * (Python committer) | Date: 2022-02-12 11:47 | |
In my idea it would not be a list of things that you have to pass piecemeal to request specific behaviour, but another function or a new param (like `parse(string, universal=True)`) that implements universal parsing. We could even handle things like #22852 in that mode (although ironically, correct behaviour for that requires having a registry of schemes). |
|||
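A separate function in the spirit of `parse(string, universal=True)` could look roughly like the following. This is a minimal sketch under one assumption: that "universal" here only means `;params` splitting no longer depends on `uses_params` (in current CPython, `urlsplit` already splits netloc, query, and fragment regardless of scheme). `urlparse_universal` is a hypothetical name, not part of `urllib.parse`, and it does not attempt the #22852 handling.

```python
from urllib.parse import ParseResult, urlparse, urlsplit


def urlparse_universal(url):
    """Hypothetical scheme-agnostic variant of urllib.parse.urlparse.

    Splits ";params" off the last path segment for every scheme, instead of
    only for schemes listed in urllib.parse.uses_params.
    """
    parts = urlsplit(url)
    path, params = parts.path, ""
    # Same rule urlparse applies: params start at the first ';' after the last '/'.
    semi = path.find(";", path.rfind("/") + 1)
    if semi >= 0:
        path, params = path[:semi], path[semi + 1:]
    return ParseResult(parts.scheme, parts.netloc, path, params,
                       parts.query, parts.fragment)


# A custom scheme now gets the same treatment urlparse gives to http.
print(urlparse_universal("myscheme://host/a;b/c;x?q"))
print(urlparse("http://host/a;b/c;x?q"))
```

Only the scheme differs between the two printed results; with plain `urlparse`, the `myscheme` URL would keep `'/a;b/c;x'` as its path.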
| msg413139 | Author: Lincoln Auster (lincolnauster) * | Date: 2022-02-12 18:11 | |
> In my idea it would not be a list of things that you have to pass
> piecemeal to request specific behaviour, but another function or a new
> param (like `parse(string, universal=True)`) that implements universal
> parsing.

If I'm correct in my understanding of a universal parse function (a function with all the SchemeClasses enabled unilaterally), some parse_universal function would be a pretty trivial thing to add with the API I've already got here (though it wouldn't address #22852 without some extra work afaict).

I do think keeping the 'piecemeal' options exposed has some utility, though, especially since the uses_* lists already treat them on such a granular level. Do we think a parse_universal function would be helpful to add on top of this, or just repetitive? |
|||
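The 'piecemeal' design can be sketched with `enum.Flag`: caller-supplied behaviors are ORed into whatever the hard-coded uses_* lists already imply, and a universal mode is a one-liner on top. This is a hypothetical sketch, not the code from the fork or PR 30520: `SchemeClass`, its member names, `ALL`, `behaviors_for`, and `behaviors_universal` are all assumptions for illustration.

```python
import enum
import urllib.parse


class SchemeClass(enum.Flag):
    """Hypothetical stand-in for the behaviors encoded by the uses_* lists."""
    RELATIVE = enum.auto()  # mirrors urllib.parse.uses_relative
    NETLOC = enum.auto()    # mirrors urllib.parse.uses_netloc
    PARAMS = enum.auto()    # mirrors urllib.parse.uses_params


ALL = SchemeClass.RELATIVE | SchemeClass.NETLOC | SchemeClass.PARAMS

_LISTS = {
    SchemeClass.RELATIVE: urllib.parse.uses_relative,
    SchemeClass.NETLOC: urllib.parse.uses_netloc,
    SchemeClass.PARAMS: urllib.parse.uses_params,
}


def behaviors_for(scheme, extra=SchemeClass(0)):
    """Behaviors implied by the hard-coded lists, plus caller-supplied extras.

    Extras can only add behaviors, never remove them.
    """
    found = SchemeClass(0)
    for flag, schemes in _LISTS.items():
        if scheme in schemes:
            found |= flag
    return found | extra


def behaviors_universal(scheme):
    # "Universal" parsing is simply every behavior switched on.
    return behaviors_for(scheme, extra=ALL)
```

For example, `behaviors_for("myscheme", SchemeClass.PARAMS)` yields only `SchemeClass.PARAMS`, while `behaviors_universal("myscheme")` yields all three flags. The one-way `|` also shows the limitation noted earlier in the thread: there is no way to subtract a behavior a known scheme already has.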
| msg413314 | Author: karl (karlcow) * | Date: 2022-02-16 04:01 | |
Just to note that there is a maintained list of officially accepted schemes at IANA: https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml

In addition, there is a list of unofficial schemes on Wikipedia: https://en.wikipedia.org/wiki/List_of_URI_schemes#Unofficial_but_common_URI_schemes |
|||
| msg416369 | Author: Ethan Furman (ethan.furman) * (Python committer) | Date: 2022-03-30 14:54 | |
Éric Araujo wrote on PR30520:
----------------------------
> No, we should not redefine the behavior of urlparse.
>
> I was always talking about adding another function. Yes it can be a one-liner,
> but my point is that I don’t see the usefulness of having the separate flags to
> pick and choose parts of standard parsing.

I suspect the usefulness comes from error checking -- if a scheme doesn't support parameters, then having what looks like parameters converted would not be helpful.

Further, while a new function is definitely safer, how many parse options do we need? Anyone else remember `os.popen()`, `os.popen2()`, `os.popen3()`, and, finally, `os.popen4()`?

Assuming we just enhance the existing function, would it be more palatable if there was a `SchemeFlag.ALL`, so universal parsing was just `urlparse(uri_string, flags=SchemeFlag.ALL)`?

To be really user-friendly, we could have:

    from enum import Flag, auto

    class SchemeFlag(Flag):
        RELATIVE = auto()
        NETLOC = auto()
        PARAMS = auto()
        UNIVERSAL = RELATIVE | NETLOC | PARAMS

        def __repr__(self):
            return f"{self.__module__}.{self._name_}"
        __str__ = __repr__

    # __members__ includes multi-flag members such as UNIVERSAL on every
    # Python version; plain iteration skips them on 3.11+.
    RELATIVE, NETLOC, PARAMS, UNIVERSAL = SchemeFlag.__members__.values()

Then the above call becomes: `urlparse(uri_string, flags=UNIVERSAL)` |
|||
| msg416462 | Author: Éric Araujo (eric.araujo) * (Python committer) | Date: 2022-03-31 22:10 | |
I would like to know what Senthil is thinking before the PR with à la carte options is merged! |
|||
| msg416463 | Author: Ethan Furman (ethan.furman) * (Python committer) | Date: 2022-03-31 22:31 | |
Sounds good. |
|||
| msg416464 | Author: Senthil Kumaran (orsenthil) * (Python committer) | Date: 2022-03-31 22:41 | |
I will review this in a day. I had been following the conversation, but couldn't look deeper into the code. Thank you for engaging and contributing. |
|||
| msg416633 | Author: Senthil Kumaran (orsenthil) * (Python committer) | Date: 2022-04-03 17:36 | |
Hi all, I was looking at it. Introducing an enum as the last parameter is going to add to the cost of understanding this function's behavior. I am doing further reading on the previous discussions and PR(s) now. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:59:54 | admin | set | github: 90495 |
| 2022-04-04 03:47:32 | ned.deily | set | assignee: docs@python -> nosy: - barry, paul.moore, ronaldoussoren, vstinner, larry, tim.golden, ned.deily, ezio.melotti, mrabarnett, r.david.murray, docs@python, zach.ware, koobs, steve.dower, lys.nikolaou, pablogsal components: - Build, Demos and Tools, Documentation, Extension Modules, Interpreter Core, macOS, Regular Expressions, Tests, Unicode, Windows, XML, 2to3 (2.x to 3.x conversion tool), ctypes, Cross-Build, email, Argument Clinic, FreeBSD, SSL, C API, Parser versions: - Python 3.7 |
| 2022-04-04 03:46:29 | ned.deily | set | hgrepos: - hgrepo414 |
| 2022-04-04 03:40:24 | ned.deily | set | files: - mitre_f188eec1268fd49bdc7375fc5b77ded657c150875fede1a4d797f818d2514e88_120.csv |
| 2022-04-04 03:28:46 | qwerazzfffs | set | files: + mitre_f188eec1268fd49bdc7375fc5b77ded657c150875fede1a4d797f818d2514e88_120.csv nosy: + larry, paul.moore, tim.golden, koobs, r.david.murray, zach.ware, steve.dower, ned.deily, barry, pablogsal, ezio.melotti, ronaldoussoren, lys.nikolaou, docs@python, vstinner, mrabarnett versions: + Python 3.7 hgrepos: + hgrepo414 assignee: docs@python components: + Build, Demos and Tools, Documentation, Extension Modules, Interpreter Core, macOS, Regular Expressions, Tests, Unicode, Windows, XML, 2to3 (2.x to 3.x conversion tool), ctypes, Cross-Build, email, Argument Clinic, FreeBSD, SSL, C API, Parser |
| 2022-04-03 17:36:07 | orsenthil | set | messages: + msg416633 |
| 2022-03-31 22:41:06 | orsenthil | set | messages: + msg416464 |
| 2022-03-31 22:31:52 | ethan.furman | set | messages: + msg416463 |
| 2022-03-31 22:10:19 | eric.araujo | set | messages: + msg416462 |
| 2022-03-30 14:54:08 | ethan.furman | set | messages: + msg416369 |
| 2022-03-29 16:07:16 | ethan.furman | set | nosy: + ethan.furman |
| 2022-02-16 04:01:03 | karlcow | set | nosy: + karlcow messages: + msg413314 |
| 2022-02-14 22:11:19 | brett.cannon | set | nosy: - brett.cannon |
| 2022-02-12 18:11:25 | lincolnauster | set | messages: + msg413139 |
| 2022-02-12 11:47:38 | eric.araujo | set | messages: + msg413123 |
| 2022-02-11 16:24:18 | lincolnauster | set | messages: + msg413084 |
| 2022-02-11 12:56:26 | eric.araujo | set | nosy: + eric.araujo, brett.cannon, orsenthil, lukasz.langa messages: + msg413066 versions: + Python 3.11 |
| 2022-01-10 21:55:16 | lincolnauster | set | keywords: + patch stage: patch review pull_requests: + pull_request28721 |
| 2022-01-10 21:50:56 | lincolnauster | create | |