Created on 2010-10-29 05:26 by belopolsky, last changed 2022-04-11 14:57 by admin. This issue is now closed.
| Messages (10) | |||
|---|---|---|---|
| msg119855 | Author: Alexander Belopolsky (belopolsky) * (Python committer) | Date: 2010-10-29 05:26 | |
The following example in Doc/library/urlparse.rst is wrong:
>>> urlparse('www.cwi.nl:80/%7Eguido/Python.html')
ParseResult(scheme='', netloc='', path='www.cwi.nl:80/%7Eguido/Python.html', params='', query='', fragment='')
In the actual output, scheme='www.cwi.nl'.
In addition, the preceding text is confusing and probably not grammatical:
"""
Otherwise, it is not possible to distinguish between netloc and path components, and would the indistinguishable component would be classified as the path as in a relative URL.
"""
Discovered while working on issue 10225. |
|||
| msg119857 | Author: Alexander Belopolsky (belopolsky) * (Python committer) | Date: 2010-10-29 05:51 | |
Looks like I've been bitten again by make doctest picking up an older Python, but something is not right here:
In Python 2.6.5:
>>> urlparse('www.cwi.nl:80/%7Eguido/Python.html')
ParseResult(scheme='www.cwi.nl', netloc='', path='80/%7Eguido/Python.html', params='', query='', fragment='')
but in 2.7:
>>> urlparse('www.cwi.nl:80/%7Eguido/Python.html')
ParseResult(scheme='', netloc='', path='www.cwi.nl:80/%7Eguido/Python.html', params='', query='', fragment='')
and the text preceding the example in the doc does not really tell which is right.
|
|||
| msg119859 | Author: Georg Brandl (georg.brandl) * (Python committer) | Date: 2010-10-29 06:15 | |
I think this is correct: it is the new behavior after the fix for #754016 was committed. |
|||
| msg119867 | Author: Alexander Belopolsky (belopolsky) * (Python committer) | Date: 2010-10-29 07:05 | |
On Fri, Oct 29, 2010 at 2:15 AM, Georg Brandl <report@bugs.python.org> wrote:
> I think this is correct: it is the new behavior after the fix for #754016 was committed.

I agree. I kept the issue open because I cannot parse
"""
Otherwise, it is not possible to distinguish between netloc and path components, and would the indistinguishable component would be classified as the path as in a relative URL.
"""
|
|||
| msg119868 | Author: Georg Brandl (georg.brandl) * (Python committer) | Date: 2010-10-29 07:06 | |
That's for Senthil to rephrase as intended :) |
|||
| msg119873 | Author: Senthil Kumaran (orsenthil) * (Python committer) | Date: 2010-10-29 08:56 | |
- Otherwise, it is not possible to distinguish between netloc and path
- components, and would the indistinguishable component would be classified
- as the path as in a relative URL.
+ If the netloc does not start with '//', the module cannot distinguish it
+ from path and it would classify it as path component in the relative url.
How does this sound? |
|||
| msg119914 | Author: Éric Araujo (eric.araujo) * (Python committer) | Date: 2010-10-29 16:10 | |
// is not part of the netloc in RFC terms, it’s a delimiter between components |
|||
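The delimiter point above can be checked with a minimal sketch. It assumes Python 3, where the `urlparse` module discussed in this thread lives on as `urllib.parse`; the variable names `url` and `parts` are illustrative only. The `//` introduces the netloc but is not stored in it, and `geturl()` puts it back when reassembling.

```python
from urllib.parse import urlparse

# '//' introduces the netloc; it is a delimiter, not part of the component.
url = '//www.cwi.nl:80/%7Eguido/Python.html'
parts = urlparse(url)

print(parts.netloc)    # netloc holds host and port, without the leading '//'
print(parts.path)      # the path starts at the first '/' after the netloc
print(parts.geturl())  # reassembling restores the '//' delimiter
```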
| msg119991 | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2010-10-30 14:51 | |
How about this:
- If the scheme value is not specified, urlparse following the syntax
- specifications from RFC 1808, expects the netloc value to start with '//',
- Otherwise, it is not possible to distinguish between net_loc and path
- component and would classify the indistinguishable component as path as in
- a relative url.
+ Following the syntax specifications in RFC 1808, urlparse recognizes
+ a netloc only if it is properly introduced by '//'. Otherwise the
+ input must be presumed to be a relative URL and thus to start with
+ a path component.
However, it seems to me there is a bug here:
>>> urlparse.urlparse('www.k.com:80/path')
ParseResult(scheme='', netloc='', path='www.k.com:80/path', params='',
query='', fragment='')
>>> urlparse.urlparse('www.k.com:path')
ParseResult(scheme='www.k.com', netloc='', path='path', params='',
query='', fragment='')
I think the second one is correct and that the first one should produce
ParseResult(scheme='www.k.com', netloc='', path='80/path', params='',
query='', fragment='')
I haven't read all the way through the RFC again, though. But *one*
of the above is wrong.
|
|||
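The proposed wording (a netloc is recognized only when introduced by `//`, otherwise the input is presumed to be a relative URL) can be illustrated with a short sketch. It assumes Python 3's `urllib.parse`, the successor of the `urlparse` module. The inputs deliberately contain no `:`, because the handling of bare `host:port` strings is exactly what is disputed in this thread and has differed between Python versions.

```python
from urllib.parse import urlparse

# With the '//' introducer, the host is recognized as the netloc.
with_netloc = urlparse('//www.k.com/path')

# Without it, the same text is presumed to be a relative URL,
# so everything ends up in the path component.
relative = urlparse('www.k.com/path')

print(with_netloc)
print(relative)
```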
| msg120678 | Author: Senthil Kumaran (orsenthil) * (Python committer) | Date: 2010-11-07 13:21 | |
Fixed the wording in r86296 (py3k), r86297 (release31-maint) and r86298 (release27-maint).

David, for the examples you mentioned, the first one's parsing follows the explanation as written. It is correct.

For the second example, the behavior arises because the port value is not a DIGIT. I am unable to recollect the reason for this behavior. Either the URL is invalid (PORT is not a DIGIT, and the parse module simply does not raise an error, which is okay given that the input is invalid), or it needs to distinguish ':' as a port separator from a path separator for some valid URLs.

If we find a better reason to change something for the second scenario, we shall address that. |
|||
| msg120710 | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2010-11-08 02:37 | |
Senthil, no it isn't. There is no way to know a priori that ':80' represents a port number rather than a path, absent the // introducer for the netloc. This bug is fixed; I ought to open a new one for the path thing but perhaps I will wait for a user report instead :) |
|||
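Since the parser cannot know a priori that `:80` is a port, a caller who does know the input is a bare `host:port/path` string can add the `//` introducer before parsing. The sketch below assumes Python 3's `urllib.parse`. The helper name `parse_with_netloc` is hypothetical and not part of the standard library, and its check for an existing introducer is deliberately simple.

```python
from urllib.parse import urlparse

def parse_with_netloc(url):
    """Hypothetical helper: parse a string the caller knows begins with a host.

    Prepends the '//' introducer unless the string already carries one,
    either directly or after a scheme ('scheme://...').
    """
    if not url.startswith('//') and '://' not in url:
        url = '//' + url
    return urlparse(url)

result = parse_with_netloc('www.k.com:80/path')
print(result.netloc, result.hostname, result.port, result.path)
```

This moves the decision about what the string means to the caller, who has the context the parser lacks.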
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:57:08 | admin | set | github: 54435 |
| 2010-11-08 02:37:27 | r.david.murray | set | messages: + msg120710 |
| 2010-11-07 13:21:58 | orsenthil | set | status: open -> closed; type: behavior; messages: + msg120678; resolution: fixed; stage: resolved |
| 2010-10-30 14:51:17 | r.david.murray | set | nosy: + r.david.murray; messages: + msg119991 |
| 2010-10-29 16:10:29 | eric.araujo | set | nosy: + eric.araujo; messages: + msg119914 |
| 2010-10-29 08:56:04 | orsenthil | set | messages: + msg119873 |
| 2010-10-29 07:06:12 | georg.brandl | set | messages: + msg119868 |
| 2010-10-29 07:05:13 | belopolsky | set | messages: + msg119867 |
| 2010-10-29 06:15:06 | georg.brandl | set | nosy: + georg.brandl; messages: + msg119859 |
| 2010-10-29 05:51:24 | belopolsky | set | messages: + msg119857 |
| 2010-10-29 05:32:19 | georg.brandl | set | assignee: docs@python -> orsenthil; nosy: + orsenthil |
| 2010-10-29 05:26:04 | belopolsky | create | |