This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.
Created on 2012-05-16 08:54 by wichert, last changed 2022-04-11 14:57 by admin.
| Files | | |
|---|---|---|
| File name | Uploaded | Description |
| urllib-quote-14826.patch | jerub, 2012-07-07 21:55 | |
| urllib-request.patch | jerub, 2012-07-08 07:24 | Followup patch for urllib-quote-14826.patch |
| Pull Requests | | |
|---|---|---|
| URL | Status | Linked |
| PR 12755 | open | gregory.p.smith, 2019-04-10 00:39 |
| PR 12758 | merged | gregory.p.smith, 2019-04-10 09:07 |
| PR 12759 | merged | miss-islington, 2019-04-10 09:18 |
| PR 12760 | closed | miss-islington, 2019-04-10 09:18 |
| Messages (24) | |||
|---|---|---|---|
| msg160811 | Author: Wichert Akkerman (wichert) | Date: 2012-05-16 08:54 | |
There appears to be an odd networking issue with how urllib2 sends HTTP requests. Downloading an image from maw.liquifire.com gives an error:
$ python -c 'import urllib2 ; urllib2.urlopen("http://maw.liquifire.com/maw?set=image[2302.000.13314 a]&call=url[file:325x445]")'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python2.7/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 400, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 418, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1207, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib/python2.7/urllib2.py", line 1180, in do_open
    r = h.getresponse(buffering=True)
  File "/usr/lib/python2.7/httplib.py", line 1030, in getresponse
    response.begin()
  File "/usr/lib/python2.7/httplib.py", line 407, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python2.7/httplib.py", line 365, in _read_status
    line = self.fp.readline()
  File "/usr/lib/python2.7/socket.py", line 447, in readline
    data = self._sock.recv(self._rbufsize)
socket.error: [Errno 104] Connection reset by peer
Downloading the same image using wget works fine:
$ wget 'http://maw.liquifire.com/maw?set=image[2302.000.13314 a]&call=url[file:325x445]'
--2012-05-16 10:53:27-- http://maw.liquifire.com/maw?set=image[2302.000.13314%20a]&call=url[file:325x445]
Resolving maw.liquifire.com (maw.liquifire.com)... 184.169.78.6
Connecting to maw.liquifire.com (maw.liquifire.com)|184.169.78.6|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 11393 (11K) [image/jpeg]
Saving to: `maw?set=image[2302.000.13314 a]&call=url[file:325x445]'
100%[======================================>] 11,393 --.-K/s in 0.003s
2012-05-16 10:53:27 (3.49 MB/s) - `maw?set=image[2302.000.13314 a]&call=url[file:325x445]' saved [11393/11393]
|
|||
| msg160930 | Author: Anthony Long (antlong) | Date: 2012-05-16 20:54 | |
http://maw.liquifire.com/maw?set=image[2302.000.13314%20a]&call=url[file:325x445] works properly. Notice the %20 instead of ' ' |
|||
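The difference Anthony points out can be reproduced without touching the network. The following is a minimal sketch, assuming Python 3's `urllib.parse.quote`. The `SAFE` string is an assumption: it is the set of characters that, as far as I recall, `URLopener.open()` left unescaped. `escape_url` is a hypothetical helper name, not part of the standard library.

```python
from urllib.parse import quote

# Characters left untouched by quoting. Assumed to mirror the set
# used by URLopener.open(); adjust if your use case differs.
SAFE = "%/:=&?~#+!$,;'@()*[]|"


def escape_url(url):
    """Percent-encode every character outside SAFE (notably the space)."""
    return quote(url, safe=SAFE)


raw = "http://maw.liquifire.com/maw?set=image[2302.000.13314 a]&call=url[file:325x445]"
print(escape_url(raw))
```

With this safe set only the space changes, which gives the same URL that wget reports sending in the original report.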
| msg164945 | Author: Stephen Thorne (jerub) * | Date: 2012-07-07 21:55 | |
Here is a patch that uses the same quoting logic in urllib.request.Request.__init__ as is used by urllib.request.URLopener.open() |
|||
| msg164955 | Author: Roundup Robot (python-dev) (Python triager) | Date: 2012-07-08 00:15 | |
New changeset 01c8d800efd2 by Senthil Kumaran in branch '3.2': Fix issue14826 - make urllib.request.Request quoted url consistent with URLOpener open method. http://hg.python.org/cpython/rev/01c8d800efd2
New changeset e6bb919b2623 by Senthil Kumaran in branch 'default': Fix issue14826 - make urllib.request.Request quoted url consistent with URLOpener open method. http://hg.python.org/cpython/rev/e6bb919b2623 |
|||
| msg164957 | Author: Roundup Robot (python-dev) (Python triager) | Date: 2012-07-08 00:37 | |
New changeset d931a3b64fd6 by Senthil Kumaran in branch '2.7': Fix issue14826 - make urllib.request.Request quoted url consistent with URLOpener open method. http://hg.python.org/cpython/rev/d931a3b64fd6 |
|||
| msg164958 | Author: Senthil Kumaran (orsenthil) * (Python committer) | Date: 2012-07-08 00:38 | |
Thanks for the patch, Stephen. |
|||
| msg164966 | Author: Ross Lagerwall (rosslagerwall) (Python committer) | Date: 2012-07-08 06:17 | |
It looks like this broke the build bots: http://buildbot.python.org/all/builders/AMD64%20Ubuntu%20LTS%202.7/builds/66/steps/test/logs/stdio |
|||
| msg164969 | Author: Stephen Thorne (jerub) * | Date: 2012-07-08 07:24 | |
Here's a followup patch that fixes the trunk build for me. This will unbreak the builds as well as fixing this bug, but it should be investigated why URLopener calls to_bytes() and Request does not. Ideally this interface should be consistent. |
|||
| msg164973 | Author: Senthil Kumaran (orsenthil) * (Python committer) | Date: 2012-07-08 08:12 | |
It seems to me that toBytes in urllib was introduced to restrict URLs that were sent as unicode strings. We wanted URLs to be ASCII strings in Python 2. http://mail.python.org/pipermail/python-bugs-list/2000-November/002779.html And the toBytes / to_bytes call is actually the problem here, as the cookielib test cases send a unicode character which the ASCII encoding fails to operate on. I am thinking that we should arrive at a solution which brings consistency and fixes any previous mistakes. In 3.3, I think a rework of to_bytes may also be a good solution; in 2.7 and 3.2, I think Stephen's attached patch is along the right lines. Practically, the quote is more important than the failure at toBytes when sending a unicode URL. |
|||
| msg164975 | Author: Éric Araujo (eric.araujo) * (Python committer) | Date: 2012-07-08 08:14 | |
I’m not sure urllib should accept invalid (non-escaped) URLs; a higher-level application can do so, but for the low-level stdlib module it is more debatable. |
|||
| msg164976 | Author: Senthil Kumaran (orsenthil) * (Python committer) | Date: 2012-07-08 08:18 | |
Yeah, I think so as well; in that case, the test_cookielib.py test case may need a change. |
|||
| msg164980 | Author: Roundup Robot (python-dev) (Python triager) | Date: 2012-07-08 09:22 | |
New changeset ee1828dc3bf6 by Senthil Kumaran in branch '3.2': issue 14826 - Address the buildbot failure ( explanation msg164973) http://hg.python.org/cpython/rev/ee1828dc3bf6
New changeset dc30111a5d7e by Senthil Kumaran in branch 'default': issue 14826 - Address the buildbot failure quote of url is the required change ( explanation msg164973) http://hg.python.org/cpython/rev/dc30111a5d7e
New changeset 224b27a8d9be by Senthil Kumaran in branch '2.7': revert the changes done in d931a3b64fd6 - buildbot failure. http://hg.python.org/cpython/rev/224b27a8d9be |
|||
| msg164981 | Author: Senthil Kumaran (orsenthil) * (Python committer) | Date: 2012-07-08 09:25 | |
The last change should settle the buildbots, but I would like to come back to this issue tomorrow with more focus: in 3.3, to see if we can deal with removing to_bytes, and then in 2.7, to see if something can be done about the test_cookielib.py test case. |
|||
| msg164982 | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2012-07-08 09:30 | |
Senthil, do you read python-dev? I think this change was premature from the start (never mind the fact that you didn't run the test suite before committing). For example, if you have a URL with a non-ASCII domain name such as "http://وزارة-الأتصالات.مصر/", the domain name should be IDNA-encoded, not %-encoded like the rest. Furthermore, some people are certainly already quoting their URLs to work around this issue, so "fixing" it will break their code by double-escaping the URLs. You've got to be more careful. |
|||
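To illustrate the point about domain names, here is a rough sketch that treats the host and the path differently. It assumes Python 3 and the built-in `idna` codec (which implements the older IDNA 2003 rules). `encode_url` is a hypothetical helper written for this example, and the `bücher.example` host is made up. The sketch deliberately ignores userinfo and passes the query through unchanged.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_url(url):
    """IDNA-encode the host and percent-encode the path of a URL.

    Simplified on purpose: userinfo is dropped and the query and
    fragment are passed through as given.
    """
    parts = urlsplit(url)
    # The host goes through the IDNA codec, not through quote().
    host = parts.hostname.encode("idna").decode("ascii")
    netloc = host if parts.port is None else "%s:%d" % (host, parts.port)
    # "%" is kept safe so existing escapes in the path are not touched.
    path = quote(parts.path, safe="/%")
    return urlunsplit((parts.scheme, netloc, path, parts.query, parts.fragment))


print(encode_url("http://bücher.example/a b"))
# Percent-encoding the host instead would give a different, wrong result:
print(quote("bücher.example"))
```

Running `quote()` over the whole URL would percent-encode the host bytes, which is not how a non-ASCII domain name is resolved.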
| msg165023 | Author: Christian Heimes (christian.heimes) * (Python committer) | Date: 2012-07-08 16:42 | |
The docs [1] state that `url should be a string containing a valid URL.` A URL with a space ' ' is not a valid URL, as the space must be quoted as %20. The brackets may also cause problems as they are not valid xs:anyURI chars. I vote for reverting the changes as they break the API. You could improve the docs and emphasize that URLs must be quoted correctly as the module doesn't implement browser magic. [1] http://docs.python.org/py3k/library/urllib.request.html#urllib.request.Request |
|||
| msg165046 | Author: Senthil Kumaran (orsenthil) * (Python committer) | Date: 2012-07-08 23:49 | |
On Sun, Jul 8, 2012 at 2:30 AM, Antoine Pitrou <report@bugs.python.org> wrote:

> Senthil, do you read python-dev? I think this change was premature from the start (never mind the fact that you didn't run the test suite before committing).

I thought that the other legacy URLOpen was quoting it correctly, and I wanted to see if it could be made consistent. It did get me thinking about why it had been different for so long. I realize that committing so soon was a mistake.

> For example, if you have a URL with a non-ASCII domain name such as "http://وزارة-الأتصالات.مصر/", the domain name should be IDNA-encoded, not %-encoded like the rest.

Agreed and understood.

> Furthermore, some people are certainly already quoting their URLs to work around this issue, so "fixing" it will break their code by double-escaping the URLs. You've got to be more careful.

Oh, yes, the change may break an already quoted URL. I think I shall revert this. |
|||
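The double-escaping concern depends on whether "%" is treated as a safe character. A small sketch, assuming Python 3's `urllib.parse.quote`; the safe sets here are chosen for illustration only and are not taken from any of the patches:

```python
from urllib.parse import quote

already_quoted = "/maw?set=image[2302.000.13314%20a]"

# With "%" not in the safe set, the existing "%20" is escaped again.
print(quote(already_quoted, safe="/?=[]"))

# Marking "%" as safe leaves existing escapes alone...
print(quote(already_quoted, safe="/?=[]%"))

# ...but then a literal "%" in the input is never escaped either.
print(quote("100% sure", safe="%"))
```

Neither choice is right for every caller, since the function cannot tell an already-quoted URL from one that contains a literal percent sign.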
| msg165047 | Author: Senthil Kumaran (orsenthil) * (Python committer) | Date: 2012-07-08 23:52 | |
On Sun, Jul 8, 2012 at 9:42 AM, Christian Heimes <report@bugs.python.org> wrote:

> I vote for reverting the changes as they break the API. You could improve the docs and emphasize that URLs must be quoted correctly as the module doesn't implement browser magic.

Okay. But I do realize that in 3.3 we may have a FancyURLOpener / URLOpener open method, which is not directly called by the APIs, but which seems to have quote behavior. I approached this change as making them consistent, but I realize it is a mistake, for the reasons that you and Antoine state. |
|||
| msg165049 | Author: Roundup Robot (python-dev) (Python triager) | Date: 2012-07-09 00:59 | |
New changeset ebd37273e0fe by Senthil Kumaran in branch '3.2': revert the changes done for issue14826 - quoting witin Request is not desirable. http://hg.python.org/cpython/rev/ebd37273e0fe |
|||
| msg165050 | Author: Roundup Robot (python-dev) (Python triager) | Date: 2012-07-09 01:00 | |
New changeset a4bdb637d818 by Senthil Kumaran in branch 'default': revert the changes done for issue14826 - quoting witin Request is not desirable. http://hg.python.org/cpython/rev/a4bdb637d818 |
|||
| msg255525 | Author: Martin Panter (martin.panter) * (Python committer) | Date: 2015-11-28 04:19 | |
FWIW urlopen() already handles space characters in the Location target of redirects; see HTTPRedirectHandler.redirect_request(). So I think it is reasonable to handle space characters in user-supplied URLs also, if it is done properly. |
|||
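As far as I recall, the redirect handling mentioned above amounts to a plain string replacement of spaces in the new URL. The sketch below shows that idea applied to a user-supplied URL. `conciliate_spaces` is a hypothetical name for this example, and the behaviour attributed to `redirect_request()` is an assumption that should be checked against the source.

```python
def conciliate_spaces(url):
    """Replace literal spaces with %20 and leave everything else alone."""
    return url.replace(" ", "%20")


print(conciliate_spaces("http://example.com/a b c"))
```

This only deals with spaces. Other characters that are invalid in a URL are left as they are, and existing escapes are never touched, so it cannot double-escape.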
| msg268985 | Author: Martin Panter (martin.panter) * (Python committer) | Date: 2016-06-21 12:36 | |
I think this should be treated as a feature, not a bug, since as Christian said, the documentation currently does not support this case. |
|||
| msg339836 | Author: Gregory P. Smith (gregory.p.smith) * (Python committer) | Date: 2019-04-10 09:08 | |
urllib.request.URLopener() and FancyURLopener() automatically quote() URLs for the user. Those APIs are marked deprecated since 3.3 but have no timeline for removal. urllib.request.urlopen() does not use those, so URLs passed in are not auto-quoted. I'll clarify the docs for URLopener. |
|||
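The behaviour described here can be observed on a `Request` object without opening a connection. A minimal sketch, assuming Python 3; the example.com URL is made up and the safe set passed to `quote()` is an illustrative choice:

```python
from urllib.parse import quote
from urllib.request import Request

url = "http://example.com/some path"

# Request stores the URL as given; the space is still there.
req = Request(url)
print(req.full_url)

# Quoting is left to the caller before the URL is handed over.
quoted = Request(quote(url, safe=":/"))
print(quoted.full_url)
print(quoted.selector)
```

Recent CPython releases are, I believe, stricter still and refuse to send a request whose path contains a space, so quoting on the caller's side remains necessary.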
| msg339839 | Author: Gregory P. Smith (gregory.p.smith) * (Python committer) | Date: 2019-04-10 09:17 | |
New changeset 2fb2bc81c3f40d73945c6102569495140e1182c7 by Gregory P. Smith in branch 'master': bpo-14826: document that URLopener quotes fullurl. (GH-12758) https://github.com/python/cpython/commit/2fb2bc81c3f40d73945c6102569495140e1182c7 |
|||
| msg339841 | Author: miss-islington (miss-islington) | Date: 2019-04-10 09:30 | |
New changeset 9d2ccf173e2e8ff069153f603d2e5b1ea757e734 by Miss Islington (bot) in branch '3.7': bpo-14826: document that URLopener quotes fullurl. (GH-12758) https://github.com/python/cpython/commit/9d2ccf173e2e8ff069153f603d2e5b1ea757e734 |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:57:30 | admin | set | github: 59031 |
| 2019-04-10 09:30:36 | miss-islington | set | nosy: + miss-islington messages: + msg339841 |
| 2019-04-10 09:18:18 | miss-islington | set | pull_requests: + pull_request12690 |
| 2019-04-10 09:18:04 | miss-islington | set | pull_requests: + pull_request12689 |
| 2019-04-10 09:17:54 | gregory.p.smith | set | messages: + msg339839 |
| 2019-04-10 09:08:05 | gregory.p.smith | set | nosy: + gregory.p.smith messages: + msg339836 |
| 2019-04-10 09:07:09 | gregory.p.smith | set | pull_requests: + pull_request12687 |
| 2019-04-10 00:39:59 | gregory.p.smith | set | stage: patch review pull_requests: + pull_request12682 |
| 2017-06-03 05:54:20 | martin.panter | link | issue13359 superseder |
| 2016-06-21 12:36:36 | martin.panter | set | type: enhancement title: urllib2.urlopen fails to load URL -> urlopen URL with unescaped space messages: + msg268985 versions: + Python 3.6, - Python 2.7, Python 3.2, Python 3.3 |
| 2015-11-28 04:19:51 | martin.panter | set | nosy: + martin.panter messages: + msg255525 resolution: fixed -> stage: resolved -> (no value) |
| 2012-07-09 01:00:08 | python-dev | set | messages: + msg165050 |
| 2012-07-09 00:59:10 | python-dev | set | messages: + msg165049 |
| 2012-07-08 23:52:05 | orsenthil | set | messages: + msg165047 |
| 2012-07-08 23:49:33 | orsenthil | set | messages: + msg165046 |
| 2012-07-08 16:42:19 | christian.heimes | set | nosy: + christian.heimes messages: + msg165023 |
| 2012-07-08 09:30:31 | pitrou | set | nosy: + pitrou messages: + msg164982 |
| 2012-07-08 09:25:14 | orsenthil | set | messages: + msg164981 |
| 2012-07-08 09:22:19 | python-dev | set | messages: + msg164980 |
| 2012-07-08 08:18:05 | orsenthil | set | messages: + msg164976 |
| 2012-07-08 08:14:05 | eric.araujo | set | nosy: + eric.araujo messages: + msg164975 |
| 2012-07-08 08:12:59 | orsenthil | set | messages: + msg164973 |
| 2012-07-08 07:24:52 | jerub | set | files: + urllib-request.patch messages: + msg164969 |
| 2012-07-08 06:17:30 | rosslagerwall | set | status: closed -> open nosy: + rosslagerwall messages: + msg164966 assignee: orsenthil |
| 2012-07-08 00:38:38 | orsenthil | set | nosy: + orsenthil messages: + msg164958 |
| 2012-07-08 00:38:10 | orsenthil | set | status: open -> closed stage: resolved resolution: fixed versions: + Python 2.7, Python 3.2 |
| 2012-07-08 00:37:13 | python-dev | set | messages: + msg164957 |
| 2012-07-08 00:15:07 | python-dev | set | nosy: + python-dev messages: + msg164955 |
| 2012-07-07 21:55:02 | jerub | set | files: + urllib-quote-14826.patch versions: + Python 3.3, - Python 2.7 nosy: + jerub messages: + msg164945 keywords: + patch |
| 2012-05-16 20:54:18 | antlong | set | nosy: + antlong messages: + msg160930 |
| 2012-05-16 08:54:33 | wichert | create | |