This issue tracker has been migrated to GitHub ,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2008年12月11日 11:23 by dato, last changed 2022年04月11日 14:56 by admin. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| urllib_bytes.diff | ajaksu2, 2008年12月14日 12:17 | Makes urlib.request.urlopen.HTTPHandler use HTTP 1.0 again | ||
| urllib-chunked.diff | jhylton, 2008年12月15日 19:15 | |||
| test_urllib_chunked2.diff | ajaksu2, 2009年02月09日 00:24 | Tests "Transfer-Encoding: chunked" handling in urllib | ||
| urllib-chunked2.diff | pitrou, 2009年02月10日 23:03 | |||
| Messages (23) | |||
|---|---|---|---|
| msg77603 - (view) | Author: Adeodato Simó (dato) | Date: 2008年12月11日 11:23 | |
This is very odd, but it was reproduced by people in #python as well.
Compare, in python 2.5:
>>>
urllib.urlopen('http://bugs.debian.org/cgi-bin/bugreport.cgi?mbox=yes;bug=123456').readline()
'From mechanix@lucretia.debian.net Tue Dec 11 11:32:47 2001\n'
To the equivalent in python 3.0:
>>>
urllib.request.urlopen('http://bugs.debian.org/cgi-bin/bugreport.cgi?mbox=yes;bug=123456').readline()
b'f65\r\n'
|
|||
| msg77605 - (view) | Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) | Date: 2008年12月11日 12:12 | |
I don't reproduce the problem:
>>>
urllib.request.urlopen('http://bugs.debian.org/cgi-bin/bugreport.cgi?mbox=yes;bug=123456').readline()
b'From mechanix@lucretia.debian.net Tue Dec 11 11:32:47 2001\n'
I connect through a http proxy.
|
|||
| msg77611 - (view) | Author: Daniel Diniz (ajaksu2) * (Python triager) | Date: 2008年12月11日 13:13 | |
Confirmed:
Python 3.1a0 (py3k:67702, Dec 11 2008, 11:09:14)
[GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib.request
>>>
urllib.request.urlopen('http://bugs.debian.org/cgi-bin/bugreport.cgi?mbox=yes;bug=123456').readlines()
[b'f65\r\n', b'From mechanix@lucretia.debian.net Tue Dec 11 11:32:47
2001\n', ...
Perhaps it's related to the \r at read boundaries bug?
|
|||
| msg77612 - (view) | Author: Jean-Paul Calderone (exarkun) * (Python committer) | Date: 2008年12月11日 13:53 | |
The "f65" is the chunk length for the first chunk returned when requesting that URL. A proxy could easily hide this by switching to a different transfer encoding. |
|||
| msg77775 - (view) | Author: Jeremy Hylton (jhylton) (Python triager) | Date: 2008年12月14日 02:54 | |
Does the same thing happen with 2.6? Jeremy On Thu, Dec 11, 2008 at 8:53 AM, Jean-Paul Calderone <report@bugs.python.org> wrote: > > Jean-Paul Calderone <exarkun@divmod.com> added the comment: > > The "f65" is the chunk length for the first chunk returned when > requesting that URL. A proxy could easily hide this by switching to a > different transfer encoding. > > ---------- > nosy: +exarkun > > _______________________________________ > Python tracker <report@bugs.python.org> > <http://bugs.python.org/issue4631> > _______________________________________ > _______________________________________________ > Python-bugs-list mailing list > Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/jeremy%40alum.mit.edu > > |
|||
| msg77778 - (view) | Author: Adeodato Simó (dato) | Date: 2008年12月14日 10:02 | |
> Does the same thing happen with 2.6? No, I can't reproduce with 2.6.1. |
|||
| msg77779 - (view) | Author: Daniel Diniz (ajaksu2) * (Python triager) | Date: 2008年12月14日 10:08 | |
Jeremy: no, it doesn't.
Python 2.6.1+ (release26-maint:67716M, Dec 13 2008, 10:30:52)
[GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] on linux2
~/release26-maint$ ./python -c "import urllib; print
urllib.urlopen('http://bugs.debian.org/cgi-bin/bugreport.cgi?mbox=yes;bug=123456').readlines()[0]"
From mechanix@lucretia.debian.net Tue Dec 11 11:32:47 2001
~/release26-maint$ ./python -c "from __future__ import unicode_literals;
import urllib; print
urllib.urlopen('http://bugs.debian.org/cgi-bin/bugreport.cgi?mbox=yes;bug=123456').readlines()[0]"
From mechanix@lucretia.debian.net Tue Dec 11 11:32:47 2001
FWIW, there are trailing spurious bytes too (note read() gives bytes,
while readlines() both bytes and strings in 3.0):
>>> import urllib.request; content =
urllib.request.urlopen('http://bugs.debian.org/cgi-bin/bugreport.cgi?mbox=yes;bug=123456').read()
Python 3.1a0 (py3k:67702, Dec 11 2008, 11:09:14)
[GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib.request
>>> content =
urllib.request.urlopen('http://bugs.debian.org/cgi-bin/bugreport.cgi?mbox=yes;bug=123456').read()
>>> content[-30:]
b'PGP SIGNATURE-----\n\n\n\n\n\r\n0\r\n\r\n'
>>> content[:10]
b'f65\r\nFrom '
While in 2.6:
>>> import urllib
>>> content =
urllib.urlopen('http://bugs.debian.org/cgi-bin/bugreport.cgi?mbox=yes;bug=123456').read()
>>> content[-30:]
'---END PGP SIGNATURE-----\n\n\n\n\n'
|
|||
| msg77780 - (view) | Author: Adeodato Simó (dato) | Date: 2008年12月14日 10:11 | |
> FWIW, there are trailing spurious bytes too And in the middle of the document as well. Each time there's a chunk, I guess? |
|||
| msg77781 - (view) | Author: Resul Cetin (ResulCetin) | Date: 2008年12月14日 10:20 | |
I have the same problem with that code:
(exchange USERNAME with your delicious username and PASSWORD with your
delicious password):
import urllib.request
auth_handler = urllib.request.HTTPBasicAuthHandler()
auth_handler.add_password('del.icio.us API', 'api.del.icio.us',
USERNAME, PASSWORD)
opener = urllib.request.build_opener(auth_handler)
print(str(opener.open('https://api.del.icio.us/v1/posts/all').read(20),
"utf-8"))
And I don't use a proxy or anything like that. This makes python 3
completely unusable for me. And python 2.6 gives me what I want (the
content of that virtual file) without any extra data in front or in the
middle of the content.
|
|||
| msg77789 - (view) | Author: Daniel Diniz (ajaksu2) * (Python triager) | Date: 2008年12月14日 11:31 | |
Took me a bit of Wiresharking to find this out, the problem is that we are asking for the page using HTTP 1.1 in 3.0. Here's a workaround patch for those who need it quick, I have yet to look at urllib to see what can be fixed there. --- Index: Lib/http/client.py =================================================================== --- Lib/http/client.py (revision 67716) +++ Lib/http/client.py (working copy) @@ -600,7 +600,7 @@ class HTTPConnection: _http_vsn = 11 - _http_vsn_str = 'HTTP/1.1' + _http_vsn_str = 'HTTP/1.0' response_class = HTTPResponse default_port = HTTP_PORT --- This is what we send in 2.5 and 3.0: GET /cgi-bin/bugreport.cgi?mbox=yes;bug=123456 HTTP/1.0 User-Agent: Python-urllib/1.17 GET /cgi-bin/bugreport.cgi?mbox=yes;bug=123456 HTTP/1.1 Accept-Encoding: identity User-Agent: Python-urllib/3.1 |
|||
| msg77794 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2008年12月14日 12:12 | |
IMO we should downgrade urlopen to HTTP 1.0 in 3.0.1. Implementing chunked encoding will come later if people are interested in doing it. |
|||
| msg77796 - (view) | Author: Daniel Diniz (ajaksu2) * (Python triager) | Date: 2008年12月14日 12:17 | |
Clarifying the diagnosis, the offending spurious bytes are only present when we use 3.0's GET above. That's because urllib.request.HTTPHandler asks for a vanilla http.client.HTTPConnection, which uses HTTP 1.1. IIUC, either we change the request version back to 1.0 (attached patch) or correct the way the response is processed (is it at all?). I think HTTPSHandler will also suffer from this, perhaps [Fancy]URLopener too. [Antoine: cool, an edit conflict that agrees with what I was about to post :D] |
|||
| msg77846 - (view) | Author: Jeremy Hylton (jhylton) (Python triager) | Date: 2008年12月15日 04:01 | |
Brief update: The Python 2.x code works because readline() is provided by socket._fileobject. The Python 3.x code fails because it grabs the HTTPResponse.fp instance variable at the end of AbstractHTTPHandler.do_open. That method needs to pass the response to addinfourl(), but needs to have support for readline / readlines before it can do that. |
|||
| msg77879 - (view) | Author: Jeremy Hylton (jhylton) (Python triager) | Date: 2008年12月15日 19:15 | |
I have a patch here that seems to work for the specific url and that passes all the tests. Can anyone check whether it works for a larger set of cases? I'm a little concerned because I don't understand the new io library in much detail. There's an override for _checkClosed() in the HTTPResponse that seems a little dodgy. I'll try to get someone to review that specifically. |
|||
| msg77889 - (view) | Author: Daniel Diniz (ajaksu2) * (Python triager) | Date: 2008年12月15日 23:00 | |
I think your patch is good, but there may be another bug around: I wrote a script to check results of 3.x against 2.x, but many pages (http://groups.google.com/, http://en.wikipedia.org/) give 403: Forbidden for 3.x... but work with 2.x! If you think of this as a bug in 3.x, it could retry the request identifying as 2.x on 403. Other than that, your patch gives me identical results to 2.5/2.6 for 128 sites I tested (only a read(100) for each). Interestingly, my patched version gives a file closer to the buggy version in size, at 12700 bytes versus 12707. Your version agrees with 2.x and simple maths (128 x 100) in giving a 12799 bytes result. I have no idea why. HTH, Daniel |
|||
| msg78144 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2008年12月21日 13:24 | |
The patch should have at least a test so that we don't have a regression on this one. |
|||
| msg81360 - (view) | Author: Daniel Diniz (ajaksu2) * (Python triager) | Date: 2009年02月08日 01:17 | |
Here's a test (in test_urllib2_localnet) that fails before the patch and
passes after, mostly lifted from test_httplib:
def test_chunked(self):
expected_response = b"hello world"
chunked_start = (
b'a\r\n'
b'hello worl\r\n'
b'1\r\n'
b'd\r\n'
)
response = [(200, [("Transfer-Encoding", "chunked")],
chunked_start)]
handler = self.start_server(response)
data = self.urlopen("http://localhost:%s/" % handler.port)
self.assertEquals(data, expected_response)
Output:
test test_urllib2_localnet failed -- Traceback (most recent call last):
File "~/py3k/Lib/test/test_urllib2_localnet.py", line 390, in test_chunked
self.assertEquals(data, expected_response)
AssertionError: b'a\r\nhello worl\r\n1\r\nd\r\n' != b'hello world'
To allow this test to work, the attached patch also touches
FakeHTTPRequestHandler and TestUrlopen.urlopen.
|
|||
| msg81385 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2009年02月08日 15:58 | |
On the principle, the test looks good. If you want to avoid the 'if "%" in value' hack, you can use the named-parameter form of string formatting: >>> "localhost:%(port)s" % dict(port=8080) 'localhost:8080' >>> "localhost" % dict(port=8080) 'localhost' |
|||
| msg81428 - (view) | Author: Daniel Diniz (ajaksu2) * (Python triager) | Date: 2009年02月09日 00:24 | |
Antoine, Thanks for reviewing, here's an updated version. |
|||
| msg81433 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2009年02月09日 02:23 | |
The test looks good to me. I can't comment on the bugfix patch, but if it's ok to you, you can go ahead :) |
|||
| msg81610 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2009年02月10日 22:31 | |
I took a look at the patch and it looks ok, apart from the _checkClosed() hack (but I don't think there's any immediate solution). It should be noted that HTTPResponse.readline() will be awfully slow since, as HTTPResponse doesn't have peek(), readline() will call read() one byte at a time... (slow I/O is nothing new in py3k, however :-)) |
|||
| msg81611 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2009年02月10日 23:03 | |
Here is a patch without the _checkClosed() hack. The solution is simply to remove redundant _checkClosed() calls in IOBase (for example, readline() doesn't need to do an explicit `closed` check as it calls read()). |
|||
| msg81617 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2009年02月11日 01:02 | |
Committed in r69513, r69514. Thanks everyone! |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022年04月11日 14:56:42 | admin | set | github: 48881 |
| 2009年02月11日 01:02:33 | pitrou | set | status: pending -> closed resolution: accepted -> fixed messages: + msg81617 |
| 2009年02月11日 00:39:28 | pitrou | set | status: open -> pending resolution: accepted |
| 2009年02月10日 23:03:48 | pitrou | set | files:
+ urllib-chunked2.diff messages: + msg81611 |
| 2009年02月10日 22:31:04 | pitrou | set | messages: + msg81610 |
| 2009年02月09日 02:23:13 | pitrou | set | messages: + msg81433 |
| 2009年02月09日 00:24:38 | ajaksu2 | set | files: - test_urllib_chunked.diff |
| 2009年02月09日 00:24:23 | ajaksu2 | set | files:
+ test_urllib_chunked2.diff messages: + msg81428 |
| 2009年02月08日 15:58:59 | pitrou | set | messages: + msg81385 |
| 2009年02月08日 01:17:58 | ajaksu2 | set | files:
+ test_urllib_chunked.diff messages: + msg81360 |
| 2009年01月28日 19:19:09 | exarkun | set | nosy: - exarkun |
| 2009年01月28日 18:57:18 | trodgers | set | nosy: + trodgers |
| 2009年01月10日 04:12:55 | craigh | set | nosy: + craigh |
| 2008年12月21日 13:24:51 | pitrou | set | messages: + msg78144 |
| 2008年12月21日 09:43:26 | loewis | set | priority: critical -> release blocker |
| 2008年12月15日 23:00:12 | ajaksu2 | set | messages: + msg77889 |
| 2008年12月15日 19:15:21 | jhylton | set | files:
+ urllib-chunked.diff messages: + msg77879 |
| 2008年12月15日 04:01:47 | jhylton | set | messages: + msg77846 |
| 2008年12月15日 03:59:44 | jhylton | link | issue3761 superseder |
| 2008年12月14日 22:00:56 | jhylton | set | assignee: jhylton |
| 2008年12月14日 12:17:17 | ajaksu2 | set | files:
+ urllib_bytes.diff keywords: + patch messages: + msg77796 |
| 2008年12月14日 12:12:23 | pitrou | set | nosy:
+ pitrou messages: + msg77794 |
| 2008年12月14日 12:05:21 | pitrou | set | priority: critical type: behavior |
| 2008年12月14日 11:31:53 | ajaksu2 | set | messages: + msg77789 |
| 2008年12月14日 10:20:45 | ResulCetin | set | nosy:
+ ResulCetin messages: + msg77781 |
| 2008年12月14日 10:11:42 | dato | set | messages: + msg77780 |
| 2008年12月14日 10:08:12 | ajaksu2 | set | messages: + msg77779 |
| 2008年12月14日 10:02:43 | dato | set | messages: + msg77778 |
| 2008年12月14日 02:54:55 | jhylton | set | nosy:
+ jhylton messages: + msg77775 |
| 2008年12月11日 13:53:58 | exarkun | set | nosy:
+ exarkun messages: + msg77612 |
| 2008年12月11日 13:13:23 | ajaksu2 | set | nosy:
+ ajaksu2 messages: + msg77611 |
| 2008年12月11日 12:12:45 | amaury.forgeotdarc | set | nosy:
+ amaury.forgeotdarc messages: + msg77605 |
| 2008年12月11日 11:23:01 | dato | create | |