This issue tracker has been migrated to GitHub ,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2008年04月07日 20:55 by reacocard, last changed 2022年04月11日 14:56 by admin. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| httplib.patch | ajaksu2, 2008年04月07日 21:52 | Adds a socket buffer arg to HTTPResponse's __init__ | ||
| buffered_socket.diff | ajaksu2, 2008年04月09日 03:48 | review | ||
| Messages (16) | |||
|---|---|---|---|
| msg65121 - (view) | Author: Aren Olson (reacocard) | Date: 2008年04月07日 20:55 | |
This is a reposting of issue 508157, as requested by gvanrossum. The socket file object in httplib is opened without any buffering resulting in very slow performance of read(). The specific problem is in the httplib.HTTPResponse constructor. It calls sock.makefile() with a 0 for the buffer size. changing the buffer size to 4096 improved the time needed to download 10MB from 15.5s to 1.78s, almost 9x faster. Repeat downloads of the same file (meaning the server now has the file cached in memory), yield times of 15.5s and 0.03s, a 500x improvement. When fetching from a server on the local network, rather than from localhost, these times become 15.5s and 0.9s in both cases, a 17x speedup. Real-world situations will likely be a mix of these, however it is safe to say the speed improvement will be substantial. Adding an option to adjust the buffer size would be very welcome, though the default value should still be zero, to avoid the issues already mentioned in issue 508157. These speed results were obtained with python2.5 and apache2 under Ubuntu linux, using the code found here: http://pastebin.ca/973578 |
|||
| msg65127 - (view) | Author: Daniel Diniz (ajaksu2) * (Python triager) | Date: 2008年04月07日 21:52 | |
The code patch is trivial. I believe it needs docs (both explaining how to use and warning against the problems it may cause), a NEWS entry and tests (at least to check what happens when an invalid value lands). I can work on those changes if the general idea is not shot down :) |
|||
| msg65199 - (view) | Author: Facundo Batista (facundobatista) * (Python committer) | Date: 2008年04月08日 17:12 | |
Daniel, Aren, please submit also what Daniel described, and I'll take a look and push it forward. Regards, |
|||
| msg65232 - (view) | Author: Daniel Diniz (ajaksu2) * (Python triager) | Date: 2008年04月09日 03:48 | |
"The code patch is trivial", he said, only to find out it was not :) Facundo, thanks in advance for taking a look at this! This patch tries to implement, document and test an optional argument to HTTPConnection, which passes it to HTTPResponse. So far, it's called "sockbuf", but I truly hope we can find a better name. The crux of the test is that it shouldn't be possible to use a buffered socket and a persistent connection simultaneously, as that was the reason a buffer was left out (see issue 508157). It also tries to check correctly allowing Keep-Alive when sockbuf==0. However, it fails to exercise HTTPResponse properly and needs a review down the auto-reconnect path. Persistent connections should be tested. Regarding the code, there's a chance that some changes touching forced closing are bogus (but not harmful). I'll get back to it and to persistent connection tests. Thanks again :) |
|||
| msg65405 - (view) | Author: Daniel Diniz (ajaksu2) * (Python triager) | Date: 2008年04月12日 16:53 | |
Also reported in #1542407 |
|||
| msg91491 - (view) | Author: Chris Withers (cjw296) * (Python committer) | Date: 2009年08月12日 07:54 | |
I tried to use the following to change the buffersize for a download:
from base64 import encodestring
from httplib import HTTPResponse,HTTPConnection,HTTPSConnection,_UNKNOWN
from datetime import datetime
class FHTTPResponse(HTTPResponse):
def __init__(self, sock, debuglevel=0, strict=0, method=None):
print "creating response"
self.fp = sock.makefile('rb',4096)
self.debuglevel = debuglevel
self.strict = strict
self._method = method
self.msg = None
# from the Status-Line of the response
self.version = _UNKNOWN # HTTP-Version
self.status = _UNKNOWN # Status-Code
self.reason = _UNKNOWN # Reason-Phrase
self.chunked = _UNKNOWN # is "chunked" being used?
self.chunk_left = _UNKNOWN # bytes left to read in current
chunk
self.length = _UNKNOWN # number of bytes left in
response
self.will_close = _UNKNOWN # conn will close at end of
respons
class FHTTPConnection(HTTPConnection):
response_class = FHTTPResponse
class FHTTPSConnection(HTTPSConnection):
response_class = FHTTPResponse
conn = FHTTPSConnection('localhost')
headers = {}
auth = 'Basic '+encodestring('usernmae:password').strip()
headers['Authorization']=
t = datetime.now()
print t
conn.request('GET','/somefile.zip',None,headers)
print 'request:',datetime.now()-t
response = conn.getresponse()
print 'response:',datetime.now()-t
data = response.read()
print 'read:',datetime.now()-t
..however, I saw absolutely no change in download speed.
Aren, I notice in your pastebin code that you do response.read(10485700)
in a loop rather than just one response.read(), why is that?
|
|||
| msg91493 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2009年08月12日 10:30 | |
I must admit I don't understand the conflict between buffering and pipelined requests. This is all sequential reading and the buffer should be transparent, shouldn't it? |
|||
| msg91494 - (view) | Author: Chris Withers (cjw296) * (Python committer) | Date: 2009年08月12日 11:11 | |
Well, for me, buffer size doesn't appear to have made any difference... |
|||
| msg91594 - (view) | Author: Gregory P. Smith (gregory.p.smith) * (Python committer) | Date: 2009年08月15日 05:52 | |
Note that http://bugs.python.org/issue4879 may have already fixed this problem in trunk r68532. |
|||
| msg91595 - (view) | Author: Gregory P. Smith (gregory.p.smith) * (Python committer) | Date: 2009年08月15日 06:27 | |
Okay, I do not think this has been fixed yet. Anyone calling getresponse() can indeed use buffering=True, it can mess things up if the do not close the connection afterwards. The addition of the sockbuf parameter to HTTPConnection as proposed in buffered_socket.diff will work, but I'd follow the earlier work's lead. Don't expose it as the "sockbuf" integer. Just use a boolean "buffering" parameter that defaults to False. When true, do the same thing you do with sockbuf != 0. I see little value in actually being able to specify the exact buffer size used on the internal makefile on the HTTPConnection. The default will be sufficient. If you still want the user to be able to control it, perhaps add a HTTPConnection class attribute that defaults to -1 (socket.socket.makefile()'s bufsize default value) that you pass to makefile. Users can subclass HTTPConnection and give it a new value in that case. Personally I'd call that overkill. |
|||
| msg91599 - (view) | Author: Chris Withers (cjw296) * (Python committer) | Date: 2009年08月15日 09:15 | |
Why not allow True or an integer as values for a buffer_size parameter to the HTTPConnection constructor. False would be the default, which would mean "no buffering" as currently is the case. True would mean use buffering of the default size and an integer value would mean use buffering of that size? Out of interest, has any of the proposed patching from this issue, [issue4336] or [issue4879] been marged to the 2.6 branch? PS: As I said, for me, changing the buffer size made no difference, so I may have to open up a separate issue once I figure out what's going on... |
|||
| msg91617 - (view) | Author: Gregory P. Smith (gregory.p.smith) * (Python committer) | Date: 2009年08月15日 22:02 | |
Anything that adds a new parameter can not be backported to 2.6 as that counts as an API change / feature addition. |
|||
| msg91622 - (view) | Author: Gregory P. Smith (gregory.p.smith) * (Python committer) | Date: 2009年08月15日 22:41 | |
trunk r74463 now forces the HTTPResponse with buffering=True to close afterwards using a HTTPResponse._must_close flag similar to what was suggested in buffered_socket.diff in this issue. |
|||
| msg91625 - (view) | Author: Gregory P. Smith (gregory.p.smith) * (Python committer) | Date: 2009年08月15日 23:00 | |
I am also unable to reproduce the reported problem using the pastebin.ca/973578 code. The time to download 400mb from localhost remains the same regardless of buffering=False (default) or True. The problem still exists but it is better described in issue1542407 and should only effect the performance of reading the HTTP headers (a lot if you're writing an application doing small/medium RPC requests over HTTP). |
|||
| msg92246 - (view) | Author: Chris Withers (cjw296) * (Python committer) | Date: 2009年09月04日 11:18 | |
Yep, having done some more extensive profiling, it looks like my issue is different: all the time is being spent in httplib's HTTPResponse._read_chunked. That wouldn't be a symptom of this issue, would it? |
|||
| msg124057 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2010年12月15日 19:45 | |
This was apparently fixed in r69209. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022年04月11日 14:56:33 | admin | set | github: 46828 |
| 2010年12月15日 19:45:57 | pitrou | set | status: open -> closed messages: + msg124057 resolution: out of date nosy: facundobatista, gregory.p.smith, ggenellina, pitrou, ajaksu2, schmir, cjw296, reacocard |
| 2009年09月04日 11:18:38 | cjw296 | set | messages: + msg92246 |
| 2009年08月15日 23:00:45 | gregory.p.smith | set | priority: high -> normal messages: + msg91625 |
| 2009年08月15日 22:41:18 | gregory.p.smith | set | messages: + msg91622 |
| 2009年08月15日 22:02:20 | gregory.p.smith | set | messages: + msg91617 |
| 2009年08月15日 09:15:48 | cjw296 | set | messages: + msg91599 |
| 2009年08月15日 06:27:45 | gregory.p.smith | set | messages: + msg91595 |
| 2009年08月15日 05:52:05 | gregory.p.smith | set | nosy:
+ gregory.p.smith messages: + msg91594 |
| 2009年08月14日 07:52:17 | ggenellina | set | nosy:
+ ggenellina |
| 2009年08月12日 11:11:20 | cjw296 | set | messages: + msg91494 |
| 2009年08月12日 10:30:53 | pitrou | set | nosy:
+ pitrou messages: + msg91493 versions: + Python 3.1, Python 2.7, Python 3.2 |
| 2009年08月12日 07:54:23 | cjw296 | set | nosy:
+ cjw296 messages: + msg91491 |
| 2008年04月28日 20:05:29 | georg.brandl | link | issue1542407 superseder |
| 2008年04月16日 15:29:16 | schmir | set | nosy: + schmir |
| 2008年04月12日 16:53:54 | ajaksu2 | set | messages: + msg65405 |
| 2008年04月09日 03:48:47 | ajaksu2 | set | files:
+ buffered_socket.diff messages: + msg65232 |
| 2008年04月08日 17:12:49 | facundobatista | set | nosy:
+ facundobatista messages: + msg65199 |
| 2008年04月07日 21:52:56 | ajaksu2 | set | files:
+ httplib.patch keywords: + patch messages: + msg65127 nosy: + ajaksu2 |
| 2008年04月07日 20:57:15 | gregory.p.smith | set | priority: high |
| 2008年04月07日 20:55:10 | reacocard | create | |