homepage

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: performance problem in socket._fileobject.read
Type: performance Stage:
Components: Library (Lib) Versions: Python 2.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: gregory.p.smith Nosy List: CurtHagenlocher, akuchling, barry, benjamin.peterson, doko, draghuram, forest, gregory.p.smith, pitrou, schmir
Priority: release blocker Keywords: patch

Created on 2008年04月14日 18:30 by CurtHagenlocher, last changed 2022年04月11日 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
socket.py.diff CurtHagenlocher, 2008年04月15日 05:24 Revised fix
socket-strio-gps01.patch.txt gregory.p.smith, 2008年04月21日 18:00 patch to avoid long lifetime of large strings by using StringIO [against trunk r62440]
Messages (26)
msg65472 - (view) Author: Curt Hagenlocher (CurtHagenlocher) Date: 2008年04月14日 18:30
First reported by Ralf Schmitt.
I haven't checked 2.6 or 3.0.
msg65473 - (view) Author: Raghuram Devarakonda (draghuram) (Python triager) Date: 2008年04月14日 18:33
The relevant python-dev thread is
http://mail.python.org/pipermail/python-dev/2008-April/078613.html 
msg65485 - (view) Author: Ralf Schmitt (schmir) Date: 2008年04月14日 21:49
One more time: the change is wrong. It should try to recv the maximum
not the minimum of size, buffer_size. If you using a buffering of 16
bytes it will otherwise call recv 256 times when you want to read 1024
bytes. this is wrong.
However there should be an upper limit, it doesn't make sense to read
10MB from socket in one recv call (imap bug).
msg65486 - (view) Author: Ralf Schmitt (schmir) Date: 2008年04月14日 21:51
akuchling, I added you to the nosy list. hope that's ok.
msg65487 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008年04月14日 22:10
This is apparently the same issue as #2601.
msg65489 - (view) Author: Ralf Schmitt (schmir) Date: 2008年04月14日 22:59
ahh. yes, same issue. should have taken a look at the bugtracker before,
would have saved me some time...
msg65499 - (view) Author: Curt Hagenlocher (CurtHagenlocher) Date: 2008年04月15日 05:24
I've attached a much better patch as suggested by Guido
msg65500 - (view) Author: Ralf Schmitt (schmir) Date: 2008年04月15日 05:42
This patch handles the case where the caller has specified the size
argument. When size is unspecified, it should be handled as if size was
infinite. By the formula from your patch, this should be
 recv_size = min(self.max_readsize, max(self._rbufsize, left))
 (== min(self.max_readsize, inf) == self.max_readsize)
and not the current code:
 if self._rbufsize <= 1:
 recv_size = self.default_bufsize
 else:
 recv_size = self._rbufsize
 while True:
 data = self._sock.recv(recv_size)
BTW, I still think this max_readsize limit should be handled somewhere
deeper in the code. at least maybe in the _socketobject class.
msg65528 - (view) Author: Curt Hagenlocher (CurtHagenlocher) Date: 2008年04月15日 20:45
At least in 2.5.2, there appear to be only two places in the standard 
library where an arbitrary and unchecked size is passed directly to 
_socketobject.recv. _fileobject.read is one such place and the other 
is in logging/config.py. In both cases, the pattern is something like
while len(data) < size:
 data = data + sock.recv(size - len(data))
Of course, the same pattern may exist in any number of instances of 
user code that performs a socket.recv, and all of these would be 
subject to the same memory thrashing problem that originally beset 
imaplib.
As such, I now agree with Ralf that the fix ultimately belongs in 
_socketobject. However, deploying a fix for 2.5.2 is probably easier 
if it's in socket.py than if it requires a recompile.
msg65561 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008年04月16日 18:27
As A.M. Kuchling said in the other bug :), there is no need to fix 2.5.2
as the offending change is posterior to the release.
By the way, somewhat really ought to close either this bug or #2601,
discussing things in parallel in two different bug entries is too confusing.
msg65607 - (view) Author: Ralf Schmitt (schmir) Date: 2008年04月18日 09:26
However it should be fixed in release25-maint. Can anyone install curt's
patch? It does what the original fix intended to do.
msg65664 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2008年04月21日 18:00
Twisted fixed their problem for issue 1092502 by making recv()ed data
short lived by putting it into a StringIO for buffering.
I've attached a patch that does that for the socket module -and- gets
rid of any possibility of doing tiny size recvs in read() which is the
cause of the poor performance that this issue was filed about.
My patch also documents what _rbufsize is used for (readline) and makes
sane use of it with read.
When viewing the change, it is best to view it side by side next to the
old version rather than trying to comprehend it as a diff.
I believe this really fixes both issue 1092502 while at the same time
prevents introduction of a speed problem (this issue).
msg65669 - (view) Author: Ralf Schmitt (schmir) Date: 2008年04月22日 06:02
(I think the twisted issue your talking about is:
http://twistedmatrix.com/trac/ticket/1079)
Your patch still contains this code:
 recv_size = min(rbufsize, left)
 data = self._sock.recv(recv_size)
This is IMHO wrong. See my comments above.
msg65756 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2008年04月25日 00:41
available for an easy side by side review here:
 http://codereview.appspot.com/212
Also, yes I think you're right Ralf. With these changes I should be
able to return that to a max() within the while True: for sized reads
and things will work as desired.
msg66098 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2008年05月02日 16:40
I fixed the min vs max use that Ralf noted and have submitted this as
r62627. all tests pass for me but I believe it deserves a wider
audience and testing outside of just the test suite.
could those who reported the original problems (both the memory use
issue and the speed issue) run their tests with this applied?
... watching the buildbots ...
 http://codereview.appspot.com/621/diff/61/81 
msg66267 - (view) Author: Ralf Schmitt (schmir) Date: 2008年05月05日 12:47
I have a case which triggers the assert in readline:
 File "build/bdist.linux-x86_64/egg/setuptools/command/develop.py",
line 27, in run
 File "build/bdist.linux-x86_64/egg/setuptools/command/develop.py",
line 102, in install_for_development
 File
"build/bdist.linux-x86_64/egg/setuptools/command/easy_install.py", line
519, in process_distribution
 File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 522, in resolve
 File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 758, in
best_match
 File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 770, in obtain
 File
"build/bdist.linux-x86_64/egg/setuptools/command/easy_install.py", line
433, in easy_install
 File "build/bdist.linux-x86_64/egg/setuptools/package_index.py", line
462, in fetch_distribution
 File "build/bdist.linux-x86_64/egg/setuptools/package_index.py", line
303, in find_packages
 File "build/bdist.linux-x86_64/egg/setuptools/package_index.py", line
617, in scan_url
 File "build/bdist.linux-x86_64/egg/setuptools/package_index.py", line
198, in process_url
 File "/home/ralf/py26/lib/python2.6/socket.py", line 326, in read
data = self._sock.recv(rbufsize)
 File "/home/ralf/py26/lib/python2.6/httplib.py", line 512, in read
return self._read_chunked(amt)
 File "/home/ralf/py26/lib/python2.6/httplib.py", line 548, in
_read_chunked
line = self.fp.readline()
 File "/home/ralf/py26/lib/python2.6/socket.py", line 391, in readline
assert buf.tell() == 0
AssertionError
this happens when I run :
easy_install 'mwlib<0.6'
msg66269 - (view) Author: Raghuram Devarakonda (draghuram) (Python triager) Date: 2008年05月05日 13:59
Similar problem is reported in #2760.
msg66278 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2008年05月05日 19:20
I'm glad i put that assert in... The problem occurs due to a mixture of
fixed size reads followed by unbounded readlines on an unbuffered
_fileobject. A case that the code currently doesn't handle. I'm fixing it.
msg66294 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2008年05月05日 21:54
The bug introduced in r62627 has been fixed in r62744.
msg66306 - (view) Author: Ralf Schmitt (schmir) Date: 2008年05月06日 08:23
the above easy_install now works for me. thanks.
msg66400 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2008年05月08日 03:46
Bumping this down to critical for the alpha release. It should be
release blocker for the first beta.
msg66984 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2008年05月17日 07:14
I'm going to let the committed change bake in Python trunk for a beta
release or two before backporting it to release25-maint.
msg67945 - (view) Author: Ralf Schmitt (schmir) Date: 2008年06月11日 09:36
Can we get the fix for release25-maint? It will not get worse than the
current state is.
msg68191 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008年06月13日 23:46
If it's fixed in 2.6 and 3.0, it shouldn't be release blocker anymore,
now should it?
msg68192 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2008年06月13日 23:46
Well, it's a release blocker for 2.5.3.
msg69315 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2008年07月06日 04:04
committed to release25-maint in r64754.
History
Date User Action Args
2022年04月11日 14:56:33adminsetgithub: 46884
2008年07月06日 04:04:51gregory.p.smithsetstatus: open -> closed
resolution: fixed
messages: + msg69315
2008年06月13日 23:46:55benjamin.petersonsetnosy: + benjamin.peterson
messages: + msg68192
2008年06月13日 23:46:03pitrousetmessages: + msg68191
2008年06月11日 09:36:27schmirsetmessages: + msg67945
2008年06月11日 05:44:29gregory.p.smithsetversions: - Python 2.6
2008年05月17日 07:14:54gregory.p.smithsetmessages: + msg66984
2008年05月14日 22:42:11georg.brandlsetpriority: critical -> release blocker
2008年05月08日 03:46:36barrysetpriority: release blocker -> critical
nosy: + barry
messages: + msg66400
2008年05月06日 08:23:31schmirsetmessages: + msg66306
2008年05月05日 21:54:34gregory.p.smithsetmessages: + msg66294
2008年05月05日 19:20:59gregory.p.smithsetmessages: + msg66278
2008年05月05日 18:35:43gregory.p.smithlinkissue2760 superseder
2008年05月05日 18:35:43gregory.p.smithlinkissue2760 dependencies
2008年05月05日 13:59:50draghuramsetmessages: + msg66269
2008年05月05日 12:47:27schmirsetmessages: + msg66267
2008年05月02日 17:08:13gregory.p.smithsetpriority: critical -> release blocker
2008年05月02日 16:40:05gregory.p.smithsetmessages: + msg66098
2008年04月30日 08:56:36dokosetnosy: + doko
2008年04月30日 05:56:00mhammondlinkissue2601 superseder
2008年04月25日 00:41:13gregory.p.smithsetmessages: + msg65756
2008年04月22日 06:02:36schmirsetmessages: + msg65669
2008年04月21日 22:32:55forestsetnosy: + forest
2008年04月21日 18:00:53gregory.p.smithsetfiles: + socket-strio-gps01.patch.txt
assignee: gregory.p.smith
messages: + msg65664
nosy: + gregory.p.smith
2008年04月18日 09:26:23schmirsetmessages: + msg65607
2008年04月16日 18:27:57pitrousetmessages: + msg65561
2008年04月15日 20:45:15CurtHagenlochersetmessages: + msg65528
title: socket._fileobject.read(n) should ignore _rbufsize when 1 -> performance problem in socket._fileobject.read
2008年04月15日 05:42:16schmirsetmessages: + msg65500
2008年04月15日 05:24:32CurtHagenlochersetfiles: - socket.py.diff
2008年04月15日 05:24:23CurtHagenlochersetfiles: + socket.py.diff
messages: + msg65499
2008年04月14日 22:59:39schmirsetmessages: + msg65489
2008年04月14日 22:10:24pitrousetnosy: + pitrou
messages: + msg65487
2008年04月14日 21:52:18schmirsetnosy: + akuchling
2008年04月14日 21:51:30schmirsetmessages: + msg65486
2008年04月14日 21:51:03gvanrossumsetpriority: critical
2008年04月14日 21:50:00schmirsetversions: + Python 2.6
2008年04月14日 21:49:44schmirsetmessages: + msg65485
2008年04月14日 21:42:00schmirsetnosy: + schmir
2008年04月14日 18:33:25draghuramsetnosy: + draghuram
messages: + msg65473
2008年04月14日 18:30:16CurtHagenlochercreate

AltStyle によって変換されたページ (->オリジナル) /