This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.
Created on 2005年05月25日 09:20 by manekcz, last changed 2022年04月11日 14:56 by admin. This issue is now closed.
Files
| File name | Uploaded | Description | Edit |
|---|---|---|---|
| urllib2leak.py | stephbul, 2009年06月03日 13:31 | main test | |
| urllib2.py | peci, 2009年09月04日 10:17 | | |
Messages (21)
| msg60743 - (view) | Author: Petr Toman (manekcz) | Date: 2005年05月25日 09:20 | |
It seems that the urlopen(url) method of the urllib2 module
leaves some objects in memory that are never destroyed.
Please try the following code:
==========================
if __name__ == '__main__':
import urllib2
a = urllib2.urlopen('http://www.google.com')
del a # or a = None or del(a)
# check memory on memory leaks
import gc
gc.set_debug(gc.DEBUG_SAVEALL)
gc.collect()
for it in gc.garbage:
print it
==========================
In our code, we're using lots of urlopens in a loop and
the number of unreachable objects grows beyond all
limits :) We also tried a.close() but it didn't help.
You can also try the following:
==========================
def print_unreachable_len():
# check memory on memory leaks
import gc
gc.set_debug(gc.DEBUG_SAVEALL)
gc.collect()
unreachableL = []
for it in gc.garbage:
unreachableL.append(it)
return len(str(unreachableL))
if __name__ == '__main__':
print "at the beginning", print_unreachable_len()
import urllib2
print "after import of urllib2", print_unreachable_len()
a = urllib2.urlopen('http://www.google.com')
print 'after urllib2.urlopen', print_unreachable_len()
del a
print 'after del', print_unreachable_len()
==========================
We're using WindowsXP with latest patches, Python 2.4
(ActivePython 2.4 Build 243 (ActiveState Corp.) based on
Python 2.4 (#60, Nov 30 2004, 09:34:21) [MSC v.1310
32 bit (Intel)] on win32).
|
|||
| msg60744 - (view) | Author: A.M. Kuchling (akuchling) * (Python committer) | Date: 2005年06月01日 23:13 | |
Confirmed. The objects involved seem to be an HTTPResponse and the socket._fileobject wrapper; the assignment 'r.recv = r.read' around line 1013 of urllib2.py seems to be critical to creating the cycle. |
|||
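The cycle described here can be reproduced in isolation. Below is a minimal sketch (Response is an illustrative stand-in, not urllib2's actual class): storing a bound method on its own instance, as r.recv = r.read does, creates a reference cycle, so the object is reclaimed only by the cyclic garbage collector, never by reference counting alone.
==========================
import gc

class Response(object):  # stand-in, not httplib's actual HTTPResponse
    def read(self):
        return ''

r = Response()
r.recv = r.read   # the bound method holds a reference back to r -> cycle

gc.set_debug(gc.DEBUG_SAVEALL)
del r             # refcount never reaches zero; the cycle keeps r alive
gc.collect()
print(len(gc.garbage))  # > 0: instance, dict and method were only cycle-collected
==========================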
| msg60745 - (view) | Author: Sean Reifschneider (jafo) * (Python committer) | Date: 2005年06月29日 03:27 | |
I can reproduce this in both the python.org 2.4 RPM and in a freshly built copy from CVS. If I run a few thousand urlopen()s, I get:
Traceback (most recent call last):
  File "/tmp/mt", line 26, in ?
  File "/tmp/python/dist/src/Lib/urllib2.py", line 130, in urlopen
  File "/tmp/python/dist/src/Lib/urllib2.py", line 361, in open
  File "/tmp/python/dist/src/Lib/urllib2.py", line 379, in _open
  File "/tmp/python/dist/src/Lib/urllib2.py", line 340, in _call_chain
  File "/tmp/python/dist/src/Lib/urllib2.py", line 1026, in http_open
  File "/tmp/python/dist/src/Lib/urllib2.py", line 1001, in do_open
urllib2.URLError: <urlopen error (24, 'Too many open files')>
Even if I do an a.close(). I'll investigate a bit further. Sean |
|||
| msg60746 - (view) | Author: Sean Reifschneider (jafo) * (Python committer) | Date: 2005年06月29日 03:52 | |
I give up, this code is kind of a maze of twisty little passages. I did try doing "a.fp.close()" and that didn't seem to help at all. Couldn't really make any progress on that though. I also tried "if a.headers.fp: a.headers.fp.close()", which didn't do anything noticeable. |
|||
| msg60747 - (view) | Author: Brian Wellington (bwelling) | Date: 2005年08月12日 02:22 | |
We just ran into this same problem, and worked around it by
simply removing the 'r.recv = r.read' line in urllib2.py,
and creating a recv alias to the read function in
HTTPResponse ('recv = read' in the class).
Not sure if this is the best solution, but it seems to work.
|
|||
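Why the class-level alias suggested above avoids the cycle, as a minimal sketch with hypothetical classes: a class attribute is a plain function shared by all instances, while the per-instance assignment stores a bound method that points back at its own instance.
==========================
import gc
gc.collect()               # flush any startup garbage first

class WithClassAlias(object):
    def read(self):
        return ''
    recv = read            # class-level alias: shared function, no cycle

class WithInstanceAlias(object):
    def read(self):
        return ''

a = WithClassAlias()
del a
print(gc.collect())        # 0: reclaimed by plain reference counting

b = WithInstanceAlias()
b.recv = b.read            # instance-level alias: bound method -> cycle
del b
print(gc.collect())        # > 0: only the cycle collector could reclaim it
==========================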
| msg60748 - (view) | Author: Sean Reifschneider (jafo) * (Python committer) | Date: 2005年08月12日 22:30 | |
I've just tried it again using the current CVS version as well as the version installed with Fedora Core 4, and in both cases I was able to run over 100,000 retrievals of http://127.0.0.1/test.html and http://127.0.0.1/google.html. test.html is just "it works" and google.html was generated with "wget -O google.html http://google.com/". I was able to reproduce this before, but now am not. My urllib2.py includes the r.recv=r.read line. I have upgraded from FC3 to FC4; could this be something related to an OS or library interaction? I was going to try to confirm the last message, but now I can't reproduce the failure. |
|||
| msg60749 - (view) | Author: Brian Wellington (bwelling) | Date: 2005年08月15日 18:13 | |
The real problem we were seeing wasn't the memory leak, it
was a file descriptor leak. Leaking references within the
interpreter is bad, but the garbage collector will
eventually notice that the system is out of memory and clean
them. Leaking file descriptors is much worse, as gc won't
be triggered when the process has reached its limit, and
the process will start failing with "Too many file descriptors".
To easily show this problem, run the following from an
interactive python interpreter:
import urllib2
f = urllib2.urlopen('http://www.google.com')
f.close()
and from another window, run "lsof -p <pid of interpreter>".
It should show a TCP socket in CLOSE_WAIT, which means the
file descriptor is still open. I'm seeing weirdness on
Fedora Core 4 today that I didn't see last week where after
a few seconds, the file descriptor is listed as "can't
identify protocol" instead of TCP, but that's not too
relevant, since it's still open.
Repeating the urllib2.urlopen()/close() pairs of statements
in the interpreter will cause more fds to be leaked, which
can also be seen by lsof.
|
|||
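The lsof check above is run from a second window; the same thing can also be watched from inside the process. A minimal sketch, assuming a Linux-style /proc filesystem (open_fd_count is an illustrative helper, not part of urllib2):
==========================
import os

def open_fd_count():
    # each entry in /proc/self/fd is one descriptor this process holds open
    return len(os.listdir('/proc/self/fd'))

before = open_fd_count()
import urllib2
f = urllib2.urlopen('http://www.google.com')
f.close()
print(open_fd_count() - before)  # > 0 here means close() leaked the socket
==========================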
| msg60750 - (view) | Author: Steve Holden (holdenweb) * (Python committer) | Date: 2005年10月14日 04:13 | |
The Windows 2.4.1 build doesn't show this error, but the Cygwin 2.4.1 build does still have uncollectable objects after a urllib2.urlopen(), so there may be a platform dependency here. No 2.4.2 on Cygwin yet, so nothing conclusive, as lsof isn't available. |
|||
| msg60751 - (view) | Author: Neil Swinton (nswinton) | Date: 2005年10月18日 15:00 | |
It's not the prettiest thing, but you can work around this
by setting the socket's recv method to None before closing it.
import urllib2
f = urllib2.urlopen('http://www.google.com')
text=f.read()
f.fp._sock.recv=None # hacky avoidance
f.close()
|
|||
| msg76298 - (view) | Author: Toshio Kuratomi (a.badger) * | Date: 2008年11月24日 05:20 | |
I tried to repeat the test in http://bugs.python.org/msg60749 and found that the descriptors will close if you read from the file before closing. So this leads to open descriptors::

  import urllib2
  f = urllib2.urlopen('http://www.google.com')
  f.close()

while this does not::

  import urllib2
  f = urllib2.urlopen('http://www.google.com')
  f.read(1)
  f.close()
|
|||
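A sketch of one way to verify this observation without lsof, assuming a POSIX platform and that the descriptor number is not reused in between (the same os.fstat probing appears later in msg186556):
==========================
import os
import urllib2

f = urllib2.urlopen('http://www.google.com')
fd = f.fileno()
f.read(1)   # per the message above, reading first lets close() release the socket
f.close()
try:
    os.fstat(fd)
    print('descriptor %d still open (leaked)' % fd)
except OSError:
    print('descriptor %d closed' % fd)
==========================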
| msg76300 - (view) | Author: Toshio Kuratomi (a.badger) * | Date: 2008年11月24日 05:47 | |
One further data point. On two rhel5 systems with identical kernels, both x86_64, both python-2.4.3... basically, everything I've thought to check is identical. I ran the test code with f.read() in an infinite loop. One system only has one TCP socket in use at a time. The other one has multiple TCP sockets in use, but they all close eventually. /usr/sbin/lsof -p INTERPRETER_PID | wc -l reported 96, 67, 97, 63, 91, 62, 94, 78 on subsequent runs. |
|||
| msg76350 - (view) | Author: Jeremy Hylton (jhylton) (Python triager) | Date: 2008年11月24日 18:03 | |
Python 2.4 is now in security-fix-only mode. No new features are being added, and bugs are not fixed anymore unless they affect the stability and security of the interpreter, or of Python applications. http://www.python.org/download/releases/2.4.5/ This bug doesn't rise to the level of making it into a 2.4.6. |
|||
| msg76368 - (view) | Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) | Date: 2008年11月24日 22:22 | |
Reopening: I reproduce the problem consistently with both 2.6 and trunk versions (not with python 3.0), on Windows XP. |
|||
| msg76683 - (view) | Author: Senthil Kumaran (orsenthil) * (Python committer) | Date: 2008年12月01日 10:40 | |
> Amaury Forgeot d'Arc <amauryfa@gmail.com> added the comment:
>
> Reopening: I reproduce the problem consistently with both 2.6 and trunk
> versions (not with python 3.0), on Windows XP.

I think this bug is ONLY with respect to Windows systems. I am not able to reproduce this on the current trunk on Linux (Ubuntu 8.04). I tried 100 and 1000 instances of open and close, and every time the file descriptors go through ESTABLISHED and SYN_SENT and then close for TCP connections. And yeah, certain instances showed 'can't identify protocol' randomly, but that's a different issue. The original bug was raised against Python 2.4 on Linux and seems to have been fixed. A Windows expert should comment on whether this is consistently reproducible on Windows. |
|||
| msg88812 - (view) | Author: BULOT (stephbul) | Date: 2009年06月03日 13:31 | |
Hello,

I'm facing a urllib2 memory leak issue in one of my scripts that is not threaded. I made a few tests in order to check what was going on and I found this already existing (but old) bug thread. I'm not able to figure out what the issue is yet, but here is some information:

Platform: Debian
Python version: 2.5.4

I made a script (2 attached files) that accesses a web page (http://www.google.com) every second and monitors the number of file descriptors and the memory footprint. I also introduced the gc module (garbage collector) in order to retrieve the number of objects that are not freed (as already proposed in this thread, but more focused on the gc.DEBUG_LEAK flag). Here are my results:

First access output:
gc: collectable <dict 0xb793c604>
gc: collectable <HTTPResponse instance at 0xb7938f6c>
gc: collectable <dict 0xb793c4f4>
gc: collectable <HTTPMessage instance at 0xb793d0ec>
gc: collectable <dict 0xb793c02c>
gc: collectable <list 0xb7938e8c>
gc: collectable <list 0xb7938ecc>
gc: collectable <instancemethod 0xb79cf824>
gc: collectable <dict 0xb793c79c>
gc: collectable <HTTPResponse instance at 0xb793d2cc>
gc: collectable <instancemethod 0xb79cf874>
unreachable objects: 11
File descriptors number: 32
Memory: 4612

Tenth access:
gc: collectable <dict 0xb78f14f4>
gc: collectable <HTTPResponse instance at 0xb78f404c>
gc: collectable <dict 0xb78f13e4>
gc: collectable <HTTPMessage instance at 0xb78f462c>
gc: collectable <dict 0xb78e5f0c>
gc: collectable <list 0xb78eeb4c>
gc: collectable <list 0xb78ee2ac>
gc: collectable <instancemethod 0xb797b7fc>
gc: collectable <dict 0xb78f168c>
gc: collectable <HTTPResponse instance at 0xb78f442c>
gc: collectable <instancemethod 0xb78eaa7c>
unreachable objects: 110
File descriptors number: 32
Memory: 4680

After one hundred accesses:
gc: collectable <dict 0x89e2e84>
gc: collectable <HTTPResponse instance at 0x89e3e2c>
gc: collectable <dict 0x89e2d74>
gc: collectable <HTTPMessage instance at 0x89e3ccc>
gc: collectable <dict 0x89db0b4>
gc: collectable <list 0x89e3cac>
gc: collectable <list 0x89e32ec>
gc: collectable <instancemethod 0x89d8964>
gc: collectable <dict 0x89e60b4>
gc: collectable <HTTPResponse instance at 0x89e50ac>
gc: collectable <instancemethod 0x89ddb1c>
unreachable objects: 1100
File descriptors number: 32
Memory: 5284

Each call to urllib2.urlopen() produces 11 new unreachable objects and increases the memory footprint, without leaving new open files. Do you have any idea?

With the hack proposed in message http://bugs.python.org/issue1208304#msg60751, the number of unreachable objects goes down to 8, but memory still increases.

Regards.
stephbul

PS: My urllib2leak.py test calls this monitor script (not able to attach it):
#! /bin/sh
PROCS='urllib2leak.py'
RUNPID=`ps aux | grep "$PROCS" | grep -v "grep" | awk '{printf 2ドル}'`
FDESC=`lsof -p $RUNPID | wc -l`
MEM=`ps aux | grep "$PROCS" | grep -v "grep" | awk '{printf 6ドル }'`
echo "File descriptors number: "$FDESC
echo "Memory: "$MEM
|
|||
| msg92245 - (view) | Author: clemens pecinovsky (peci) | Date: 2009年09月04日 10:17 | |
I also ran into the problem of cyclic dependencies. I know that calling gc.collect() would solve the problem, but calling gc.collect() takes a long time. The problem is the cyclic dependency created by r.recv = r.read. I have fixed it locally by wrapping the addinfourl in a new class (I called it addinfourlFixCyclRef), overloading the close method, and within the close method setting recv to None again:

class addinfourlFixCyclRef(addinfourl):
    def close(self):
        if self.fp is not None and hasattr(self.fp, "_sock"):
            self.fp._sock.recv = None
        addinfourl.close(self)

....
        r.recv = r.read
        fp = socket._fileobject(r, close=True)
        resp = addinfourlFixCyclRef(fp, r.msg, req.get_full_url())

When I call .close() on the response, it just works. Unluckily I had to patch even more for the case where an exception is raised. For the whole fix see the attachment. |
|||
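The general idea behind this fix, reduced to a minimal self-contained sketch (Resource is a hypothetical stand-in for addinfourl, not the actual patch): break the cycle explicitly in close() so that reference counting alone can reclaim the object, without waiting for a collector pass.
==========================
import gc

class Resource(object):    # hypothetical stand-in for addinfourl
    def read(self):
        return ''
    def close(self):
        self.recv = None   # drop the bound method to break the cycle

r = Resource()
r.recv = r.read            # the cycle, as with urllib2's r.recv = r.read
r.close()                  # cycle broken here
del r
print(gc.collect())        # 0: freed by refcounting, no collector needed
==========================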
| msg114503 - (view) | Author: Mark Lawrence (BreamoreBoy) * | Date: 2010年08月21日 15:52 | |
On Windows Vista I can consistently reproduce this with 2.6 and 2.7 but not with 3.1 or 3.2. |
|||
| msg186550 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2013年04月11日 09:27 | |
The entire description of this issue is bogus. Reference cycles are not a bug, since Python has a cyclic garbage collector. Closing as invalid. |
|||
| msg186552 - (view) | Author: Ralf Schmitt (schmir) | Date: 2013年04月11日 09:52 | |
I'd consider reference cycles a bug, especially if they prevent file descriptors from being closed. Please read the comments. |
|||
| msg186556 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2013年04月11日 12:39 | |
I see no file descriptor leak myself:
>>> f = urllib2.urlopen("http://www.google.com")
>>> f.fileno()
3
>>> os.fstat(3)
posix.stat_result(st_mode=49663, st_ino=5045244, st_dev=7L, st_nlink=1, st_uid=1000, st_gid=1000, st_size=0, st_atime=0, st_mtime=0, st_ctime=0)
>>> del f
>>> os.fstat(3)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
OSError: [Errno 9] Bad file descriptor
Ditto with Python 3:
>>> f = urllib.request.urlopen("http://www.google.com")
>>> f.fileno()
3
>>> os.fstat(3)
posix.stat_result(st_mode=49663, st_ino=5071469, st_dev=7, st_nlink=1, st_uid=1000, st_gid=1000, st_size=0, st_atime=0, st_mtime=0, st_ctime=0)
>>> del f
>>> os.fstat(3)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
OSError: [Errno 9] Bad file descriptor
Furthermore, you can use the `with` statement to ensure timely disposal of system resources:
>>> f = urllib.request.urlopen("http://www.google.com")
>>> with f: f.fileno()
...
3
>>> os.fstat(3)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
OSError: [Errno 9] Bad file descriptor
|
|||
| msg186560 - (view) | Author: Mark Lawrence (BreamoreBoy) * | Date: 2013年04月11日 13:24 | |
Where did file descriptors come into it? Surely this is all about memory leaks. In any case it's hardly a show-stopper, as there are at least three references above to the problem line of code and three workarounds. |
|||
History
| Date | User | Action | Args |
|---|---|---|---|
| 2022年04月11日 14:56:11 | admin | set | github: 42012 |
| 2013年04月11日 13:24:20 | BreamoreBoy | set | messages: + msg186560 |
| 2013年04月11日 12:39:18 | pitrou | set | messages: + msg186556 |
| 2013年04月11日 09:52:24 | schmir | set | messages: + msg186552 |
| 2013年04月11日 09:27:26 | pitrou | set | status: open -> closed nosy: + pitrou messages: + msg186550 resolution: not a bug |
| 2013年04月10日 22:03:15 | schmir | set | nosy:
+ schmir |
| 2011年02月09日 23:18:38 | gdub | set | nosy:
+ gdub |
| 2010年08月21日 15:52:52 | BreamoreBoy | set | nosy:
+ BreamoreBoy messages: + msg114503 |
| 2010年07月20日 03:16:43 | BreamoreBoy | set | versions: + Python 2.6, Python 3.1, Python 2.7, Python 3.2, - Python 2.5 |
| 2009年09月04日 10:17:26 | peci | set | files:
+ urllib2.py nosy: + peci messages: + msg92245 |
| 2009年06月03日 13:31:30 | stephbul | set | files:
+ urllib2leak.py versions: - Python 2.6, Python 2.7 nosy: + stephbul messages: + msg88812 |
| 2008年12月01日 10:40:12 | orsenthil | set | nosy:
+ orsenthil messages: + msg76683 |
| 2008年11月29日 01:16:45 | gregory.p.smith | set | type: resource usage components: + Library (Lib), - Extension Modules versions: + Python 2.6, Python 2.5, Python 2.7, - Python 2.4 |
| 2008年11月24日 22:22:34 | amaury.forgeotdarc | set | status: closed -> open nosy: + amaury.forgeotdarc resolution: wont fix -> (no value) messages: + msg76368 |
| 2008年11月24日 18:03:58 | jhylton | set | status: open -> closed nosy: + jhylton resolution: wont fix messages: + msg76350 |
| 2008年11月24日 05:47:08 | a.badger | set | messages: + msg76300 |
| 2008年11月24日 05:20:15 | a.badger | set | nosy:
+ a.badger messages: + msg76298 |
| 2005年05月25日 09:20:22 | manekcz | create | |