Message160266
| Author |
Giuseppe.Attardi |
| Recipients |
Giuseppe.Attardi |
| Date |
2012-05-09.09:39:43 |
| SpamBayes Score |
-1.0 |
| Marked as misclassified |
Yes |
| Message-id |
<1336556387.72.0.794459300498.issue14762@psf.upfronthosting.co.za> |
| In-reply-to |
| Content |
I confirm a serious memory leak in ElementTree when using the iterparse() function.
Memory use grows disproportionately, reaching dozens of GB when parsing a large XML file.
For further information, see the discussion at:
http://www.gossamer-threads.com/lists/python/bugs/912164?do=post_view_threaded#912164
but note that the comments attributing the problem to the OS are quite off the mark.
To replicate the problem, try this on a Wikipedia dump:
from xml.etree import ElementTree

# `file` is the path to (or an open file object for) the Wikipedia XML dump.
iterparse = ElementTree.iterparse(file)
id = None
for event, elem in iterparse:
    if elem.tag.endswith("title"):
        title = elem.text
    elif elem.tag.endswith("id") and not id:
        id = elem.text
    elif elem.tag.endswith("text"):
        print id, title, elem.text[:20] |
|
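For comparison, a widely used pattern for keeping iterparse() memory bounded is to drop elements from the tree once they have been processed. The following is only a minimal sketch of that pattern, not a confirmed fix for this report: the helper name iter_pages is hypothetical, the "page" tag name is an assumption about the Wikipedia dump layout, and the per-page reset of the id is an addition that the loop above does not have. Whether this removes the growth described here is not established by this message.

```python
import io
from xml.etree import ElementTree


def iter_pages(source):
    """Yield (id, title, first 20 chars of text) for each page.

    Hypothetical helper: `source` is a filename or a binary file object.
    """
    context = iter(ElementTree.iterparse(source, events=("start", "end")))
    # The first event is the start of the root element; keep a reference
    # so that its already-processed children can be discarded later.
    _, root = next(context)
    page_id = None
    title = None
    for event, elem in context:
        if event != "end":
            continue  # element content is only complete at its "end" event
        if elem.tag.endswith("title"):
            title = elem.text
        elif elem.tag.endswith("id") and not page_id:
            page_id = elem.text
        elif elem.tag.endswith("text"):
            yield page_id, title, (elem.text or "")[:20]
        elif elem.tag.endswith("page"):  # assumed tag name for one article
            page_id = None
            title = None
            # Drop everything parsed so far from the in-memory tree.
            root.clear()


if __name__ == "__main__":
    sample = (
        b"<mediawiki><page><title>A</title><id>1</id>"
        b"<revision><id>9</id><text>hello</text></revision></page></mediawiki>"
    )
    for page in iter_pages(io.BytesIO(sample)):
        print(page)
```

Clearing the root rather than only the current element also removes the emptied page elements that would otherwise stay attached to the root for the whole run.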
History
| Date | User | Action | Args |
|---|---|---|---|
| 2012-05-09 09:39:47 | Giuseppe.Attardi | set | recipients: + Giuseppe.Attardi |
| 2012-05-09 09:39:47 | Giuseppe.Attardi | set | messageid: <1336556387.72.0.794459300498.issue14762@psf.upfronthosting.co.za> |
| 2012-05-09 09:39:44 | Giuseppe.Attardi | link | issue14762 messages |
| 2012-05-09 09:39:43 | Giuseppe.Attardi | create | |