This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.
Created on 2010-10-31 11:19 by karstenw, last changed 2022-04-11 14:57 by admin. This issue is now closed.
| Messages (5) | |||
|---|---|---|---|
| msg120041 | Author: Karsten Wolf (karstenw) | Date: 2010-10-31 11:19 | |
It would be helpful to have a tarfile iterator that does not cache every archive member it encounters. The current caching makes it nearly impossible to iterate over an archive with millions of files. |
|||
| msg120042 | Author: Lars Gustäbel (lars.gustaebel) * (Python committer) | Date: 2010-10-31 11:34 | |
I assume you're using Python 2.x, because tarfile's memory footprint was significantly reduced in Python 3.0; see the patch in issue2058 and r62337. This patch was not backported to the 2.x branch at the time, and since the 2.x branch is closed for new features, it will not be backported in the future either. |
|||
| msg120043 | Author: Karsten Wolf (karstenw) | Date: 2010-10-31 11:58 | |
Yes, I'm on 2.6. I checked the Python 3.x tarfile for just this one line in TarFile.next(), `self.members.append(tarinfo)`, and concluded that it would have the same problem. Reducing the 2.5 GB of memory usage I measured in my particular case to 60% still leaves 1.5 GB of RAM burned, which is too much on a 32-bit machine with 2 GB of RAM. My solution was to comment out that line, which worked perfectly in my case but may not be the right solution for the module. |
|||
| msg123835 | Author: Lars Gustäbel (lars.gustaebel) * (Python committer) | Date: 2010-12-12 12:17 | |
There is no trivial or backwards-compatible solution to this problem. As things stand, there is no alternative to storing all TarInfo objects: an archive has no central table of contents we could use, so we must build our own. In other words, tarfile does not "burn" memory without a reason. The problem you encounter is something of a corner case, but fortunately it has a simple workaround: clear the cache inside the loop, e.g. `for tarinfo in tar: ...; tar.members = []`. There are two things that I will clearly refuse to do. One is to add yet another option to the TarFile class to switch off caching, as this would make many TarFile methods dysfunctional without the user knowing why. The other is to add an extra non-caching iterator class. Sorry that I have nothing more to offer. Maybe someone else will come up with a brilliant idea. |
|||
| msg263714 | Author: Lars Gustäbel (lars.gustaebel) * (Python committer) | Date: 2016-04-19 07:18 | |
Closing after six years of inactivity. |
|||
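
The workaround from msg123835 can be sketched as below. This is a minimal illustration, assuming Python 3.9 or later; `build_archive` and `scan` are hypothetical helper names, and the in-memory archive is synthetic test data, not the archive from the report. The sketch relies on CPython's `TarFile.__iter__` falling back to `TarFile.next()` once the cached `members` list is exhausted; `members` is, as far as we know, not a documented attribute, so this depends on implementation details. After clearing, methods such as `getmembers()` and `getmember()` no longer see the members already iterated over, which is the trade-off discussed in the thread.

```python
import io
import tarfile


def build_archive(n_members: int) -> io.BytesIO:
    """Hypothetical helper: build an uncompressed tar of n small files in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for i in range(n_members):
            data = f"file {i}\n".encode()
            info = tarfile.TarInfo(name=f"dir/file{i:06d}.txt")
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


def scan(buf: io.BytesIO, clear_cache: bool) -> tuple[int, int, int]:
    """Hypothetical helper: iterate over all members.

    Returns (member count, total payload size, peak length of tar.members).
    """
    buf.seek(0)
    count = total = peak = 0
    with tarfile.open(fileobj=buf, mode="r:") as tar:
        for tarinfo in tar:
            count += 1
            total += tarinfo.size
            peak = max(peak, len(tar.members))
            if clear_cache:
                # Workaround from the thread: drop the TarInfo objects cached so
                # far; iteration continues by reading headers via TarFile.next().
                tar.members = []
    return count, total, peak


if __name__ == "__main__":
    archive = build_archive(1000)
    print(scan(archive, clear_cache=False))  # (1000, 8890, 1000)
    print(scan(archive, clear_cache=True))   # (1000, 8890, 1)
```

With caching left on, the peak length of `tar.members` equals the number of members in the archive; with the workaround, only the member currently being processed stays cached.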
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:57:08 | admin | set | github: 54470 |
| 2016-04-19 07:18:38 | lars.gustaebel | set | status: open -> closed; resolution: wont fix; messages: + msg263714; stage: resolved |
| 2010-12-12 12:17:27 | lars.gustaebel | set | messages: + msg123835 |
| 2010-10-31 12:05:49 | pitrou | set | type: enhancement -> resource usage; versions: + Python 3.1, Python 2.7, Python 3.2 |
| 2010-10-31 11:58:44 | karstenw | set | messages: + msg120043 |
| 2010-10-31 11:34:39 | lars.gustaebel | set | assignee: lars.gustaebel; messages: + msg120042; nosy: + lars.gustaebel |
| 2010-10-31 11:19:56 | karstenw | create | |