homepage

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: os.walk() consumes a lot of file descriptors
Type: resource usage Stage: resolved
Components: Library (Lib) Versions: Python 3.6, Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: benhoyt, gvanrossum, larry, martin.panter, python-dev, serhiy.storchaka, vstinner
Priority: normal Keywords: patch

Created on 2016年01月02日 17:57 by serhiy.storchaka, last changed 2022年04月11日 14:58 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
walk_consume_fds_opt1.patch serhiy.storchaka, 2016年01月03日 08:48 review
walk_consume_fds_opt2.patch serhiy.storchaka, 2016年01月03日 08:49 review
Messages (13)
msg257352 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016年01月02日 17:57
Since 3.5 os.walk() consumes a lot of file descriptors. Its number equals to the deep of directories tree. Since the number of file descriptors is limited, this can cause problems.
This was the main reason for rejecting fwalk-based implementation of os.walk() (issue15200).
msg257404 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016年01月03日 08:48
Here are two alternative solutions for this issue.
msg258018 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2016年01月11日 21:10
I don't like the first solution (it just collects all scandir() results in a list, which defeats one of the purposes of using scandir()), but I don't understand the second solution.
Ideally the number of FDs used should be limited by the depth of the tree being walked, right?
msg258022 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016年01月11日 21:35
Both patches are basically equivalent. The first one collects all scandir() results in a list, the second one collects only directory names in a list. The purpose of using os.scandir() in os.walk() was a speed up (issue23605), and both patches preserve it.
Yes, the number of FDs used is equivalent to the depth of the tree which can be very deep (I just created a tree depth of 1000 levels). And what is worse, all these FDs can be effectively leaked if the walking was not finished. This is unwanted behavior change.
msg258025 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2016年01月11日 21:42
I am all for preventing the leaks. But using FDs proportional to the tree
depth seems reasonable to me. (If you are worried about some kind of DoS
attack on the algorithm by someone who can build a tree with depth 1000,
well, if they can do that they can also create a flat folder with a million
files in it.)
Is there a potential hybrid strategy, where for small directories we close
the FD but for large directories we keep it open?
msg258096 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016年01月12日 10:06
> But using FDs proportional to the tree depth seems reasonable to me.
This was the main reason for rejecting issue15200. May be we can allow this for globbing, but only explicitly enabled with special parameter or separate functions. The benefit for globbing is starting to yield results without internal caching. The benefit for os.walk() is smaller, because in any case it yields files and directories not one-by-one, but collected into lists.
> Is there a potential hybrid strategy, where for small directories we close
> the FD but for large directories we keep it open?
May be with more complicated code. I think this needs explicit close() method, so it can go only in 3.6. For now I think we have to commit one of my patches (the difference only stylistic).
msg258116 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2016年01月12日 18:36
I like them both, if I had to choose I'd pick patch 2.
But yes, we need to add a close() method to the scandir iterator object.
In the meantime, I am still worried about what would happen if somehow the loop got interrupted and the frame got kept alive and the iterator wasn't closed by its dealloc until much later. This kind of thing was common in asyncio and we had to resort to similar tricks to break some cycles. Maybe you can add a try/finally that *deletes* scandir_it to force it to close itself (at least in CPython)? That can go into 3.5.
msg258238 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016年01月14日 22:26
Yes, OSError, MemoryError and KeyboardInterrupt can be raised during iterating scandir object or calling is_dir() or is_symlink() methods of the result. They stop iterating and left file descriptor open. If close file descriptor on error during iteration (issue26117), patch 1 becomes more safe, because it minimizes the lifetime of opened descriptor.
msg260089 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016年02月11日 11:32
New changeset e951d76f1945 by Serhiy Storchaka in branch '3.5':
Issue #25995: os.walk() no longer uses FDs proportional to the tree depth.
https://hg.python.org/cpython/rev/e951d76f1945
New changeset 6197a09a56b1 by Serhiy Storchaka in branch 'default':
Issue #25995: os.walk() no longer uses FDs proportional to the tree depth.
https://hg.python.org/cpython/rev/6197a09a56b1 
msg260095 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016年02月11日 11:43
walk_consume_fds_opt1.patch is applied in 3.5 (it is more reliable with the absence of close()) and walk_consume_fds_opt2.patch is applied in 3.6.
msg260784 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016年02月24日 10:40
The change seems to be causing some Windows buildbot failures <http://buildbot.python.org/all/builders/AMD64%20Windows7%20SP1%203.5/builds/666/steps/test/logs/stdio>:
======================================================================
ERROR: test_walk_bad_dir (test.test_os.BytesWalkTests)
----------------------------------------------------------------------
Traceback (most recent call last):
 File "C:\buildbot.python.org3円.5.kloth-win64\build\lib\test\test_os.py", line 931, in test_walk_bad_dir
 root, dirs, files = next(walk_it)
 File "C:\buildbot.python.org3円.5.kloth-win64\build\lib\test\test_os.py", line 1027, in walk
 for broot, bdirs, bfiles in os.walk(os.fsencode(top), **kwargs):
 File "C:\buildbot.python.org3円.5.kloth-win64\build\lib\os.py", line 372, in walk
 entries = list(scandir(top))
TypeError: os.scandir() doesn't support bytes path on Windows, use Unicode instead
======================================================================
ERROR: test_walk_bottom_up (test.test_os.BytesWalkTests)
ERROR: test_walk_prune (test.test_os.BytesWalkTests)
Was this line
entries = list(scandir(top))
meant to be
entries = list(scandir_it)
msg260788 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016年02月24日 11:05
New changeset 9bffe39e8273 by Serhiy Storchaka in branch '3.5':
Fixed a bug in os.walk() with bytes path on Windows caused by merging fixes
https://hg.python.org/cpython/rev/9bffe39e8273 
msg260789 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016年02月24日 11:05
Thanks Martin!
History
Date User Action Args
2022年04月11日 14:58:25adminsetgithub: 70183
2016年02月24日 11:05:58serhiy.storchakasetstatus: open -> closed

messages: + msg260789
2016年02月24日 11:05:05python-devsetmessages: + msg260788
2016年02月24日 10:40:27martin.pantersetstatus: closed -> open
nosy: + martin.panter
messages: + msg260784

2016年02月11日 11:43:04serhiy.storchakasetstatus: open -> closed
resolution: fixed
messages: + msg260095

stage: patch review -> resolved
2016年02月11日 11:32:19python-devsetnosy: + python-dev
messages: + msg260089
2016年02月08日 15:59:59serhiy.storchakasetassignee: serhiy.storchaka
2016年01月14日 22:26:51serhiy.storchakasetmessages: + msg258238
2016年01月12日 18:36:17gvanrossumsetmessages: + msg258116
2016年01月12日 10:06:48serhiy.storchakasetmessages: + msg258096
2016年01月11日 21:42:12gvanrossumsetmessages: + msg258025
2016年01月11日 21:35:22serhiy.storchakasetmessages: + msg258022
2016年01月11日 21:10:38gvanrossumsetnosy: + gvanrossum
messages: + msg258018
2016年01月03日 08:49:03serhiy.storchakasetfiles: + walk_consume_fds_opt2.patch
2016年01月03日 08:48:40serhiy.storchakasetfiles: + walk_consume_fds_opt1.patch
keywords: + patch
messages: + msg257404

stage: patch review
2016年01月02日 17:57:05serhiy.storchakacreate

AltStyle によって変換されたページ (->オリジナル) /