This issue tracker has been migrated to GitHub ,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2016年01月02日 17:57 by serhiy.storchaka, last changed 2022年04月11日 14:58 by admin. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| walk_consume_fds_opt1.patch | serhiy.storchaka, 2016年01月03日 08:48 | review | ||
| walk_consume_fds_opt2.patch | serhiy.storchaka, 2016年01月03日 08:49 | review | ||
| Messages (13) | |||
|---|---|---|---|
| msg257352 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2016年01月02日 17:57 | |
Since 3.5 os.walk() consumes a lot of file descriptors. Its number equals to the deep of directories tree. Since the number of file descriptors is limited, this can cause problems. This was the main reason for rejecting fwalk-based implementation of os.walk() (issue15200). |
|||
| msg257404 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2016年01月03日 08:48 | |
Here are two alternative solutions for this issue. |
|||
| msg258018 - (view) | Author: Guido van Rossum (gvanrossum) * (Python committer) | Date: 2016年01月11日 21:10 | |
I don't like the first solution (it just collects all scandir() results in a list, which defeats one of the purposes of using scandir()), but I don't understand the second solution. Ideally the number of FDs used should be limited by the depth of the tree being walked, right? |
|||
| msg258022 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2016年01月11日 21:35 | |
Both patches are basically equivalent. The first one collects all scandir() results in a list, the second one collects only directory names in a list. The purpose of using os.scandir() in os.walk() was a speed up (issue23605), and both patches preserve it. Yes, the number of FDs used is equivalent to the depth of the tree which can be very deep (I just created a tree depth of 1000 levels). And what is worse, all these FDs can be effectively leaked if the walking was not finished. This is unwanted behavior change. |
|||
| msg258025 - (view) | Author: Guido van Rossum (gvanrossum) * (Python committer) | Date: 2016年01月11日 21:42 | |
I am all for preventing the leaks. But using FDs proportional to the tree depth seems reasonable to me. (If you are worried about some kind of DoS attack on the algorithm by someone who can build a tree with depth 1000, well, if they can do that they can also create a flat folder with a million files in it.) Is there a potential hybrid strategy, where for small directories we close the FD but for large directories we keep it open? |
|||
| msg258096 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2016年01月12日 10:06 | |
> But using FDs proportional to the tree depth seems reasonable to me. This was the main reason for rejecting issue15200. May be we can allow this for globbing, but only explicitly enabled with special parameter or separate functions. The benefit for globbing is starting to yield results without internal caching. The benefit for os.walk() is smaller, because in any case it yields files and directories not one-by-one, but collected into lists. > Is there a potential hybrid strategy, where for small directories we close > the FD but for large directories we keep it open? May be with more complicated code. I think this needs explicit close() method, so it can go only in 3.6. For now I think we have to commit one of my patches (the difference only stylistic). |
|||
| msg258116 - (view) | Author: Guido van Rossum (gvanrossum) * (Python committer) | Date: 2016年01月12日 18:36 | |
I like them both, if I had to choose I'd pick patch 2. But yes, we need to add a close() method to the scandir iterator object. In the meantime, I am still worried about what would happen if somehow the loop got interrupted and the frame got kept alive and the iterator wasn't closed by its dealloc until much later. This kind of thing was common in asyncio and we had to resort to similar tricks to break some cycles. Maybe you can add a try/finally that *deletes* scandir_it to force it to close itself (at least in CPython)? That can go into 3.5. |
|||
| msg258238 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2016年01月14日 22:26 | |
Yes, OSError, MemoryError and KeyboardInterrupt can be raised during iterating scandir object or calling is_dir() or is_symlink() methods of the result. They stop iterating and left file descriptor open. If close file descriptor on error during iteration (issue26117), patch 1 becomes more safe, because it minimizes the lifetime of opened descriptor. |
|||
| msg260089 - (view) | Author: Roundup Robot (python-dev) (Python triager) | Date: 2016年02月11日 11:32 | |
New changeset e951d76f1945 by Serhiy Storchaka in branch '3.5': Issue #25995: os.walk() no longer uses FDs proportional to the tree depth. https://hg.python.org/cpython/rev/e951d76f1945 New changeset 6197a09a56b1 by Serhiy Storchaka in branch 'default': Issue #25995: os.walk() no longer uses FDs proportional to the tree depth. https://hg.python.org/cpython/rev/6197a09a56b1 |
|||
| msg260095 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2016年02月11日 11:43 | |
walk_consume_fds_opt1.patch is applied in 3.5 (it is more reliable with the absence of close()) and walk_consume_fds_opt2.patch is applied in 3.6. |
|||
| msg260784 - (view) | Author: Martin Panter (martin.panter) * (Python committer) | Date: 2016年02月24日 10:40 | |
The change seems to be causing some Windows buildbot failures <http://buildbot.python.org/all/builders/AMD64%20Windows7%20SP1%203.5/builds/666/steps/test/logs/stdio>: ====================================================================== ERROR: test_walk_bad_dir (test.test_os.BytesWalkTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\buildbot.python.org3円.5.kloth-win64\build\lib\test\test_os.py", line 931, in test_walk_bad_dir root, dirs, files = next(walk_it) File "C:\buildbot.python.org3円.5.kloth-win64\build\lib\test\test_os.py", line 1027, in walk for broot, bdirs, bfiles in os.walk(os.fsencode(top), **kwargs): File "C:\buildbot.python.org3円.5.kloth-win64\build\lib\os.py", line 372, in walk entries = list(scandir(top)) TypeError: os.scandir() doesn't support bytes path on Windows, use Unicode instead ====================================================================== ERROR: test_walk_bottom_up (test.test_os.BytesWalkTests) ERROR: test_walk_prune (test.test_os.BytesWalkTests) Was this line entries = list(scandir(top)) meant to be entries = list(scandir_it) |
|||
| msg260788 - (view) | Author: Roundup Robot (python-dev) (Python triager) | Date: 2016年02月24日 11:05 | |
New changeset 9bffe39e8273 by Serhiy Storchaka in branch '3.5': Fixed a bug in os.walk() with bytes path on Windows caused by merging fixes https://hg.python.org/cpython/rev/9bffe39e8273 |
|||
| msg260789 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2016年02月24日 11:05 | |
Thanks Martin! |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022年04月11日 14:58:25 | admin | set | github: 70183 |
| 2016年02月24日 11:05:58 | serhiy.storchaka | set | status: open -> closed messages: + msg260789 |
| 2016年02月24日 11:05:05 | python-dev | set | messages: + msg260788 |
| 2016年02月24日 10:40:27 | martin.panter | set | status: closed -> open nosy: + martin.panter messages: + msg260784 |
| 2016年02月11日 11:43:04 | serhiy.storchaka | set | status: open -> closed resolution: fixed messages: + msg260095 stage: patch review -> resolved |
| 2016年02月11日 11:32:19 | python-dev | set | nosy:
+ python-dev messages: + msg260089 |
| 2016年02月08日 15:59:59 | serhiy.storchaka | set | assignee: serhiy.storchaka |
| 2016年01月14日 22:26:51 | serhiy.storchaka | set | messages: + msg258238 |
| 2016年01月12日 18:36:17 | gvanrossum | set | messages: + msg258116 |
| 2016年01月12日 10:06:48 | serhiy.storchaka | set | messages: + msg258096 |
| 2016年01月11日 21:42:12 | gvanrossum | set | messages: + msg258025 |
| 2016年01月11日 21:35:22 | serhiy.storchaka | set | messages: + msg258022 |
| 2016年01月11日 21:10:38 | gvanrossum | set | nosy:
+ gvanrossum messages: + msg258018 |
| 2016年01月03日 08:49:03 | serhiy.storchaka | set | files: + walk_consume_fds_opt2.patch |
| 2016年01月03日 08:48:40 | serhiy.storchaka | set | files:
+ walk_consume_fds_opt1.patch keywords: + patch messages: + msg257404 stage: patch review |
| 2016年01月02日 17:57:05 | serhiy.storchaka | create | |