
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: module importing performance regression
Type: performance Stage: resolved
Components: Versions: Python 3.4, Python 3.5
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: Nosy List: brett.cannon, daveroundy, eric.snow, ncoghlan, pitrou
Priority: normal Keywords:

Created on 2015-04-11 20:07 by daveroundy, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name Uploaded Description
test.py daveroundy, 2017-07-16 12:02 script to demonstrate performance regression.
Messages (10)
msg240491 - Author: David Roundy (daveroundy) Date: 2015-04-11 20:07
I have observed a performance regression in module importing. In Python 3.4.2, importing a module from the current directory (where the script is located) causes the entire directory to be read. When there are many files in this directory, this can cause the script to run very slowly.
In Python 2.7.9, this behavior is not present.
It would be preferable (in my opinion) to revert the change that causes Python to read the entire script directory.
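For reference, a rough sketch of the kind of setup that exhibits this (hypothetical file counts and names; the test.py attached later in this issue may differ):

# Build a huge directory next to a one-line module, then time a trivial
# script that imports it under Python 2.7 and 3.4.
import os

os.makedirs("bigdir")
with open(os.path.join("bigdir", "mymod.py"), "w") as f:
    f.write("X = 1\n")
for i in range(1000000):  # adjust the count to taste
    open(os.path.join("bigdir", "junk%07d.txt" % i), "w").close()
with open(os.path.join("bigdir", "hello.py"), "w") as f:
    f.write("import mymod\nprint('hello world', mymod.X)\n")
# Compare:  python2 bigdir/hello.py  versus  python3 bigdir/hello.py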
msg240492 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-04-11 20:37
This change is actually an optimization. The directory is only read once and its contents are then cached, which allows for much quicker imports when multiple modules are in the directory (common case of a Python package).
Can you tell us more about your setup?
- how many files are in the directory
- what filesystem is used
- whether the filesystem is local or remote (e.g. network-attached)
- your OS and OS version
Also, how long is "very slowly"?
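For illustration, a minimal sketch (not from the original messages) of where that cache lives, assuming CPython's importlib machinery and a hypothetical module name:

import importlib
import sys

import some_module  # hypothetical module sitting in a very large directory

# Each sys.path entry gets a path-based finder; the finder caches the
# directory listing it made during the first import from that directory.
print(sys.path_importer_cache.get(sys.path[0]))

# If files are added to the directory later, the cache can be refreshed:
importlib.invalidate_caches()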
msg240493 - Author: David Roundy (daveroundy) Date: 2015-04-11 20:50
I had suspected that might be the case. At this point it's mostly just a test case where I generated a lot of files to demonstrate the issue. In my test case, hello world with one module import takes a minute and 40 seconds. I could make it take longer, of course, by creating more files.
I do think scaling should be a consideration when introducing optimizations, even if getdents is usually pretty fast. If the script directory is normally the last one in the search path, couldn't you skip the listing of that directory without losing your optimization?
msg240494 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-04-11 21:27
I was asking questions because I wanted to have more precise data. I can't reproduce here: even with 500000 files in a directory, the first import takes 0.2s, not one minute.
msg240500 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-04-11 21:52
As for your question:
> If the script directory is normally the last one in the search path couldn't you skip the listing of that directory without losing your optimization?
Given the way the code is architected, that would complicate things significantly. Also it would introduce a rather unexpected discrepancy.
msg240514 - Author: David Roundy (daveroundy) Date: 2015-04-12 00:20
My tests involved 8 million files on an ext4 file system. I expect that accounts for the difference. It's true that it's an excessive number of files, and maybe the best option is to ignore the problem.
msg240515 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-04-12 00:34
Indeed, that doesn't sound like something we want to support. I'm closing then.
msg298433 - Author: David Roundy (daveroundy) Date: 2017-07-16 12:02
Here is a little script to demonstrate the regression (which yes, is still bothering me).
msg298434 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2017-07-16 12:25
Thanks for the reproducer. I haven't changed my mind on the resolution, as it is an extremely unlikely use case (a directory with 1e8 files is painful to manage with standard command-line tools). I suggest you change your approach; for example, you could use a directory hashing scheme to spread the files into smaller subdirectories.
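A minimal sketch of such a hashing scheme (hypothetical helper names, not code from this thread): files are spread over 256 subdirectories keyed by the first two hex digits of a hash of the name.

import hashlib
import os

def hashed_path(base_dir, filename):
    # e.g. "data/ab/report.txt" instead of one flat directory with millions of entries
    bucket = hashlib.md5(filename.encode("utf-8")).hexdigest()[:2]
    return os.path.join(base_dir, bucket, filename)

def store(base_dir, filename, data):
    path = hashed_path(base_dir, filename)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)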
msg298450 - Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2017-07-16 21:19
I agree with Antoine that this shouldn't change. Having said that, it wouldn't be hard to write your own finder using importlib that doesn't read the directory contents and instead checks for the file directly (and you could even set it up just for your troublesome directory, so the other path entries keep the performance benefit of the default finder).
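A rough sketch of that idea (hypothetical class name and directory path; a simplification, not code from this thread): a meta path finder that stats the one expected file instead of listing the directory.

import importlib.abc
import importlib.util
import os
import sys

BIG_DIR = "/path/to/huge/directory"  # assumption: the troublesome directory

class DirectStatFinder(importlib.abc.MetaPathFinder):
    def find_spec(self, fullname, path=None, target=None):
        # Check for the module file directly rather than reading the directory.
        candidate = os.path.join(BIG_DIR, fullname.rpartition(".")[2] + ".py")
        if os.path.isfile(candidate):
            return importlib.util.spec_from_file_location(fullname, candidate)
        return None

# Run ahead of the default path-based finder so the big directory is never listed.
sys.meta_path.insert(0, DirectStatFinder())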
History
Date User Action Args
2022-04-11 14:58:15 admin set github: 68104
2017-07-16 21:19:52 brett.cannon set messages: + msg298450
2017-07-16 12:25:45 pitrou set messages: + msg298434
2017-07-16 12:02:59 daveroundy set files: + test.py; type: performance; messages: + msg298433; versions: + Python 3.5
2015-04-12 00:34:21 pitrou set status: open -> closed; resolution: wont fix; messages: + msg240515; stage: resolved
2015-04-12 00:20:59 daveroundy set messages: + msg240514
2015-04-11 21:52:04 pitrou set messages: + msg240500
2015-04-11 21:27:53 pitrou set messages: + msg240494
2015-04-11 20:50:13 daveroundy set messages: + msg240493
2015-04-11 20:37:03 pitrou set nosy: + pitrou; messages: + msg240492
2015-04-11 20:20:41 serhiy.storchaka set nosy: + brett.cannon, ncoghlan, eric.snow
2015-04-11 20:07:34 daveroundy create
