-
Notifications
You must be signed in to change notification settings - Fork 400
Speed up Python completion #396
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Side note: helpers/python is skipped by pre-commit hooks (black, flake8). Is this intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Test change applied and black, flake8, and isort now run on helpers/python, skipping wasn't optional.
helpers/python
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some concerns here.
First, the docs of __import__ say
Direct use of
__import__()is also discouraged in favor ofimportlib.import_module().
...and in pydoc
... meant for use by the Python interpreter and not for general use ...
Second, I'm not that happy with importing in the first place, due to possible side effects in doing so, also possible security issues arising from that.
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First, the docs of
__import__sayDirect use of
__import__()is also discouraged in favor ofimportlib.import_module().
I didn't know what versions of Python bash-completion is supposed to support. I decided to stick to the lowest version able to run the current helper. For CPython it seems to be 2.5 and 3.0 (not tested, I only inspected source code of pkgutil: v2.5a2, v3.0a1). On the other hand, importlib was introduced in Python 2.7 and 3.1 (docs: Python 2, Python 3). So I chose to use __import__() instead. Moreover, this is what the implementation of walk_packages() uses.
I can switch to using importlib if compatibility with old Pythons is not an issue. Just let me know.
Second, I'm not that happy with importing in the first place, due to possible side effects in doing so, also possible security issues arising from that.
The current script is already doing imports. Quoting the docs of walk_packages():
Note that this function must import all packages (not all modules!) on the given path, in order to access the
__path__attribute to find submodules.
I must admit, though, that my code may in some cases import something that the original version wouldn't. I will modify it to check whether the module I am going to import is a package, in order to match the behavior of the current version. (Although this is non-trivial).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm.
What i'd ideally like to see is:
- old python versions not dropped
- no use of discouraged functions
- no arbitrary code execution issues out of the box in expected default configs
For 1) and 2) I suppose using importlib if available and __import__ only if not could be the way to go here, unless it gets "too ugly".
For 3) I think it's not actually as bad as I initially thought it'd be, as I don't think we're actually importing anything from the current dir unless the user has explicitly wanted to do that through use of PYTHONPATH. So from that POV we're fine with importing stuff in the first place.
Wrt old python versions, I think our CI setup goes down to 2.6.6 on CentOS 6.
Search only for submodules of the relevant package if the name to complete contains a dot. Also instruct walk_packages() to ignore all exceptions.
Display a human-readable error message instead of the exception traceback.
Make sure it does not look like relative import and does not contain empty componenents.
Also move main code to a function.
This reverts commit 07f674aa3d38a1e3afcf2618c4758e2b00a964e8.
This reverts commit 7be98fc4842179febc57300d805253a56d889497.
842d0d8 to
35bb130
Compare
@scop I have pushed a new version (sorry that nearly a month later). Could you take a look and express your opinion? (I don't want this to be merged yet.) After introducing many changes, this eventually became a complete rewrite (and the outcome is somewhat of a monstrosity 😟).
You were actually right when you were concerned about potential side effects of imports. If the helper blindly imports arbitrary modules, then for example trying to complete pydoc antigravity. results in launching a web browser. Another problematic module is idlelib.idle as well as some modules whose names end with .__main__ (although this case is totally understandable and could be easily filtered out). For this reason, I decided to restrict imports to packages only.
In addition, I implemented filtering of module names in recursive walk in order to further speed up the traversal (in some cases).
Unfortunately, those two things caused the code to grow considerably in size and complexity.
There is one more issue I should mention. I did some tests, comparing the outputs of this new helper and of the old one (the one present in master). There are some items that the old version prints but my version does not. However, this is not a problem IMO. The unlisted modules aren't directly importable, so they are useless as arguments to pydoc or python -m. The helper is also used by pylint and mypy completions. The missing modules are (on my machine) samba.subunit.run and things like pip._vendor.{something}. I doubt anybody needs to run pylint or mypy on those.
(The script I used to compare helpers' outputs is available here.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple of comments, as said it's quite a big change. But thanks for spending time on this already now! Anyway, what i'd like to see more is a bunch of test cases to our test suite, both the usual kind, as well as perhaps some more intimate ones as we're dealing with a chunk of python code here that pytest would likely be a good fit for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
find_spec was found, it just yielded a "not a package" verdifct, should we stop the search here within the if block and not fall back?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
__path__ is documented as
A list of strings specifying the search path within a package. This attribute is not set on modules.
Can we take advantage of that when deciding if something is (not) a package?
c0e9459 to
09307f8
Compare
python -m foo.b) faster.walk_packages()so that a package raising an exception does not prevent other modules from being listed.