Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Speed up Python completion #396

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
jakseb wants to merge 18 commits into scop:main
base: main
Choose a base branch
Loading
from jakseb:python-speedup
Open

Speed up Python completion #396

jakseb wants to merge 18 commits into scop:main from jakseb:python-speedup

Conversation

@jakseb
Copy link
Contributor

@jakseb jakseb commented Mar 11, 2020

  • Make completion of dotted names (e.g. python -m foo.b) faster.
  • (minor one) Ignore errors in walk_packages() so that a package raising an exception does not prevent other modules from being listed.

Copy link
Contributor Author

jakseb commented Mar 11, 2020

Side note: helpers/python is skipped by pre-commit hooks (black, flake8). Is this intentional?

Copy link
Owner

@scop scop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Test change applied and black, flake8, and isort now run on helpers/python, skipping wasn't optional.

helpers/python Outdated
# completing something with a dot in it.

def iter_submodules(pkgname):
__import__(pkgname)
Copy link
Owner

@scop scop Mar 17, 2020
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some concerns here.

First, the docs of __import__ say

Direct use of __import__() is also discouraged in favor of importlib.import_module().

...and in pydoc

... meant for use by the Python interpreter and not for general use ...

Second, I'm not that happy with importing in the first place, due to possible side effects in doing so, also possible security issues arising from that.

WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, the docs of __import__ say

Direct use of __import__() is also discouraged in favor of importlib.import_module().

I didn't know what versions of Python bash-completion is supposed to support. I decided to stick to the lowest version able to run the current helper. For CPython it seems to be 2.5 and 3.0 (not tested, I only inspected source code of pkgutil: v2.5a2, v3.0a1). On the other hand, importlib was introduced in Python 2.7 and 3.1 (docs: Python 2, Python 3). So I chose to use __import__() instead. Moreover, this is what the implementation of walk_packages() uses.

I can switch to using importlib if compatibility with old Pythons is not an issue. Just let me know.

Second, I'm not that happy with importing in the first place, due to possible side effects in doing so, also possible security issues arising from that.

The current script is already doing imports. Quoting the docs of walk_packages():

Note that this function must import all packages (not all modules!) on the given path, in order to access the __path__ attribute to find submodules.

I must admit, though, that my code may in some cases import something that the original version wouldn't. I will modify it to check whether the module I am going to import is a package, in order to match the behavior of the current version. (Although this is non-trivial).

Copy link
Owner

@scop scop Mar 21, 2020
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm.

What i'd ideally like to see is:

  1. old python versions not dropped
  2. no use of discouraged functions
  3. no arbitrary code execution issues out of the box in expected default configs

For 1) and 2) I suppose using importlib if available and __import__ only if not could be the way to go here, unless it gets "too ugly".

For 3) I think it's not actually as bad as I initially thought it'd be, as I don't think we're actually importing anything from the current dir unless the user has explicitly wanted to do that through use of PYTHONPATH. So from that POV we're fine with importing stuff in the first place.

Wrt old python versions, I think our CI setup goes down to 2.6.6 on CentOS 6.

jakseb added 18 commits April 10, 2020 13:51
Search only for submodules of the relevant package if the name to complete
contains a dot. Also instruct walk_packages() to ignore all exceptions.
Display a human-readable error message instead of the exception traceback.
Make sure it does not look like relative import and does not contain empty
componenents.
Also move main code to a function.
This reverts commit 07f674aa3d38a1e3afcf2618c4758e2b00a964e8.
This reverts commit 7be98fc4842179febc57300d805253a56d889497.
Copy link
Contributor Author

jakseb commented Apr 17, 2020
edited
Loading

@scop I have pushed a new version (sorry that nearly a month later). Could you take a look and express your opinion? (I don't want this to be merged yet.) After introducing many changes, this eventually became a complete rewrite (and the outcome is somewhat of a monstrosity 😟).

You were actually right when you were concerned about potential side effects of imports. If the helper blindly imports arbitrary modules, then for example trying to complete pydoc antigravity. results in launching a web browser. Another problematic module is idlelib.idle as well as some modules whose names end with .__main__ (although this case is totally understandable and could be easily filtered out). For this reason, I decided to restrict imports to packages only.

In addition, I implemented filtering of module names in recursive walk in order to further speed up the traversal (in some cases).

Unfortunately, those two things caused the code to grow considerably in size and complexity.

There is one more issue I should mention. I did some tests, comparing the outputs of this new helper and of the old one (the one present in master). There are some items that the old version prints but my version does not. However, this is not a problem IMO. The unlisted modules aren't directly importable, so they are useless as arguments to pydoc or python -m. The helper is also used by pylint and mypy completions. The missing modules are (on my machine) samba.subunit.run and things like pip._vendor.{something}. I doubt anybody needs to run pylint or mypy on those.

(The script I used to compare helpers' outputs is available here.)

Copy link
Owner

@scop scop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple of comments, as said it's quite a big change. But thanks for spending time on this already now! Anyway, what i'd like to see more is a bunch of test cases to our test suite, both the usual kind, as well as perhaps some more intimate ones as we're dealing with a chunk of python code here that pytest would likely be a good fit for.

if find_spec is not None:
spec = find_spec(pkgname)
if spec is not None and spec.submodule_search_locations is not None:
return import_module(pkgname)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

find_spec was found, it just yielded a "not a package" verdifct, should we stop the search here within the if block and not fall back?

Suggested change
return import_module(pkgname)
return import_module(pkgname)
raise PackageNotFoundError(pkgname)

return import_module(pkgname)
# find_spec() unavailable, fall back to searching iter_modules(...).
if parentname:
path = parent.__path__ or []
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

__path__ is documented as

A list of strings specifying the search path within a package. This attribute is not set on modules.

Can we take advantage of that when deciding if something is (not) a package?

@scop scop force-pushed the master branch 3 times, most recently from c0e9459 to 09307f8 Compare March 30, 2021 15:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Reviewers

@scop scop scop requested changes

Assignees

No one assigned

Labels

None yet

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

2 participants

AltStyle によって変換されたページ (->オリジナル) /