Implement bash auto completion in Python

Question 1

I wrote a script that uses Python to provide Bash auto completion for a command. The full code is available here; the example below shows the idea. Consider:

command, the command I want to give completion to (it does not have to actually exist).
script.py, the auto completion script.

import os, shlex
def complete(current_word, full_line):
 split = shlex.split(full_line)
 prefixes = ("test", "test-中文", 'test-한글')
 items = list(i + "-level-" + current_word for i in prefixes)
 word = "" if len(split) - 1 < int(current_word) else split[int(current_word)]
 return list(i for i in items if i.startswith(word))
if __name__ == "__main__":
 os.sys.stdout.write(" ".join(complete(
 *os.sys.argv.__getitem__(slice(2, None)))))
 os.sys.stdout.flush()

The following snippet can be run directly in the shell, or added to .bashrc to make the completion persistent.

__complete_command() {
 COMPREPLY=($(python3 script.py complete $COMP_CWORD "${COMP_LINE}"));
};
complete -F __complete_command command

Tested on Ubuntu 18.04 with Python 3.8. To try it out, you can paste the following into a new console:

cd $(mktemp -d)
__complete_command() {
 COMPREPLY=($(python3 script.py complete $COMP_CWORD "${COMP_LINE}"));
};
complete -F __complete_command command
echo '
import os, shlex
def complete(current_word, full_line):
 split = shlex.split(full_line)
 prefixes = ("test", "test-中文", "test-한글")
 items = list(i + "-level-" + current_word for i in prefixes)
 word = "" if len(split) - 1 < int(current_word) else split[int(current_word)]
 return list(i for i in items if i.startswith(word))
if __name__ == "__main__":
 os.sys.stdout.write(" ".join(complete(
 *os.sys.argv.__getitem__(slice(2, None)))))
 os.sys.stdout.flush()' > script.py

Is it the most efficient way to do this?

Question 2

The documentation for complete is in the Bash man page, under "Builtin Commands". On some systems there's a separate bash-builtins man page for convenience.

Question 3

Splitting words

The Python script splits the shell command's full line into words using shlex. I see a few issues with this:

I'm not sure this will split the line exactly the same way as the shell would. Looking at help(shlex), I see "A lexical analyzer class for simple shell-like syntaxes", and I find that not very reassuring.
I think command line completion should be blazingly fast, so I look suspiciously at anything that needs to be import-ed, such as shlex.
Looking at the Programmable Completion section in man bash, it seems that Bash populates the COMP_WORDS array with the result of the split.

Therefore, it would be good to pass COMP_WORDS to the Python script, which would eliminate all the above concerns.
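To make the first concern a bit more concrete, here is a small sketch of how shlex treats two lines a user might have typed so far. The sample lines and the helper name split_or_none are made up for illustration. Whether the results match what Bash would put in COMP_WORDS is exactly the open question; the snippet only shows what shlex does.

```python
import shlex


def split_or_none(line):
    """Split like the original script does; return None if shlex rejects the line."""
    try:
        return shlex.split(line)
    except ValueError:
        return None


# shlex strips the quotes from a finished word...
print(split_or_none('command "test-中文"'))  # ['command', 'test-中文']

# ...and refuses a line whose quote is still open, which is a
# perfectly normal state while the user is still typing.
print(split_or_none('command "test-'))  # None
```

In the original script the second case would surface as an uncaught ValueError inside the completion function.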

One way to achieve this would be to call the Python script with:

python3 script.py complete "$COMP_CWORD" "${COMP_WORDS[@]}"

And then change the Python script accordingly:

#!/usr/bin/env python3
import sys
def complete(comp_cword, *comp_words):
 prefixes = ("test", "test-中文", "test-한글")
 word = comp_words[int(comp_cword)]
 items = (prefix + "-level-" + comp_cword for prefix in prefixes)
 return (item for item in items if item.startswith(word))
if __name__ == "__main__":
 sys.stdout.write(" ".join(complete(*sys.argv[2:])))
 sys.stdout.flush()
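As a quick check of what the revised function yields, it can be called directly without going through the shell. The argument values below are invented stand-ins for what Bash would pass as COMP_CWORD and COMP_WORDS.

```python
def complete(comp_cword, *comp_words):
    prefixes = ("test", "test-中文", "test-한글")
    word = comp_words[int(comp_cword)]
    items = (prefix + "-level-" + comp_cword for prefix in prefixes)
    return (item for item in items if item.startswith(word))


# Cursor on a new, still-empty word: nothing is filtered out.
print(list(complete("1", "command", "")))
# ['test-level-1', 'test-中文-level-1', 'test-한글-level-1']

# Cursor on a partly typed word: only the matching candidate remains.
print(list(complete("1", "command", "test-中")))
# ['test-中文-level-1']
```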

Avoid converting generators to list when not needed

No lists were needed in the original script, everything could have been just generator expressions.

Keep it simple

I don't understand why the script imports os and uses os.sys and nothing else in os. You could just import sys instead.

I don't understand why sys.argv.__getitem__(slice(2, None)) was used instead of the simple and natural sys.argv[2:].

This line is complex, it takes attention to understand:

word = "" if len(split) - 1 < int(current_word) else split[int(current_word)]

This is a lot easier to understand:

if len(split) - 1 < int(current_word):
 word = ""
else:
 word = split[int(current_word)]

Looking further, word is used only in the filter .startswith(word). When word is the empty string, that filter matches every string. In that case, to maximize performance, it would be best not to create word or filter at all, and to return items directly:

if len(split) - 1 < int(current_word):
 return items

On even closer look, I don't see how COMP_CWORD can ever be an index out of range, so the bounds check was unnecessary. (Strictly speaking, an index out of bounds may be possible when splitting words with shlex, since that might not be identical to the shell's own word splitting. Even then, it would be a highly unlikely case, so a more Pythonic way to handle the situation would be a try-except for an IndexError.)
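A minimal sketch of the exception-based variant mentioned in the parenthetical, assuming the words have already been split into a sequence. The helper name word_at_cursor is hypothetical and not part of the original script.

```python
def word_at_cursor(comp_words, comp_cword):
    """Return the word under the cursor, or "" if the index is out of range."""
    try:
        return comp_words[int(comp_cword)]
    except IndexError:
        # Rare case: the split produced fewer words than the index expects.
        return ""


print(repr(word_at_cursor(["command", "te"], "1")))  # 'te'
print(repr(word_at_cursor(["command"], "1")))        # ''
```

The common path pays no price for an explicit length comparison, and the unusual case is still handled.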

Use better names

The name i is really best reserved for loop counters. (Even then, often you may find better names...)

In the suggested solution above I renamed the parameters to match the shell variables they come from. I find this reduces the cognitive burden when reading the documentation of the variables in man bash alongside the implementation of the completion code in Python.

Question 4

Thanks for the answer, man bash was the resource I was looking for (I could find the necessary explanations about the variables). Also, the reason I used __getitem__ was that my keyboard was broken and I was unable to type [.

Question 5

The line word = "" if len(split) - 1 < int(current_word) else split[int(current_word)] is there because, for instance, when I am at the beginning of a new word, the split list contains only the completed words, not the current word (because it is empty).
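The situation described here can be reproduced in isolation. The sketch below wraps the expression from the question in a hypothetical helper, word_under_cursor, and feeds it two invented lines, one with a partly typed word and one ending in a space.

```python
import shlex


def word_under_cursor(current_word, full_line):
    # Same expression as in the question, wrapped in a hypothetical helper.
    split = shlex.split(full_line)
    return "" if len(split) - 1 < int(current_word) else split[int(current_word)]


# A partly typed word is present in the split result.
print(repr(word_under_cursor("1", "command te")))  # 'te'

# After a trailing space, shlex returns only ['command'], so index 1
# is missing and the fallback to "" kicks in.
print(repr(word_under_cursor("1", "command ")))    # ''
```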

Question 6

list(i for i in items if i.startswith(word))

This is just a list comprehension with extra steps (and overhead). If you want a list as an end result, wrap the comprehension in [], not ().

[i for i in items if i.startswith(word)]

You were using a generator expression to produce a generator, then forcing it by putting it into a list.

Then, the same change can be made to the definition of items. This will be more efficient, and looks cleaner anyway.
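Applied to both lines, the change might look like the sketch below. The values of current_word and word are invented here so the snippet runs on its own; in the script they come from the arguments.

```python
prefixes = ("test", "test-中文", "test-한글")
current_word = "1"   # assumed sample value
word = "test-한"      # assumed sample value

# List comprehensions instead of list(<generator expression>).
items = [i + "-level-" + current_word for i in prefixes]
matches = [i for i in items if i.startswith(word)]

print(matches)  # ['test-한글-level-1']

# Both spellings produce the same list.
print(matches == list(i for i in items if i.startswith(word)))  # True
```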

If you're only supporting newer versions of Python (3.6+, where f-strings were introduced), I think f-strings would also neaten up items:

items = [f"{i}-level-{current_word}" for i in prefixes]

"" if len(split) - 1 < int(current_word) else split[int(current_word)]

I think the condition is complex/varied enough that it takes a couple of looks to see the else. Personally, I'd wrap the condition in parentheses:

"" if (len(split) - 1 < int(current_word)) else split[int(current_word)]

Question 7

Do you have an idea of where to search for the definition of 'complete'? I even searched for COMP_LINE in the entire Linux repository (github.com/torvalds/linux/search?q=COMP_LINE&type=Code) and in coreutils (I downloaded it and ran "grep -r . -e 'COMP_LINE'"), but nothing turned up. I also tried xterm and gnome-terminal, but there is no trace of this function. I tried using forkstat to find the process, but got nothing.

Question 8

@NadirGhoul Couldn't tell you. Coincidentally, I just used Linux for the first time today.

Question 9

You have searched a lot of unrelated sources, but not the Bash sources. github.com/bminor/bash/…

Question 10

@tripleee Nope. I only started using the bash and Linux a little over a week ago like I said. I'm still becoming familiar with it myself.

Question 11

Your sys.stdout.write + flush is a fancy print call... You could just write

if __name__ == '__main__':
 print(*complete(*sys.argv[2:]), end='', flush=True)

instead.
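To check that the two spellings really write the same bytes, one can capture stdout for each. This is only a sketch: the candidates list and the helper names are made up for the comparison.

```python
import contextlib
import io
import sys

candidates = ["test-level-1", "test-中文-level-1"]  # assumed sample output


def captured(write_candidates):
    """Run a function and return whatever it wrote to stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        write_candidates()
    return buffer.getvalue()


def with_write():
    sys.stdout.write(" ".join(candidates))
    sys.stdout.flush()


def with_print():
    # print separates its arguments with a space by default;
    # end="" suppresses the trailing newline.
    print(*candidates, end="", flush=True)


print(captured(with_write) == captured(with_print))  # True
```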

janos · Accepted Answer · 2020-01-11 22:20:20Z

