Avoiding shell injection when calling Git in Python

Question 1

I'm developing a command line application in Python which processes information that users provide. I've been reading the Python documentation on this topic and found that the built-in module subprocess provides the call method, which executes a string as if it was run on a command line. The disadvantage of this approach is that it can lead to shell injection.

The solution I came up with uses a set of illegal substrings, such as rm or -rf, and then uses regex to check whether the provided string contains a substring from the set.

import re
from string import ascii_letters, punctuation
# Generate a set of characters used for string sanitization.
punctuation_minus = set(punctuation) - set(('.', '_', '-'))
substrings = set(("rm", "&&", "-f", "null", "/dev/null", "2>&1", "-rf",
 "rf"))
illegal = [re.escape(item) for item in (punctuation_minus | substrings)]
illegal_as_str = "|".join(illegal)
def sanitize_string(s):
 """Cleans a string from illegal characters."""
 # Get all legal strings.
 operations = re.split(illegal_as_str, s)
 # Remove trailing spaces
 no_trailings = [x.strip() for x in operations]
 # Remove empty strings.
 list_of_operations = filter(None, no_trailings)
 # Generate a safe string to return.
 operations_str = " ".join(list_of_operations)
 try:
 operations_str.decode("utf-8")
 return operations_str
 except UnicodeDecodeError as detail:
 print("UnicodeDecodeError: " + str(detail))
 return False

The inherent issue with this approach is that it heavily depends on the set of illegal characters I'm using. Is there any way I can sanitize a string taking into account the common attacks? And if not, how can I improve my code?

The user is expected to provide git commands, such as git add or git push. Each one of these git commands needs certain parameters that the user needs to provide. The next example shows how all the illegal substrings are removed from a user input when adding files to the staging area:

>>> git_add_str = "git add {}"
>>> files_to_add = input("Files to add as string:")
Files to add as string: "file1.py file2.py; rm -rf /"
# Clean files_to_add before modifying git_add_str.
>>> clean_str = sanitize_string(files_to_add)
>>> clean_str
'file1.py file2.py'
>>> git_add_str = git_add_str.format(clean_str)
>>> git_add_str
'git add file1.py file2.py'

This output is considered legal as it does not contain any of the illegal substrings. Now it is safe to run subprocess.call with this string and shell=True.

Question 2

I would think trying to create a white list of good command patterns would be easier than trying to explicitly disallow every command that could possibly be bad.

Question 3

@ThisIsNoZaku but using a white list won't allow the user to specify parameters, such as server and branch in git push [-server] [-branch], or a commit message in git commit -m [-message]. What I'm trying to achieve is a method that "cleans" these parameters so that string gets executed safely.

Question 4

Can you use shell=False to avoid the secondary command? You could also escape each argument passed in. Running the process under a very low-privilege account might also limit your risk.

Question 5

@Scovetta I tried using shell=False but I always get this error OSError: [Errno 2] No such file or directory.

Question 6

@tulians If this program is only meant to forward commands to git, you could maybe tokenize all the input and discard or error out on any that don't start with "git".

Question 7

Your defense strategy is known as "enumerating badness". As a rule, any strategy based on enumerating badness is doomed to fail. For example, you prohibit rm -rf /, but have you considered find / -delete, which basically does the same thing? How about some use of the dd if=/dev/zero? What if a user legitimately wants to perform an operation on a file named normal.py or even rm.c? Why are you silently altering the command to do something else?

If you ever need to execute an external program with user-supplied parameters, sensible practices would be to use subprocess.run() or subprocess.Popen()

with the command line as a list already split into words,
with the shell=False parameter (the best way to avoid shell injection is to avoid using the shell at all!),
with the command given using an absolute path (which is essential with shell=False),
with -- preceding the user-supplied filenames, so that filenames that start with - are treated literally.

Do not attempt to prohibit any "bad" words; you want your code to work for any legitimate filenames. Do check that the filenames are legitimate, though — for example, not containing too many ../../../...

Better yet, use a Python module that takes care of interacting with Git so that you don't have to deal with the problem yourself.

Question 8

Have researched dozens of answers on this, and this I think is the best explanation. Don't enumerate badness, make badness improbable.

score 12 · Accepted Answer · 2017-02-21 05:42:56Z

Your defense strategy is known as "enumerating badness". As a rule, any strategy based on enumerating badness is doomed to fail. For example, you prohibit rm -rf /, but have you considered find / -delete, which basically does the same thing? How about some use of the dd if=/dev/zero? What if a user legitimately wants to perform an operation on a file named normal.py or even rm.c? Why are you silently altering the command to do something else?

If you ever need to execute an external program with user-supplied parameters, sensible practices would be to use subprocess.run() or subprocess.Popen()

with the command line as a list already split into words,
with the shell=False parameter (the best way to avoid shell injection is to avoid using the shell at all!),
with the command given using an absolute path (which is essential with shell=False),
with -- preceding the user-supplied filenames, so that filenames that start with - are treated literally.

Do not attempt to prohibit any "bad" words; you want your code to work for any legitimate filenames. Do check that the filenames are legitimate, though — for example, not containing too many ../../../...

Better yet, use a Python module that takes care of interacting with Git so that you don't have to deal with the problem yourself.

Have researched dozens of answers on this, and this I think is the best explanation. Don't enumerate badness, make badness improbable.

Stack Exchange Network

Avoiding shell injection when calling Git in Python

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Avoiding shell injection when calling Git in Python

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions