I'm developing a command line application in Python which processes information that users provide. I've been reading the Python documentation on this topic and found that the built-in module subprocess
provides the call
method, which executes a string as if it was run on a command line. The disadvantage of this approach is that it can lead to shell injection.
The solution I came up with uses a set of illegal substrings, such as rm
or -rf
, and then uses regex to check whether the provided string contains a substring from the set.
import re
from string import ascii_letters, punctuation
# Generate a set of characters used for string sanitization.
punctuation_minus = set(punctuation) - set(('.', '_', '-'))
substrings = set(("rm", "&&", "-f", "null", "/dev/null", "2>&1", "-rf",
"rf"))
illegal = [re.escape(item) for item in (punctuation_minus | substrings)]
illegal_as_str = "|".join(illegal)
def sanitize_string(s):
"""Cleans a string from illegal characters."""
# Get all legal strings.
operations = re.split(illegal_as_str, s)
# Remove trailing spaces
no_trailings = [x.strip() for x in operations]
# Remove empty strings.
list_of_operations = filter(None, no_trailings)
# Generate a safe string to return.
operations_str = " ".join(list_of_operations)
try:
operations_str.decode("utf-8")
return operations_str
except UnicodeDecodeError as detail:
print("UnicodeDecodeError: " + str(detail))
return False
The inherent issue with this approach is that it heavily depends on the set of illegal characters I'm using. Is there any way I can sanitize a string taking into account the common attacks? And if not, how can I improve my code?
The user is expected to provide git commands, such as git add
or git push
. Each one of these git commands needs certain parameters that the user needs to provide. The next example shows how all the illegal substrings are removed from a user input when adding files to the staging area:
>>> git_add_str = "git add {}"
>>> files_to_add = input("Files to add as string:")
Files to add as string: "file1.py file2.py; rm -rf /"
# Clean files_to_add before modifying git_add_str.
>>> clean_str = sanitize_string(files_to_add)
>>> clean_str
'file1.py file2.py'
>>> git_add_str = git_add_str.format(clean_str)
>>> git_add_str
'git add file1.py file2.py'
This output is considered legal as it does not contain any of the illegal substrings. Now it is safe to run subprocess.call
with this string and shell=True
.
1 Answer 1
Your defense strategy is known as "enumerating badness". As a rule, any strategy based on enumerating badness is doomed to fail. For example, you prohibit rm -rf /
, but have you considered find / -delete
, which basically does the same thing? How about some use of the dd if=/dev/zero
? What if a user legitimately wants to perform an operation on a file named normal.py
or even rm.c
? Why are you silently altering the command to do something else?
If you ever need to execute an external program with user-supplied parameters, sensible practices would be to use subprocess.run()
or subprocess.Popen()
- with the command line as a list already split into words,
- with the
shell=False
parameter (the best way to avoid shell injection is to avoid using the shell at all!), - with the command given using an absolute path (which is essential with
shell=False
), - with
--
preceding the user-supplied filenames, so that filenames that start with-
are treated literally.
Do not attempt to prohibit any "bad" words; you want your code to work for any legitimate filenames. Do check that the filenames are legitimate, though — for example, not containing too many ../../../..
.
Better yet, use a Python module that takes care of interacting with Git so that you don't have to deal with the problem yourself.
-
\$\begingroup\$ Have researched dozens of answers on this, and this I think is the best explanation. Don't enumerate badness, make badness improbable. \$\endgroup\$Chaim Eliyah– Chaim Eliyah2021年01月21日 06:37:03 +00:00Commented Jan 21, 2021 at 6:37
Explore related questions
See similar questions with these tags.
git push [-server] [-branch]
, or a commit message ingit commit -m [-message]
. What I'm trying to achieve is a method that "cleans" these parameters so that string gets executed safely. \$\endgroup\$shell=False
to avoid the secondary command? You could also escape each argument passed in. Running the process under a very low-privilege account might also limit your risk. \$\endgroup\$shell=False
but I always get this errorOSError: [Errno 2] No such file or directory
. \$\endgroup\$