Boolean expression evaluation for user input using evil eval

Question 1

The following code is a module that evaluates boolean expressions, which are entered through an unauthorized web API. The exposed function of the library being used is evaluate(). Its positional argument string is a string of a pythonic boolean expression that is entered by the user and thusly untrustworthy (so it's potentially any string). I implemented some basic checks to ensure that arbitrary code execution is mitigated. Most of the security is outsourced to the callback function used to evaluate the boolness of user-defined tokens. I'd appreciate a review with focus on security, but also appreciate general feedback.

"""Boolean evaluation."""
from typing import Callable, Iterator
__all__ = ['SecurityError', 'evaluate']
PARENTHESES = frozenset({'(', ')'})
KEYWORDS = frozenset({'and', 'or', 'not'})
class SecurityError(Exception):
 """Indicates a possible security breach
 in parsing boolean statements.
 """
def bool_val(token: str, callback: Callable[[str], bool]) -> str:
 """Evaluates the given statement into a boolean value."""
 callback_result = callback(token)
 if isinstance(callback_result, bool):
 return str(callback_result)
 raise SecurityError('Callback method did not return a boolean value.')
def tokenize(word: str) -> Iterator[str]:
 """Yields tokens of a string."""
 for index, char in enumerate(word):
 if char in PARENTHESES:
 yield word[:index]
 yield char
 yield from tokenize(word[index+1:])
 break
 else:
 yield word
def boolexpr(string: str, callback: Callable[[str], bool]) -> Iterator[str]:
 """Yields boolean expression elements for python."""
 for word in string.strip().split():
 for token in filter(None, tokenize(word)):
 if token in KEYWORDS or token in PARENTHESES:
 yield token
 else:
 yield bool_val(token, callback)
def evaluate(
 string: str, *,
 callback: Callable[[str], bool] = lambda s: s.casefold() == 'true'
 ) -> bool:
 """Safely evaluates a boolean string."""
 return bool(eval(' '.join(boolexpr(string, callback))))

Update

This library is used in one place only, namely a real estates filtering API, where web applications can retrieve real estates from a database. By using boolean expressions parsed by this library code, the web applications can filter the retrieved real estates. The actual matching is done in the callback function passed to evaluate, which matches common filtering properties on the respective real estates and then uses evaluate() to assert whether the requested combination of properties is met. E.g. id in [1, 2, 3] and (sales_type == "rent" or price < 250000) which, depending on the real estate, evaluates the matching to True and ( False or True ) which is then passed to eval().

Question 2

Boolean-expr-as-a-service does not seem needed. Can you give any description to the contrary?

Question 3

In what part of the architecture is (for example) your ID membership check converted into a boolean? The client, the web server, this application server?

Question 4

All evaluations are done on the server side.

Question 5

So a client writes an arbitrary expression, sends it to the server, the server evaluates boolean tokens to truth values, fills them in in the expression, the expression itself is evaluated, and then what? Used for filtering in memory, in SQL?

Question 6

In-memory filtering of XML DOM models. Filtering on database-level would be too complex and not significantly improve performance (yes, I tested that).

Question 7

I mean to say this in the gentlest possible way, but from top to bottom this is a bad idea. It would take a more detailed explanation of why this creature exists to convince me otherwise.

Your default callback produces some pretty crazy results: ^ and 1 both evaluate to False, for instance.

A critical step in security is to limit DOS potential by constraining the possible expression input length.

I would replace your PARENTHESES and KEYWORDS with an AST-walk that is very strict on node types.

Your eval uses all-defaults, which means:

A code object will be immediately compiled, without being able to examine the AST first
Any future-features available to the calling scope are implicitly passed into the compiled scope
Just... read this and tell me that it's a good idea:

If both [global and local] dictionaries are omitted, the expression is executed with the globals and locals in the environment where eval() is called.

I would also rethink your callback. If the idea is to have a small collection of names that evaluate to booleans, just pass these in as locals and call it a day. Anything more complicated expands the potential for misuse.

If this needs to exist at all, I would suggest something like

"""Boolean evaluation."""
import ast
from types import CodeType
from typing import Union
EXPR_LIMIT = 100
class SecurityError(Exception):
 """Indicates a possible security breach
 in parsing boolean statements.
 """
def compile_isolated(
 source: Union[str, ast.Expression], flags: int,
) -> Union[ast.Expression, CodeType]:
 return compile(
 source=source,
 filename='<untrusted_expr>',
 mode='eval',
 flags=flags,
 dont_inherit=1, # no future features, isolated flags
 optimize=2, # cut asserts and docstrings
 )
def evaluate(string: str, **bool_locals: bool) -> bool:
 if len(string) > EXPR_LIMIT:
 raise SecurityError('Expression too long')
 expr: ast.Expression = compile_isolated(
 source=string,
 flags=ast.PyCF_ONLY_AST, # no await, no pep484, defer compilation
 )
 if not isinstance(expr, ast.Expression):
 raise SecurityError('Invalid parent node')
 top_op, = ast.iter_child_nodes(expr)
 for node in ast.walk(top_op):
 if isinstance(node, (
 ast.UnaryOp, ast.Not, ast.BoolOp, ast.boolop, ast.Name, ast.Load,
 )):
 continue
 if isinstance(node, ast.Constant):
 literal = ast.literal_eval(node)
 if not isinstance(literal, bool):
 raise SecurityError('Invalid literal')
 else:
 raise SecurityError('Invalid node')
 code_object: CodeType = compile_isolated(source=expr, flags=0)
 isolated_globals = {'__builtins__': {}}
 for v in bool_locals.values():
 if not isinstance(v, bool):
 raise ValueError('Invalid value for bool_locals')
 result = eval(code_object, isolated_globals, bool_locals)
 if not isinstance(result, bool):
 raise SecurityError('Unexpected non-boolean output')
 return result
def test() -> None:
 assert evaluate('not (a and False) or True', a=True) is True
 assert evaluate('not False') is True
 assert evaluate('a and (b or c)', a=True, b=True, c=False) is True
 for bad in (
 '__loader__ or True',
 'exit',
 'quit',
 'eval and False',
 ):
 try:
 evaluate(bad)
 raise AssertionError('Should have failed')
 except NameError:
 pass
 for bad in (
 '1 or False',
 'int(False)',
 'False * True',
 'exit()',
 'dos'*40,
 ):
 try:
 evaluate(bad)
 raise AssertionError('Should have failed')
 except SecurityError:
 pass
if __name__ == '__main__':
 test()

score 4 · Accepted Answer · 2022-01-03 20:31:34Z

I mean to say this in the gentlest possible way, but from top to bottom this is a bad idea. It would take a more detailed explanation of why this creature exists to convince me otherwise.

Your default callback produces some pretty crazy results: ^ and 1 both evaluate to False, for instance.

A critical step in security is to limit DOS potential by constraining the possible expression input length.

I would replace your PARENTHESES and KEYWORDS with an AST-walk that is very strict on node types.

Your eval uses all-defaults, which means:

A code object will be immediately compiled, without being able to examine the AST first
Any future-features available to the calling scope are implicitly passed into the compiled scope
Just... read this and tell me that it's a good idea:

If both [global and local] dictionaries are omitted, the expression is executed with the globals and locals in the environment where eval() is called.

I would also rethink your callback. If the idea is to have a small collection of names that evaluate to booleans, just pass these in as locals and call it a day. Anything more complicated expands the potential for misuse.

If this needs to exist at all, I would suggest something like

"""Boolean evaluation."""
import ast
from types import CodeType
from typing import Union
EXPR_LIMIT = 100
class SecurityError(Exception):
 """Indicates a possible security breach
 in parsing boolean statements.
 """
def compile_isolated(
 source: Union[str, ast.Expression], flags: int,
) -> Union[ast.Expression, CodeType]:
 return compile(
 source=source,
 filename='<untrusted_expr>',
 mode='eval',
 flags=flags,
 dont_inherit=1, # no future features, isolated flags
 optimize=2, # cut asserts and docstrings
 )
def evaluate(string: str, **bool_locals: bool) -> bool:
 if len(string) > EXPR_LIMIT:
 raise SecurityError('Expression too long')
 expr: ast.Expression = compile_isolated(
 source=string,
 flags=ast.PyCF_ONLY_AST, # no await, no pep484, defer compilation
 )
 if not isinstance(expr, ast.Expression):
 raise SecurityError('Invalid parent node')
 top_op, = ast.iter_child_nodes(expr)
 for node in ast.walk(top_op):
 if isinstance(node, (
 ast.UnaryOp, ast.Not, ast.BoolOp, ast.boolop, ast.Name, ast.Load,
 )):
 continue
 if isinstance(node, ast.Constant):
 literal = ast.literal_eval(node)
 if not isinstance(literal, bool):
 raise SecurityError('Invalid literal')
 else:
 raise SecurityError('Invalid node')
 code_object: CodeType = compile_isolated(source=expr, flags=0)
 isolated_globals = {'__builtins__': {}}
 for v in bool_locals.values():
 if not isinstance(v, bool):
 raise ValueError('Invalid value for bool_locals')
 result = eval(code_object, isolated_globals, bool_locals)
 if not isinstance(result, bool):
 raise SecurityError('Unexpected non-boolean output')
 return result
def test() -> None:
 assert evaluate('not (a and False) or True', a=True) is True
 assert evaluate('not False') is True
 assert evaluate('a and (b or c)', a=True, b=True, c=False) is True
 for bad in (
 '__loader__ or True',
 'exit',
 'quit',
 'eval and False',
 ):
 try:
 evaluate(bad)
 raise AssertionError('Should have failed')
 except NameError:
 pass
 for bad in (
 '1 or False',
 'int(False)',
 'False * True',
 'exit()',
 'dos'*40,
 ):
 try:
 evaluate(bad)
 raise AssertionError('Should have failed')
 except SecurityError:
 pass
if __name__ == '__main__':
 test()

Stack Exchange Network

Boolean expression evaluation for user input using evil eval

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Boolean expression evaluation for user input using evil eval

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions