- 
  Notifications
 You must be signed in to change notification settings 
- Fork 249
[WIP] Fix false-negatives and false-positives #99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
...-p's f-n's and the infinite loop in get_first_node
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one seems bypass-able, so not a false-positive, but it easily could have been, in order to do this well we would need to be able to solve predicates, via like Z3str3 or something. I feel these are acceptable for the foreseeable future.
...(rhs_vars), Move add_blackbox_or_builtin_call to expr_visitor where it should be, and CALL_IDENTIFIER to expr_visitor_helper, some flake8 on vulnerabilities_test
 
 
 pyt/expr_visitor.py
 
 Outdated
 
 
 There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we did not make a new list, things like this would happen:
> User input at line 8, trigger word "request.args.get(": ¤call_1 = ret_request.args.get('image_name') Reassigned in: File: example/vulnerable_code/path_traversal_sanitised.py > Line 8: image_name = ¤call_1 File: example/vulnerable_code/path_traversal_sanitised.py > reaches line 10, trigger word "replace(": ¤call_2 = ret_image_name.replace('..', '')
though the problem on master was that we didn't include ~call_N in .args, and thus in the return value of get_sink_args and so we had a false-negative that's now fixed. However, or_in_sink.py remains unsolved, for the time being.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason or_in_sink.py doesn't pass yet is because we don't if isinstance for BoolOp in the 'loop through args' code in visit_Call, should be easy enough.
So I was half way done with expr_star_visitor when I decided to re-org things again and make things more user-friendly. 😮
... vulnerabilities_test.py
...ALL_IDENTIFIER back to what it is on master
WHAT IS LEFT
Write more tests, especially having many BoolOps -- like in if_else_in_sink.py e.g.  return redirect(request.args.get('Laundry') and 'crazy' or request.args.get('French') and 'foo' or request.args.get('The'))
Make existing tests pass
ControlFlowExpr inherit nicely from namedtuple
Clean up ControlFlowNode.connect and ControlFlowExpr.connect
random:
Improve visit_keyword
to make foo= make foo tainted
to make it work with **kwargs, etc.
Kill BUILTINS tuple, make everything blackbox
Replace isinstance(expr, AssignmentNode) with CallAssignment node
Fix if if with IfExp
An aside:
Inherit from named_tuple, etc.
AFTER WORK
New usage picture in README (PR to the readthedocs repository)
Test out redirect(request.args.get('next')) on current branch
Test out FlaskBB BoolOp redirect on current branch
Root README: There are 2 things you will need to do to configure PyT
 Taint entry points
 Look over the file of sources and sinks (maybe blackbox functions too)
At home:
 See how to parse routing.py
 Do Django CBVs too
 See how hard it is to do object.function() inlining
The current problem with
return redirect(request.args.get('The') if 'hey' or 'you' else request.args.get('French') if 'foo' else request.args.get('Aces') and 'c')
is that expr_star_handler return the tainted aces call as a last expression, because StrNodes never get added to cfg_expressions, get_last_expressions can't see that aces is not the last expression. Hmmm, I fixed it with this:
--- a/pyt/cfg/expr_visitor.py +++ b/pyt/cfg/expr_visitor.py @@ -86,6 +86,7 @@ class ExprVisitor(StmtVisitor): variables = list() visual_variables = list() first_node = None + last_node_should_not_be_ignored = False node_not_to_step_past = self.nodes[-1] for expr in exprs: @@ -112,7 +113,8 @@ class ExprVisitor(StmtVisitor): label.visit(expr) visual_variables.append(label.result) - if not isinstance(node, StrNode): + last_node_should_not_be_ignored = not isinstance(node, StrNode) + if last_node_should_not_be_ignored: cfg_expressions.append(node) if not first_node: if isinstance(node, ControlFlowExpr): @@ -130,7 +132,7 @@ class ExprVisitor(StmtVisitor): return ConnectExpressions( first_expression=first_node, all_expressions=cfg_expressions, - last_expressions=get_last_expressions(cfg_expressions) if cfg_expressions else [], + last_expressions=get_last_expressions(cfg_expressions) if last_node_should_not_be_ignored else [], variables=variables, visual_variables=visual_variables )
Now it only reports a vuln, if aces was the last expression!
...a StrNode the last expression, instead of nothing. e.g. return redirect(request.args.get('The') if 'hey' or 'you' else request.args.get('French') if 'foo' else request.args.get('Aces') and 'c')
 ...ast_var_in_and_is_not_tainted with test
... match. Did boolop.connect to each boolop last expression.
inner_most_call notes
HOW WE CURRENTLY SET THE INNER MOST CALL ATTRIBUTE FOR BLACKBOX CALLS:
When looping through args and keyword args in add_blackbox_call
If the arg is a call
We do things, explained below.
This code essentially does 3 things:
 The last arg that is a function call gets connected to the outer call node
 Relevant code:
 # connect other_inner to outer in e.g.
 `scrypt.outer(scrypt.inner(image_name), scrypt.other_inner(image_name))`
 I think this may be not needed if the code is
 e.g.
 `scrypt.outer(scrypt.inner(image_name), scrypt.other_inner(image_name), some_var)`
 but maybe it is, currently we do it anyway.
 The first arg that is a function call gets set as_ the inner_most_call
 Calling it the inner_most_call is inaccurate, it should be called first_call
 We obtain the inner_most_call by looping through them in _get_inner_most_function_call
 Relevant code:
 # I should only set this once per loop, inner in e.g.
 # `scrypt.outer(scrypt.inner(image_name), scrypt.other_inner(image_name))`
 # (inner_most_call is used when predecessor is a ControlFlowNode in connect_control_flow_node)
 call_node.inner_most_call = return_value_of_nested_call
 Should we still do this if the first arg is not a function call?
 e.g. 
 `scrypt.outer(foo, scrypt.inner(image_name), scrypt.other_inner(image_name))`
 I think we should because control flow really does go into the inner function first.
 We connect one arg that is a function call, to the next arg that is a function call
 Relevant code:
 # connect inner to other_inner in e.g.
 # `scrypt.outer(scrypt.inner(image_name), scrypt.other_inner(image_name))`
 # I should probably loop to the inner most call of other_inner here.
 try:
 last_return_value_of_nested_call.connect(return_value_of_nested_call.first_node)
 except AttributeError:
 last_return_value_of_nested_call.connect(return_value_of_nested_call)
 Like the comment says, I am not sure if I should connect inner to other_inner or some_thing_else in
 e.g.
 `scrypt.outer(scrypt.inner(image_name), scrypt.other_inner(some_thing_else(image_name)))`
In save_def_args_in_temp of process_function we do the same thing pretty much.
HOW WE CURRENTLY USE THE INNER MOST CALL ATTRIBUTE:
When connecting a ControlFlowNode to an AssignmentCallNode
We loop through the last_nodes of the ControlFlowNode and
connect them with the inner_most_call_node of the AssignmentCallNode
_get_inner_most_function_call:
Loops through calls and
For blackbox calls, this is the inner_most_call
For user-defined calls, this is the .first_node
So blackbox calls are done well, and user-defined calls are done lazily.
Code more or less:
stmt_star_handler ->
connect_nodes ->
if node isinstance ControlFlowNode ->
for last in control_flow_node.last_nodes:
if isinstance(next_node, AssignmentCallNode):
inner_most_call_node = _get_inner_most_function_call(call_node)
last.connect(inner_most_call_node)
This line seems dead
Line 308 in fbca4d2
b/c of the above. (i.e. we only use first_node of user-defined functions.
OPEN QUESTIONS
Question
Why do we only do this when connecting a ControlFlowNode to an AssignmentCallNode?
e.g.
If we have
foo = 5
outer(inner(foo))
I still want to connect foo = 5 to inner() and not outer, right?
Answer
We do connect_to_allowed -1 stuff that we are trying to get away from
Question
What tests do we currently have for the above?
Answer
???
Removed the if CALL_IDENTIFIER in garbage Removed the node_not_to_step_past.connect(first_node) Fixed up _get_inner_most_expression, connect_nodes and wrote the top 4 expr_cfg_tests # This is used in connect_nodes call_node.first_expression = connected_expressions.first_expression Fixed remaining expr_cfg_tests Left off BinOp for another day
Uh oh!
There was an error while loading. Please reload this page.
So after my last vulnerabilities PR and my 2 re-organization PRs I think I'll try my hand at finding the false-negatives of our last evaluation. Fixing bugs and false-positives is easier than false-negatives so I'm looking forward to this :D
(The crashes in the last evaluation were caused by us analyzing invalid python and there was also an infinite loop in
get_first_nodethat I now fixed 104cf80.)