[WIP] Fix false-negatives and false-positives #99

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Closed

KevinHock wants to merge 24 commits into master from fix_oakie_false_negative

Closed

[WIP] Fix false-negatives and false-positives #99

KevinHock wants to merge 24 commits into master from fix_oakie_false_negative

Conversation

@KevinHock

Copy link

Collaborator

@KevinHock KevinHock commented Mar 29, 2018 •

edited

Loading

So after my last vulnerabilities PR and my 2 re-organization PRs I think I'll try my hand at finding the false-negatives of our last evaluation. Fixing bugs and false-positives is easier than false-negatives so I'm looking forward to this :D

(The crashes in the last evaluation were caused by us analyzing invalid python and there was also an infinite loop in get_first_node that I now fixed 104cf80.)

@KevinHock


 Make a false_negatives directory with the issues from the 11/16/17 ev...

3b4bf1b

...aluation

@KevinHock KevinHock added the cool label

Mar 29, 2018

KevinHock added 3 commits

March 29, 2018 18:56

@KevinHock


 Made a folder example/november_2017_evaluation with a directory for f...

153501f

...-p's f-n's and the infinite loop in get_first_node

@KevinHock


 Fix the foddy infinite loop

104cf80

@KevinHock


 Delete the infinite_loop folder since it is fixed

@KevinHock KevinHock changed the title ~~(削除) Fix false-negatives! (削除ここまで)~~ (追記) Fix false-negatives and false-positives (追記ここまで)

Mar 31, 2018

@KevinHock KevinHock changed the title ~~(削除) Fix false-negatives and false-positives (削除ここまで)~~ (追記) [WIP] Fix false-negatives and false-positives (追記ここまで)

Apr 3, 2018

KevinHock

KevinHock commented

Apr 4, 2018

View reviewed changes

example/november_2017_evaluation/false_positives/cheapskate_get_image.py

if not url:

abort(400)

if is_image(url):

Copy link

Collaborator Author

@KevinHock KevinHock Apr 4, 2018 •

edited

Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one seems bypass-able, so not a false-positive, but it easily could have been, in order to do this well we would need to be able to solve predicates, via like Z3str3 or something. I feel these are acceptable for the foreseeable future.

@KevinHock


 Make source_nested_in_sink.py test and pass it by making .args = list...

97074f3

...(rhs_vars), Move add_blackbox_or_builtin_call to expr_visitor where it should be, and CALL_IDENTIFIER to expr_visitor_helper, some flake8 on vulnerabilities_test

KevinHock

KevinHock commented

Apr 4, 2018

View reviewed changes

pyt/expr_visitor.py Outdated

# .args is only used in get_sink_args

# We make a new list because right_hand_side_variables is extended in assignment_call_node

call_node.right_hand_side_variables = rhs_vars

call_node.args = list(rhs_vars)

Copy link

Collaborator Author

@KevinHock KevinHock Apr 4, 2018 •

edited

Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we did not make a new list, things like this would happen:

 > User input at line 8, trigger word "request.args.get(":
 ¤call_1 = ret_request.args.get('image_name')
Reassigned in:
 File: example/vulnerable_code/path_traversal_sanitised.py
 > Line 8: image_name = ¤call_1
File: example/vulnerable_code/path_traversal_sanitised.py
 > reaches line 10, trigger word "replace(":
 ¤call_2 = ret_image_name.replace('..', '')

though the problem on master was that we didn't include ~call_N in .args, and thus in the return value of get_sink_args and so we had a false-negative that's now fixed. However, or_in_sink.py remains unsolved, for the time being.

Copy link

Collaborator Author

@KevinHock KevinHock Apr 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason or_in_sink.py doesn't pass yet is because we don't if isinstance for BoolOp in the 'loop through args' code in visit_Call, should be easy enough.

KevinHock added 4 commits

April 4, 2018 18:13

@KevinHock


 Make a test for if_else_in_sink

98b49b6

@KevinHock


 Fix double import trap https://stackoverflow.com/questions/43393764/p...

ff1bc89

...ython-3-6-project-structure-leads-to-runtimewarning

@KevinHock


 [False-negs] Problem illustration commit

6c7a8d2

@KevinHock


 [comments] w/r/t ConnectStatements

e9dd5e4

@KevinHock KevinHock mentioned this pull request

Apr 17, 2018

Fix the foddy infinite loop #118

Merged

@KevinHock

Copy link

Collaborator Author

KevinHock commented Apr 26, 2018

So I was half way done with expr_star_visitor when I decided to re-org things again and make things more user-friendly. 😮

KevinHock added 8 commits

May 10, 2018 16:21

@KevinHock


 save current work emergency

63c3f96

@KevinHock


 save current work emergency

8367f68

@KevinHock


 Merge branch 'master' into fix_oakie_false_negative

2578c29

@KevinHock


 Added changes back to file that git merge missed

6226a77

@KevinHock


 Fix whitespace

62e7420

@KevinHock


 Fixed some imports that got broken by the merge, indented EXPECTED in...

8b72e52

... vulnerabilities_test.py

@KevinHock


 Delete tests/vulnerabilities_test.py that I accidentally added, Fix C...

d42243c

...ALL_IDENTIFIER back to what it is on master

@KevinHock


 Mostly finished BoolOp, IfExp and expr_star_handler

695bbde

@KevinHock

Copy link

Collaborator Author

KevinHock commented Jul 4, 2018 •

edited

Loading

WHAT IS LEFT

Write more tests, especially having many BoolOps -- like in if_else_in_sink.py e.g. return redirect(request.args.get('Laundry') and 'crazy' or request.args.get('French') and 'foo' or request.args.get('The'))

Make existing tests pass

ControlFlowExpr inherit nicely from namedtuple

Clean up ControlFlowNode.connect and ControlFlowExpr.connect

random:
Improve visit_keyword
to make foo= make foo tainted
to make it work with **kwargs, etc.
Kill BUILTINS tuple, make everything blackbox
Replace isinstance(expr, AssignmentNode) with CallAssignment node
Fix if if with IfExp

An aside:
Inherit from named_tuple, etc.

AFTER WORK
New usage picture in README (PR to the readthedocs repository)

Test out redirect(request.args.get('next')) on current branch
Test out FlaskBB BoolOp redirect on current branch
Root README: There are 2 things you will need to do to configure PyT
 Taint entry points
 Look over the file of sources and sinks (maybe blackbox functions too)
At home:
 See how to parse routing.py
 Do Django CBVs too
 See how hard it is to do object.function() inlining

KevinHock added 2 commits

July 5, 2018 14:32

@KevinHock


 Handle BoolOp inside of BoolOp, Make ConnectExpressions a class, Fix ...

7ecf462

...up StrNode to be cleaner

@KevinHock


 Finished IfExp, found bug in expr_star_handler

1fa04e8

@KevinHock

Copy link

Collaborator Author

KevinHock commented Jul 7, 2018

The current problem with

 return redirect(request.args.get('The') if 'hey' or 'you' else request.args.get('French') if 'foo' else request.args.get('Aces') and 'c')

is that expr_star_handler return the tainted aces call as a last expression, because StrNodes never get added to cfg_expressions, get_last_expressions can't see that aces is not the last expression. Hmmm, I fixed it with this:

--- a/pyt/cfg/expr_visitor.py
+++ b/pyt/cfg/expr_visitor.py
@@ -86,6 +86,7 @@ class ExprVisitor(StmtVisitor):
 variables = list()
 visual_variables = list()
 first_node = None
+ last_node_should_not_be_ignored = False
 node_not_to_step_past = self.nodes[-1]
 
 for expr in exprs:
@@ -112,7 +113,8 @@ class ExprVisitor(StmtVisitor):
 label.visit(expr)
 visual_variables.append(label.result)
 
- if not isinstance(node, StrNode):
+ last_node_should_not_be_ignored = not isinstance(node, StrNode)
+ if last_node_should_not_be_ignored:
 cfg_expressions.append(node)
 if not first_node:
 if isinstance(node, ControlFlowExpr):
@@ -130,7 +132,7 @@ class ExprVisitor(StmtVisitor):
 return ConnectExpressions(
 first_expression=first_node,
 all_expressions=cfg_expressions,
- last_expressions=get_last_expressions(cfg_expressions) if cfg_expressions else [],
+ last_expressions=get_last_expressions(cfg_expressions) if last_node_should_not_be_ignored else [],
 variables=variables,
 visual_variables=visual_variables
 )

Now it only reports a vuln, if aces was the last expression!

KevinHock added 2 commits

July 7, 2018 12:35

@KevinHock


 Fix case where false-positives could occur by making the node before ...

...a StrNode the last expression, instead of nothing. e.g. return redirect(request.args.get('The') if 'hey' or 'you' else request.args.get('French') if 'foo' else request.args.get('Aces') and 'c')

@KevinHock


 Add vulnerable_code_with_expressions/last_var_in_and_is_tainted and l...

c6d9e62

...ast_var_in_and_is_not_tainted with test

KevinHock added 2 commits

July 9, 2018 20:58

@KevinHock


 Change visit_IfExp to handle .test better, change expr_cfg_test.py to...

4d2ad46

... match. Did boolop.connect to each boolop last expression.

@KevinHock


 Add example of connecting_control_flow_exprs_in_connect_expressions.py

5b1f262

@KevinHock

Copy link

Collaborator Author

KevinHock commented Jul 26, 2018

inner_most_call notes

HOW WE CURRENTLY SET THE INNER MOST CALL ATTRIBUTE FOR BLACKBOX CALLS:

When looping through args and keyword args in add_blackbox_call
If the arg is a call
We do things, explained below.

This code essentially does 3 things:

 The last arg that is a function call gets connected to the outer call node
 Relevant code:
 # connect other_inner to outer in e.g.
 `scrypt.outer(scrypt.inner(image_name), scrypt.other_inner(image_name))`
 I think this may be not needed if the code is
 e.g.
 `scrypt.outer(scrypt.inner(image_name), scrypt.other_inner(image_name), some_var)`
 but maybe it is, currently we do it anyway.
 The first arg that is a function call gets set as_ the inner_most_call
 Calling it the inner_most_call is inaccurate, it should be called first_call
 We obtain the inner_most_call by looping through them in _get_inner_most_function_call
 Relevant code:
 # I should only set this once per loop, inner in e.g.
 # `scrypt.outer(scrypt.inner(image_name), scrypt.other_inner(image_name))`
 # (inner_most_call is used when predecessor is a ControlFlowNode in connect_control_flow_node)
 call_node.inner_most_call = return_value_of_nested_call
 Should we still do this if the first arg is not a function call?
 e.g. 
 `scrypt.outer(foo, scrypt.inner(image_name), scrypt.other_inner(image_name))`
 I think we should because control flow really does go into the inner function first.
 We connect one arg that is a function call, to the next arg that is a function call
 Relevant code:
 # connect inner to other_inner in e.g.
 # `scrypt.outer(scrypt.inner(image_name), scrypt.other_inner(image_name))`
 # I should probably loop to the inner most call of other_inner here.
 try:
 last_return_value_of_nested_call.connect(return_value_of_nested_call.first_node)
 except AttributeError:
 last_return_value_of_nested_call.connect(return_value_of_nested_call)
 Like the comment says, I am not sure if I should connect inner to other_inner or some_thing_else in
 e.g.
 `scrypt.outer(scrypt.inner(image_name), scrypt.other_inner(some_thing_else(image_name)))`

In save_def_args_in_temp of process_function we do the same thing pretty much.

HOW WE CURRENTLY USE THE INNER MOST CALL ATTRIBUTE:

When connecting a ControlFlowNode to an AssignmentCallNode
We loop through the last_nodes of the ControlFlowNode and
connect them with the inner_most_call_node of the AssignmentCallNode

_get_inner_most_function_call:
Loops through calls and
For blackbox calls, this is the inner_most_call
For user-defined calls, this is the .first_node

So blackbox calls are done well, and user-defined calls are done lazily.

Code more or less:
stmt_star_handler ->
connect_nodes ->
if node isinstance ControlFlowNode ->
for last in control_flow_node.last_nodes:
if isinstance(next_node, AssignmentCallNode):
inner_most_call_node = _get_inner_most_function_call(call_node)
last.connect(inner_most_call_node)

@KevinHock

Copy link

Collaborator Author

KevinHock commented Jul 26, 2018

This line seems dead

pyt/pyt/cfg/expr_visitor.py

Line 308 in fbca4d2

first_node.inner_most_call = return_value_of_nested_call.first_node

b/c of the above. (i.e. we only use first_node of user-defined functions.

OPEN QUESTIONS

Question

Why do we only do this when connecting a ControlFlowNode to an AssignmentCallNode?
e.g.
If we have
foo = 5
outer(inner(foo))
I still want to connect foo = 5 to inner() and not outer, right?

Answer

We do connect_to_allowed -1 stuff that we are trying to get away from

Question

What tests do we currently have for the above?

Answer

???

@KevinHock


 Things I did on 7/26/18

b85c072

 Removed the if CALL_IDENTIFIER in garbage
 Removed the node_not_to_step_past.connect(first_node)
 Fixed up _get_inner_most_expression, connect_nodes and wrote the top 4 expr_cfg_tests
 # This is used in connect_nodes
 call_node.first_expression = connected_expressions.first_expression
 Fixed remaining expr_cfg_tests
 Left off BinOp for another day

@KevinHock KevinHock mentioned this pull request

Aug 23, 2018

Add a logger and verbosity options #168

Closed

@KevinHock KevinHock closed this

Apr 10, 2019

[WIP] Fix false-negatives and false-positives #99

[WIP] Fix false-negatives and false-positives #99

Uh oh!

Conversation

@KevinHock KevinHock commented Mar 29, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

@KevinHock KevinHock Apr 4, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

@KevinHock KevinHock Apr 4, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

@KevinHock KevinHock Apr 6, 2018

Choose a reason for hiding this comment

Uh oh!

KevinHock commented Apr 26, 2018

Uh oh!

KevinHock commented Jul 4, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

WHAT IS LEFT

Uh oh!

KevinHock commented Jul 7, 2018

Uh oh!

KevinHock commented Jul 26, 2018

HOW WE CURRENTLY SET THE INNER MOST CALL ATTRIBUTE FOR BLACKBOX CALLS:

HOW WE CURRENTLY USE THE INNER MOST CALL ATTRIBUTE:

Uh oh!

KevinHock commented Jul 26, 2018

OPEN QUESTIONS

Question

Answer

Question

Answer

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

@KevinHock KevinHock commented Mar 29, 2018 •

edited

Loading

@KevinHock KevinHock Apr 4, 2018 •

edited

Loading

@KevinHock KevinHock Apr 4, 2018 •

edited

Loading

KevinHock commented Jul 4, 2018 •

edited

Loading