Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

[Python] Taint Analysis for a particular scenario involving StringIO #20085

Answered by smowton
rikaardhosein asked this question in Q&A
Discussion options

from io import StringIO
dangerous_sink = print
class Source:
 def __init__(self, stream: StringIO|None = None):
 self.stream = stream if stream is not None else StringIO()
 def read(self):
 return self.stream.getvalue()
 def writecontent(self, content):
 self.stream.write(content)
class Processor:
 def process(self, s: Source):
 s.stream.seek(0)
 source_code = s.stream.getvalue()
 dangerous_sink(test_global + "" + source_code)
test_global = "test_data" 
tainted_global = "tainted_data" 
 
if __name__ == "__main__":
 s = Source()
 c = Processor()
 s.writecontent(tainted_global)
 c.process(s)

My goal here is to track tainted_global (source) to the first argument to dangerous_sink.
I'm having a difficult time coming up with a way to do this.

Note: I don't want to just taint calls to Source.writecontent(actual_source_here) and then references to s since I'm trying to create something more generic.

My taint analysis reaches self.stream.write(content) in s.writecontent and by using IsAdditionalFlowStep, the taint flow reaches self.stream and self.

However, tainting self does not taint the instance of the class in the outer scope.
Tainting self.stream does not affect s.stream when it is referenced later on in Processor.process.

I'm not sure how I would go about tainting something in the enclosing scope since this method does not return anything.

Below is the current version of my isAdditionalFlowStep predicate in case it's useful.

 predicate isAdditionalFlowStep(DataFlow::Node pred, DataFlow::Node succ) {
 exists(DataFlow::MethodCallNode m|
 m.getMethodName() = "write" and
 m.getObject() = succ and
 pred = m.getArg(0) 
 )
 or
 exists(DataFlow::MethodCallNode m |
 m.getMethodName() = "write" and
 m.getObject() = pred
 and pred.asExpr().(Attribute).getObject() = succ.asExpr()
 )
 or
 exists(DataFlow::MethodCallNode m|
 m.getMethodName() in ["getvalue", "read"] and
 succ = m
 and pred = m.getObject()
 )
 }
You must be logged in to vote

You're very close!

exists(DataFlow::MethodCallNode m |
 m.getMethodName() = "write" and
 m.getObject() = pred
 and pred.asExpr().(Attribute).getObject() = succ.asExpr()
 )
 or

(the middle disjunct) is unnecessary and can be removed.

The top disjunct needs changing to use DataFlow::PostUpdateNode, which represents a side-effect on a function parameter (including self, as here):

 exists(DataFlow::MethodCallNode m |
 m.getMethodName() = "write" and
 m.getObject() = succ.(DataFlow::PostUpdateNode).getPreUpdateNode() and
 pred = m.getArg(0)
 )

You must also use TaintTracking because of the string-concatenation step at test_global + "" + source_code (...

Replies: 1 comment 2 replies

Comment options

You're very close!

exists(DataFlow::MethodCallNode m |
 m.getMethodName() = "write" and
 m.getObject() = pred
 and pred.asExpr().(Attribute).getObject() = succ.asExpr()
 )
 or

(the middle disjunct) is unnecessary and can be removed.

The top disjunct needs changing to use DataFlow::PostUpdateNode, which represents a side-effect on a function parameter (including self, as here):

 exists(DataFlow::MethodCallNode m |
 m.getMethodName() = "write" and
 m.getObject() = succ.(DataFlow::PostUpdateNode).getPreUpdateNode() and
 pred = m.getArg(0)
 )

You must also use TaintTracking because of the string-concatenation step at test_global + "" + source_code (basic DataFlow only follows value-preserving steps, and string concat doesn't preserve exact value; TaintTracking also includes steps that for most purposes preserve security hazard, like string joining, but not exact value)

Finally note these are overly broad models since any method named getvalue is assumed to behave this way -- it would be better to refine them to check that the LHS of the call may be a StringIO instance, but I'll ask the Python team to follow up with advice in this respect.

You must be logged in to vote
2 replies
Comment options

@smowton Thank you! I'll give that a try today. It does seem like what I'm looking for.

I am using TaintTracking 👍

I have a plan for refining isAdditionalFlowStep to only affect StringIO instances but I don't know if it's a good one.

I'm thinking to track the data flow from API::moduleImport("io").getMember("StringIO").getACall() with an always true sink predicate. I can then check if MethodCallNode.getObject()is one of those sinks.

I can see this not working universally though.

eg. Will the dataflow from self.stream = stream if stream is not None else StringIO() in the Source constructor reach s.stream in Processor.process?

Comment options

Apologies for the slow response. Yes, this example is just complicated enough that a subsidiary dataflow config to identify probable StringIO method calls is necessary.

For example you can use

module StringIOCallConfig implements DataFlow::ConfigSig {
 predicate isSource(DataFlow::Node n) { n = API::moduleImport("io").getMember("StringIO").getReturn().asSource() }
 predicate isSink(DataFlow::Node n) {
 n = any(DataFlow::MethodCallNode mcn | mcn.getMethodName() = "write").getObject()
 }
}
module StringIOCallFlow = DataFlow::Global<StringIOCallConfig>;

Note that constraining the sink rather than looking for everywhere a StringIO instance can get to reduces the cost of the analysis.

And then in your main config, you can use

 exists(DataFlow::MethodCallNode m |
 Flow2::flowTo(m.getObject()) and
 ...

to characterise specifically write calls whose receiver may be a StringIO instance.

For the getvalue and read calls you could generalise this to

 predicate isSink(DataFlow::Node n) {
 n = any(DataFlow::MethodCallNode mcn | mcn.getMethodName() = ["getvalue", "read", "write"]).getObject()
 }

This would identify all interesting calls against a StringIO instance; then in your main config you can once again disambiguate whether the interesting call is a getvalue, read or write call and therefore what sort of step is appropriate.

Answer selected by rikaardhosein
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

AltStyle によって変換されたページ (->オリジナル) /