-
Notifications
You must be signed in to change notification settings - Fork 1.9k
[Python] Taint Analysis for a particular scenario involving StringIO #20085
-
from io import StringIO dangerous_sink = print class Source: def __init__(self, stream: StringIO|None = None): self.stream = stream if stream is not None else StringIO() def read(self): return self.stream.getvalue() def writecontent(self, content): self.stream.write(content) class Processor: def process(self, s: Source): s.stream.seek(0) source_code = s.stream.getvalue() dangerous_sink(test_global + "" + source_code) test_global = "test_data" tainted_global = "tainted_data" if __name__ == "__main__": s = Source() c = Processor() s.writecontent(tainted_global) c.process(s)
My goal here is to track tainted_global (source) to the first argument to dangerous_sink.
I'm having a difficult time coming up with a way to do this.
Note: I don't want to just taint calls to Source.writecontent(actual_source_here) and then references to s since I'm trying to create something more generic.
My taint analysis reaches self.stream.write(content) in s.writecontent and by using IsAdditionalFlowStep, the taint flow reaches self.stream and self.
However, tainting self does not taint the instance of the class in the outer scope.
Tainting self.stream does not affect s.stream when it is referenced later on in Processor.process.
I'm not sure how I would go about tainting something in the enclosing scope since this method does not return anything.
Below is the current version of my isAdditionalFlowStep predicate in case it's useful.
predicate isAdditionalFlowStep(DataFlow::Node pred, DataFlow::Node succ) { exists(DataFlow::MethodCallNode m| m.getMethodName() = "write" and m.getObject() = succ and pred = m.getArg(0) ) or exists(DataFlow::MethodCallNode m | m.getMethodName() = "write" and m.getObject() = pred and pred.asExpr().(Attribute).getObject() = succ.asExpr() ) or exists(DataFlow::MethodCallNode m| m.getMethodName() in ["getvalue", "read"] and succ = m and pred = m.getObject() ) }
Beta Was this translation helpful? Give feedback.
All reactions
You're very close!
exists(DataFlow::MethodCallNode m |
m.getMethodName() = "write" and
m.getObject() = pred
and pred.asExpr().(Attribute).getObject() = succ.asExpr()
)
or
(the middle disjunct) is unnecessary and can be removed.
The top disjunct needs changing to use DataFlow::PostUpdateNode, which represents a side-effect on a function parameter (including self, as here):
exists(DataFlow::MethodCallNode m |
m.getMethodName() = "write" and
m.getObject() = succ.(DataFlow::PostUpdateNode).getPreUpdateNode() and
pred = m.getArg(0)
)
You must also use TaintTracking because of the string-concatenation step at test_global + "" + source_code (...
Replies: 1 comment 2 replies
-
You're very close!
exists(DataFlow::MethodCallNode m |
m.getMethodName() = "write" and
m.getObject() = pred
and pred.asExpr().(Attribute).getObject() = succ.asExpr()
)
or
(the middle disjunct) is unnecessary and can be removed.
The top disjunct needs changing to use DataFlow::PostUpdateNode, which represents a side-effect on a function parameter (including self, as here):
exists(DataFlow::MethodCallNode m |
m.getMethodName() = "write" and
m.getObject() = succ.(DataFlow::PostUpdateNode).getPreUpdateNode() and
pred = m.getArg(0)
)
You must also use TaintTracking because of the string-concatenation step at test_global + "" + source_code (basic DataFlow only follows value-preserving steps, and string concat doesn't preserve exact value; TaintTracking also includes steps that for most purposes preserve security hazard, like string joining, but not exact value)
Finally note these are overly broad models since any method named getvalue is assumed to behave this way -- it would be better to refine them to check that the LHS of the call may be a StringIO instance, but I'll ask the Python team to follow up with advice in this respect.
Beta Was this translation helpful? Give feedback.
All reactions
-
@smowton Thank you! I'll give that a try today. It does seem like what I'm looking for.
I am using TaintTracking 👍
I have a plan for refining isAdditionalFlowStep to only affect StringIO instances but I don't know if it's a good one.
I'm thinking to track the data flow from API::moduleImport("io").getMember("StringIO").getACall() with an always true sink predicate. I can then check if MethodCallNode.getObject()is one of those sinks.
I can see this not working universally though.
eg. Will the dataflow from self.stream = stream if stream is not None else StringIO() in the Source constructor reach s.stream in Processor.process?
Beta Was this translation helpful? Give feedback.
All reactions
-
Apologies for the slow response. Yes, this example is just complicated enough that a subsidiary dataflow config to identify probable StringIO method calls is necessary.
For example you can use
module StringIOCallConfig implements DataFlow::ConfigSig { predicate isSource(DataFlow::Node n) { n = API::moduleImport("io").getMember("StringIO").getReturn().asSource() } predicate isSink(DataFlow::Node n) { n = any(DataFlow::MethodCallNode mcn | mcn.getMethodName() = "write").getObject() } } module StringIOCallFlow = DataFlow::Global<StringIOCallConfig>;
Note that constraining the sink rather than looking for everywhere a StringIO instance can get to reduces the cost of the analysis.
And then in your main config, you can use
exists(DataFlow::MethodCallNode m | Flow2::flowTo(m.getObject()) and ...
to characterise specifically write calls whose receiver may be a StringIO instance.
For the getvalue and read calls you could generalise this to
predicate isSink(DataFlow::Node n) { n = any(DataFlow::MethodCallNode mcn | mcn.getMethodName() = ["getvalue", "read", "write"]).getObject() }
This would identify all interesting calls against a StringIO instance; then in your main config you can once again disambiguate whether the interesting call is a getvalue, read or write call and therefore what sort of step is appropriate.
Beta Was this translation helpful? Give feedback.