[Python] Taint Analysis for a particular scenario involving StringIO · github/codeql · Discussion #20085

rikaardhosein
Jul 17, 2025

from io import StringIO
dangerous_sink = print
class Source:
 def __init__(self, stream: StringIO|None = None):
 self.stream = stream if stream is not None else StringIO()
 def read(self):
 return self.stream.getvalue()
 def writecontent(self, content):
 self.stream.write(content)
class Processor:
 def process(self, s: Source):
 s.stream.seek(0)
 source_code = s.stream.getvalue()
 dangerous_sink(test_global + "" + source_code)
test_global = "test_data" 
tainted_global = "tainted_data" 
 
if __name__ == "__main__":
 s = Source()
 c = Processor()
 s.writecontent(tainted_global)
 c.process(s)

My goal here is to track tainted_global (source) to the first argument to dangerous_sink.
I'm having a difficult time coming up with a way to do this.

Note: I don't want to just taint calls to Source.writecontent(actual_source_here) and then references to s since I'm trying to create something more generic.

My taint analysis reaches self.stream.write(content) in s.writecontent and by using IsAdditionalFlowStep, the taint flow reaches self.stream and self.

However, tainting self does not taint the instance of the class in the outer scope.
Tainting self.stream does not affect s.stream when it is referenced later on in Processor.process.

I'm not sure how I would go about tainting something in the enclosing scope since this method does not return anything.

Below is the current version of my isAdditionalFlowStep predicate in case it's useful.

 predicate isAdditionalFlowStep(DataFlow::Node pred, DataFlow::Node succ) {
 exists(DataFlow::MethodCallNode m|
 m.getMethodName() = "write" and
 m.getObject() = succ and
 pred = m.getArg(0) 
 )
 or
 exists(DataFlow::MethodCallNode m |
 m.getMethodName() = "write" and
 m.getObject() = pred
 and pred.asExpr().(Attribute).getObject() = succ.asExpr()
 )
 or
 exists(DataFlow::MethodCallNode m|
 m.getMethodName() in ["getvalue", "read"] and
 succ = m
 and pred = m.getObject()
 )
 }

Answered by smowton

Jul 18, 2025

You're very close!

exists(DataFlow::MethodCallNode m |
 m.getMethodName() = "write" and
 m.getObject() = pred
 and pred.asExpr().(Attribute).getObject() = succ.asExpr()
 )
 or

(the middle disjunct) is unnecessary and can be removed.

The top disjunct needs changing to use DataFlow::PostUpdateNode, which represents a side-effect on a function parameter (including self, as here):

 exists(DataFlow::MethodCallNode m |
 m.getMethodName() = "write" and
 m.getObject() = succ.(DataFlow::PostUpdateNode).getPreUpdateNode() and
 pred = m.getArg(0)
 )

You must also use TaintTracking because of the string-concatenation step at test_global + "" + source_code (...

View full answer

Replies: 1 comment 2 replies

smowton
Jul 18, 2025
Maintainer

You're very close!

exists(DataFlow::MethodCallNode m |
 m.getMethodName() = "write" and
 m.getObject() = pred
 and pred.asExpr().(Attribute).getObject() = succ.asExpr()
 )
 or

(the middle disjunct) is unnecessary and can be removed.

The top disjunct needs changing to use DataFlow::PostUpdateNode, which represents a side-effect on a function parameter (including self, as here):

 exists(DataFlow::MethodCallNode m |
 m.getMethodName() = "write" and
 m.getObject() = succ.(DataFlow::PostUpdateNode).getPreUpdateNode() and
 pred = m.getArg(0)
 )

You must also use TaintTracking because of the string-concatenation step at test_global + "" + source_code (basic DataFlow only follows value-preserving steps, and string concat doesn't preserve exact value; TaintTracking also includes steps that for most purposes preserve security hazard, like string joining, but not exact value)

Finally note these are overly broad models since any method named getvalue is assumed to behave this way -- it would be better to refine them to check that the LHS of the call may be a StringIO instance, but I'll ask the Python team to follow up with advice in this respect.

2 replies

@rikaardhosein

rikaardhosein Jul 18, 2025
Author

@smowton Thank you! I'll give that a try today. It does seem like what I'm looking for.

I am using TaintTracking 👍

I have a plan for refining isAdditionalFlowStep to only affect StringIO instances but I don't know if it's a good one.

I'm thinking to track the data flow from API::moduleImport("io").getMember("StringIO").getACall() with an always true sink predicate. I can then check if MethodCallNode.getObject()is one of those sinks.

I can see this not working universally though.

eg. Will the dataflow from self.stream = stream if stream is not None else StringIO() in the Source constructor reach s.stream in Processor.process?

@smowton

smowton Jul 30, 2025
Maintainer

Apologies for the slow response. Yes, this example is just complicated enough that a subsidiary dataflow config to identify probable StringIO method calls is necessary.

For example you can use

module StringIOCallConfig implements DataFlow::ConfigSig {
 predicate isSource(DataFlow::Node n) { n = API::moduleImport("io").getMember("StringIO").getReturn().asSource() }
 predicate isSink(DataFlow::Node n) {
 n = any(DataFlow::MethodCallNode mcn | mcn.getMethodName() = "write").getObject()
 }
}
module StringIOCallFlow = DataFlow::Global<StringIOCallConfig>;

Note that constraining the sink rather than looking for everywhere a StringIO instance can get to reduces the cost of the analysis.

And then in your main config, you can use

 exists(DataFlow::MethodCallNode m |
 Flow2::flowTo(m.getObject()) and
 ...

to characterise specifically write calls whose receiver may be a StringIO instance.

For the getvalue and read calls you could generalise this to

 predicate isSink(DataFlow::Node n) {
 n = any(DataFlow::MethodCallNode mcn | mcn.getMethodName() = ["getvalue", "read", "write"]).getObject()
 }

This would identify all interesting calls against a StringIO instance; then in your main config you can once again disambiguate whether the interesting call is a getvalue, read or write call and therefore what sort of step is appropriate.

Answer selected by rikaardhosein

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Python] Taint Analysis for a particular scenario involving StringIO #20085

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

rikaardhosein
Jul 17, 2025

Replies: 1 comment 2 replies

Uh oh!

{{title}}

Uh oh!

smowton
Jul 18, 2025
Maintainer

Uh oh!

{{title}}

Uh oh!

rikaardhosein Jul 18, 2025
Author

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

smowton Jul 30, 2025
Maintainer

Select a reply

Uh oh!

[Python] Taint Analysis for a particular scenario involving StringIO #20085

Uh oh!

Uh oh!

rikaardhosein Jul 17, 2025

Replies: 1 comment · 2 replies

Uh oh!

smowton Jul 18, 2025 Maintainer

Uh oh!

rikaardhosein Jul 18, 2025 Author

Uh oh!

Uh oh!

smowton Jul 30, 2025 Maintainer

rikaardhosein
Jul 17, 2025

Replies: 1 comment 2 replies

smowton
Jul 18, 2025
Maintainer

rikaardhosein Jul 18, 2025
Author

smowton Jul 30, 2025
Maintainer