[Python] Select existing path node (with flow state) in the isAdditionalFlowState predicate · github/codeql · Discussion #18853

jackfromeast
Feb 24, 2025

I am trying to track how many times the get operation is performed on the return object. However, my current CodeQL query is unable to correctly distinguish between different numbers of get operations.

The functions below demonstrate my expected flow states:

def test1(obj, key): # Source: Key, SourceKeyFlowState
 for _ in range(2):
 obj = obj.get(key) # if key has SourceKeyFlowState, obj.get(key) -> ObjectLayerOneFlowState 
 # if key has SourceKeyFlowState and obj has ObjectLayerOneFlowState, obj.get(key) -> ObjectLayerTwoFlowState
 return obj # Sink: obj, ObjectLayerTwoFlowState
def test2(obj, key): # Source: Key, SourceKeyFlowState
 for _ in range(1):
 obj = obj.get(key) # if key has SourceKeyFlowState, obj.get(key) -> ObjectLayerOneFlowState 
 # if key has SourceKeyFlowState and obj has ObjectLayerOneFlowState, obj.get(key) -> ObjectLayerTwoFlowState
 return obj # Sink: obj, ObjectLayerOneFlowState

Expected Taint Flows:

test1: SourceKeyFlowState → ObjectFlowState-MoreThanOne
test2: SourceKeyFlowState → ObjectFlowState-One

However, my current CodeQL query does not correctly distinguish between these cases and instead selects all possible four flows.

import python
import semmle.python.ApiGraphs
import codeql.dataflow.DataFlow
import semmle.python.dataflow.new.DataFlow
import semmle.python.dataflow.new.TaintTracking
import semmle.python.dataflow.new.internal.DataFlowPublic
import semmle.python.dataflow.new.internal.TaintTrackingPublic
module TempTest {
abstract class FlowState extends string {
 bindingset[this]
 FlowState() { any() }
}
class SourceKeyFlowState extends TempTest::FlowState {
 SourceKeyFlowState() { this = "SourceKeyFlowState" }
}
class Occurrence extends string {
 Occurrence() { 
 this = "One" or
 this = "MoreThanOne"
 }
}
class ObjectFlowState extends TempTest::FlowState {
 Occurrence occur;
 ObjectFlowState() { this = "ObjectFlowState" + "-" + occur }
}
module TrackingGetOperationConfiguration implements DataFlow::StateConfigSig {
 class FlowState = TempTest::FlowState;
 predicate isSource(DataFlow::Node source, FlowState state) {
 exists (Function func, Parameter param | 
 func.getArg(1) = param and
 source.asExpr() = param and
 func.getName().matches("test%")
 ) and
 (
 state instanceof SourceKeyFlowState
 )
 }
 predicate isSink(DataFlow::Node sink, FlowState state) {
 exists (Function func, Return ret | 
 ret.getScope() = func.getEvaluatingScope() and
 ret.contains(sink.asExpr()) and
 func.getName().matches("test%")
 ) and
 (
 state instanceof ObjectFlowState
 )
 }
 predicate isAdditionalFlowStep(DataFlow::Node fromNode, FlowState fromState, DataFlow::Node toNode, FlowState toState) {
 exists( MethodCallNode callNode, Call call |
 callNode.asExpr() = call and
 callNode.getMethodName() = "get" and
 callNode.asExpr() = toNode.asExpr() and
 call.getArg(0) = fromNode.asExpr()
 ) and 
 fromState instanceof SourceKeyFlowState and
 toState instanceof ObjectFlowState
 }
}
module TrackingGetOperationFlow = DataFlow::GlobalWithState<TrackingGetOperationConfiguration>;
module Flow = TrackingGetOperationFlow; // For shortening the name
predicate run(Flow::PathNode source, Flow::PathNode sink, FlowState state) {
 Flow::flowPath(source, sink) and
 sink.getState() = state
}
}

I think the key problem here is that in the isAdditionalFlowStep predicate, I cannot select the path node beside the fromNode and toNode, and use its flow state information to determine the flow state of toNode. In my case, the taint propagation step should take into account both the flow state of the key (fromNode) and the flow state of the base object to correctly determine the flow state of toNode.

Could anyone provide suggestions on how to fix this issue? Any insights would be greatly appreciated!

Replies: 3 comments 1 reply

superboy-zjc
Feb 24, 2025

I have the same question!

0 replies

aschackmull
Feb 25, 2025
Maintainer

You cannot inspect flow to both obj and key at the same time in order to determine the output state at obj.get(key) in that way - this is because the calculated flow is a path that goes from one node to another one step at a time. However, you can do something that's close and might actually get you what you want: You can use two additional steps: One from key to the output that updates the state from SourceKeyFlowState to ObjectFlowState-One and one from the qualifier obj to the output that updates the state from ObjectFlowState-One to ObjectFlowState-MoreThanOne. Your current step definition mostly achieves the former, except that you've given the resulting state as toState instanceof ObjectFlowState which includes both One and MoreThanOne. You probably want to change that to state = "ObjectFlowState-One". And then supplement with the other step. The two steps might look something like this:

 predicate isAdditionalFlowStep(DataFlow::Node fromNode, FlowState fromState, DataFlow::Node toNode, FlowState toState) {
 exists( MethodCallNode callNode, Call call |
 callNode.asExpr() = call and
 callNode.getMethodName() = "get" and
 callNode.asExpr() = toNode.asExpr()
 |
 call.getArg(0) = fromNode.asExpr() and
 fromState instanceof SourceKeyFlowState and
 toState = "ObjectFlowState-One"
 or
 call.getQualifier() = fromNode.asExpr() and
 fromState = "ObjectFlowState-One" and
 toState = "ObjectFlowState-MoreThanOne"
 )
 }

Note that you cannot expect the out-of-the-box analysis to distinguish for _ in range(1): and for _ in range(2):. Determining the number of times a loop is executed in general is halting-problem territory.

0 replies

jackfromeast
Feb 25, 2025
Author

Hi @aschackmull! Thank you for your answer!

However, I’m still trying to differentiate between the following two cases:

def test1(obj, key): # Source: Key, SourceKeyFlowState
 for _ in range(2):
 obj = obj.get(key) # if key has SourceKeyFlowState, obj.get(key) -> ObjectLayerOneFlowState 
 # if key has SourceKeyFlowState and obj has ObjectLayerOneFlowState, obj.get(key) -> ObjectLayerTwoFlowState
 return obj # Sink: obj, ObjectLayerTwoFlowState
def test3(obj, key): # Source: Key, SourceKeyFlowState
 key2 = "x" # Not Source Key
 for _ in range(1):
 obj = obj.get(key) # if key has SourceKeyFlowState, obj.get(key) -> ObjectLayerOneFlowState 
 # if key has SourceKeyFlowState and obj has ObjectLayerOneFlowState, obj.get(key) -> ObjectLayerTwoFlowState
 obj = obj.get(key2) # Should not propagate the taint flow as key2 is not tainted with SourceKeyFlowState
 return obj # Sink: obj, ObjectLayerOneFlowState

If I add two additional taint flow steps for the get operation—one propagating from key to val, and the other from obj to val—I cannot make sure whether they refer to the same get operation. Also, please ignore the for loop times, it is just for demonstration.

If this (propagating taint flow based on multiple nodes' taint flow state in the isAdditionalFlowStep) is challenging for CodeQL to achieve, do you think we could support this in the future?

1 reply

@aschackmull

aschackmull Feb 26, 2025
Maintainer

It's perfectly doable, you just need a slightly different setup. Let's consider your step here:

obj = obj.get(key) # if key has SourceKeyFlowState and obj has ObjectLayerOneFlowState, obj.get(key) -> ObjectLayerTwoFlowState

If you're just tracking flow via a single path then you're either stepping from obj or from key, so you can't rely on state propagation from both. However, you could do an initial flow calculation that only deals with the propagation of source keys. Then once that's done you can restrict the additional steps in your second flow calculation to those qualifiers in obj.get(key) for which key has flow in the first flow calculation. You can use FlowState in your second flow definition to track the One vs MoreThanOne, but you probably don't need FlowState in the first - there you just need to find those get call for which the argument comes from a key source.

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Python] Select existing path node (with flow state) in the isAdditionalFlowState predicate #18853

Uh oh!

{{title}}

Uh oh!

jackfromeast
Feb 24, 2025

Replies: 3 comments 1 reply

Uh oh!

{{title}}

Uh oh!

superboy-zjc
Feb 24, 2025

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

aschackmull
Feb 25, 2025
Maintainer

Uh oh!

{{title}}

Uh oh!

jackfromeast
Feb 25, 2025
Author

Uh oh!

{{title}}

Uh oh!

aschackmull Feb 26, 2025
Maintainer

Select a reply

Uh oh!

[Python] Select existing path node (with flow state) in the isAdditionalFlowState predicate #18853

Uh oh!

jackfromeast Feb 24, 2025

Replies: 3 comments · 1 reply

Uh oh!

superboy-zjc Feb 24, 2025

Uh oh!

Uh oh!

aschackmull Feb 25, 2025 Maintainer

Uh oh!

jackfromeast Feb 25, 2025 Author

Uh oh!

aschackmull Feb 26, 2025 Maintainer

jackfromeast
Feb 24, 2025

Replies: 3 comments 1 reply

superboy-zjc
Feb 24, 2025

aschackmull
Feb 25, 2025
Maintainer

jackfromeast
Feb 25, 2025
Author

aschackmull Feb 26, 2025
Maintainer