-
Notifications
You must be signed in to change notification settings - Fork 1.9k
[Python] Select existing path node (with flow state) in the isAdditionalFlowState predicate #18853
-
I am trying to track how many times the get operation is performed on the return object. However, my current CodeQL query is unable to correctly distinguish between different numbers of get operations.
The functions below demonstrate my expected flow states:
def test1(obj, key): # Source: Key, SourceKeyFlowState
for _ in range(2):
obj = obj.get(key) # if key has SourceKeyFlowState, obj.get(key) -> ObjectLayerOneFlowState
# if key has SourceKeyFlowState and obj has ObjectLayerOneFlowState, obj.get(key) -> ObjectLayerTwoFlowState
return obj # Sink: obj, ObjectLayerTwoFlowState
def test2(obj, key): # Source: Key, SourceKeyFlowState
for _ in range(1):
obj = obj.get(key) # if key has SourceKeyFlowState, obj.get(key) -> ObjectLayerOneFlowState
# if key has SourceKeyFlowState and obj has ObjectLayerOneFlowState, obj.get(key) -> ObjectLayerTwoFlowState
return obj # Sink: obj, ObjectLayerOneFlowState
Expected Taint Flows:
test1:SourceKeyFlowState→ObjectFlowState-MoreThanOnetest2:SourceKeyFlowState→ObjectFlowState-One
However, my current CodeQL query does not correctly distinguish between these cases and instead selects all possible four flows.
import python
import semmle.python.ApiGraphs
import codeql.dataflow.DataFlow
import semmle.python.dataflow.new.DataFlow
import semmle.python.dataflow.new.TaintTracking
import semmle.python.dataflow.new.internal.DataFlowPublic
import semmle.python.dataflow.new.internal.TaintTrackingPublic
module TempTest {
abstract class FlowState extends string {
bindingset[this]
FlowState() { any() }
}
class SourceKeyFlowState extends TempTest::FlowState {
SourceKeyFlowState() { this = "SourceKeyFlowState" }
}
class Occurrence extends string {
Occurrence() {
this = "One" or
this = "MoreThanOne"
}
}
class ObjectFlowState extends TempTest::FlowState {
Occurrence occur;
ObjectFlowState() { this = "ObjectFlowState" + "-" + occur }
}
module TrackingGetOperationConfiguration implements DataFlow::StateConfigSig {
class FlowState = TempTest::FlowState;
predicate isSource(DataFlow::Node source, FlowState state) {
exists (Function func, Parameter param |
func.getArg(1) = param and
source.asExpr() = param and
func.getName().matches("test%")
) and
(
state instanceof SourceKeyFlowState
)
}
predicate isSink(DataFlow::Node sink, FlowState state) {
exists (Function func, Return ret |
ret.getScope() = func.getEvaluatingScope() and
ret.contains(sink.asExpr()) and
func.getName().matches("test%")
) and
(
state instanceof ObjectFlowState
)
}
predicate isAdditionalFlowStep(DataFlow::Node fromNode, FlowState fromState, DataFlow::Node toNode, FlowState toState) {
exists( MethodCallNode callNode, Call call |
callNode.asExpr() = call and
callNode.getMethodName() = "get" and
callNode.asExpr() = toNode.asExpr() and
call.getArg(0) = fromNode.asExpr()
) and
fromState instanceof SourceKeyFlowState and
toState instanceof ObjectFlowState
}
}
module TrackingGetOperationFlow = DataFlow::GlobalWithState<TrackingGetOperationConfiguration>;
module Flow = TrackingGetOperationFlow; // For shortening the name
predicate run(Flow::PathNode source, Flow::PathNode sink, FlowState state) {
Flow::flowPath(source, sink) and
sink.getState() = state
}
}
I think the key problem here is that in the isAdditionalFlowStep predicate, I cannot select the path node beside the fromNode and toNode, and use its flow state information to determine the flow state of toNode. In my case, the taint propagation step should take into account both the flow state of the key (fromNode) and the flow state of the base object to correctly determine the flow state of toNode.
Could anyone provide suggestions on how to fix this issue? Any insights would be greatly appreciated!
Beta Was this translation helpful? Give feedback.
All reactions
-
👀 1
Replies: 3 comments 1 reply
-
I have the same question!
Beta Was this translation helpful? Give feedback.
All reactions
-
You cannot inspect flow to both obj and key at the same time in order to determine the output state at obj.get(key) in that way - this is because the calculated flow is a path that goes from one node to another one step at a time. However, you can do something that's close and might actually get you what you want: You can use two additional steps: One from key to the output that updates the state from SourceKeyFlowState to ObjectFlowState-One and one from the qualifier obj to the output that updates the state from ObjectFlowState-One to ObjectFlowState-MoreThanOne. Your current step definition mostly achieves the former, except that you've given the resulting state as toState instanceof ObjectFlowState which includes both One and MoreThanOne. You probably want to change that to state = "ObjectFlowState-One". And then supplement with the other step. The two steps might look something like this:
predicate isAdditionalFlowStep(DataFlow::Node fromNode, FlowState fromState, DataFlow::Node toNode, FlowState toState) {
exists( MethodCallNode callNode, Call call |
callNode.asExpr() = call and
callNode.getMethodName() = "get" and
callNode.asExpr() = toNode.asExpr()
|
call.getArg(0) = fromNode.asExpr() and
fromState instanceof SourceKeyFlowState and
toState = "ObjectFlowState-One"
or
call.getQualifier() = fromNode.asExpr() and
fromState = "ObjectFlowState-One" and
toState = "ObjectFlowState-MoreThanOne"
)
}
Note that you cannot expect the out-of-the-box analysis to distinguish for _ in range(1): and for _ in range(2):. Determining the number of times a loop is executed in general is halting-problem territory.
Beta Was this translation helpful? Give feedback.
All reactions
-
Hi @aschackmull! Thank you for your answer!
However, I’m still trying to differentiate between the following two cases:
def test1(obj, key): # Source: Key, SourceKeyFlowState
for _ in range(2):
obj = obj.get(key) # if key has SourceKeyFlowState, obj.get(key) -> ObjectLayerOneFlowState
# if key has SourceKeyFlowState and obj has ObjectLayerOneFlowState, obj.get(key) -> ObjectLayerTwoFlowState
return obj # Sink: obj, ObjectLayerTwoFlowState
def test3(obj, key): # Source: Key, SourceKeyFlowState
key2 = "x" # Not Source Key
for _ in range(1):
obj = obj.get(key) # if key has SourceKeyFlowState, obj.get(key) -> ObjectLayerOneFlowState
# if key has SourceKeyFlowState and obj has ObjectLayerOneFlowState, obj.get(key) -> ObjectLayerTwoFlowState
obj = obj.get(key2) # Should not propagate the taint flow as key2 is not tainted with SourceKeyFlowState
return obj # Sink: obj, ObjectLayerOneFlowState
If I add two additional taint flow steps for the get operation—one propagating from key to val, and the other from obj to val—I cannot make sure whether they refer to the same get operation. Also, please ignore the for loop times, it is just for demonstration.
If this (propagating taint flow based on multiple nodes' taint flow state in the isAdditionalFlowStep) is challenging for CodeQL to achieve, do you think we could support this in the future?
Beta Was this translation helpful? Give feedback.
All reactions
-
It's perfectly doable, you just need a slightly different setup. Let's consider your step here:
obj = obj.get(key) # if key has SourceKeyFlowState and obj has ObjectLayerOneFlowState, obj.get(key) -> ObjectLayerTwoFlowState
If you're just tracking flow via a single path then you're either stepping from obj or from key, so you can't rely on state propagation from both. However, you could do an initial flow calculation that only deals with the propagation of source keys. Then once that's done you can restrict the additional steps in your second flow calculation to those qualifiers in obj.get(key) for which key has flow in the first flow calculation. You can use FlowState in your second flow definition to track the One vs MoreThanOne, but you probably don't need FlowState in the first - there you just need to find those get call for which the argument comes from a key source.
Beta Was this translation helpful? Give feedback.
All reactions
-
❤️ 1