-
Notifications
You must be signed in to change notification settings - Fork 1.9k
-
Hello,
While I was writting some test file to get a better understanding of Dataflow true capabilities at tracking values with function and containers wrapping, with uses of references, pointers and other edge cases I encountered something which I cannot make sens of. An assignment to a reference is correctly detected as a source that flows to the sink, but it doesn't overwrite previous assignment. (that is to say, the value assigned before the assignment to the reference is said to flows to the sink)
This is obviously wrong (I mean I hope so otherwise I would seriously reconsider my understanding of c++), but the hard part (from a computational / algorithmic point of view) is finding the assignement to a reference hidden inside the function, not overwriting the value that was assigned before the function. So I have no idea why CodeQL says that the nullptr flows to sink. We can simply execute the program to see that indeed, nullptr doesn't flows to triggerSinkPtr. (or maybe my understanding of what it means to "flow" is wrong ?)
Here's the code :
void setToValid(TwoIntClass*& ref){ TwoIntClass* valid = new TwoIntClass{1,2}; ref = valid; } //... void refarg_good(){ TwoIntClass* ptr = nullptr; setToValid(ptr); ptr->triggerSinkPtr(); free(ptr); }
Here's the CodeQL request :
/** * @name global DF2 * @id tmp * @kind problem * @problem.severity recommendation * @security-severity 7.5 */ import cpp import semmle.code.cpp.dataflow.new.DataFlow module EnvironmentToFileConfiguration implements DataFlow::ConfigSig { predicate isSource(DataFlow::Node source) { any() //source.asIndirectExpr(1).isRValueCategory() } predicate isSink(DataFlow::Node sink) { exists(FunctionCall fc | sink.asIndirectExpr(1) = fc.getQualifier() and fc.getTarget().getName() = "triggerSinkPtr" ) } } module CustomFlow = DataFlow::Global<EnvironmentToFileConfiguration>; from Expr srcExpr, Expr sinkExpr, DataFlow::Node source, DataFlow::Node sink, Function f where source.asIndirectExpr(1) = srcExpr and sink.asIndirectExpr(1) = sinkExpr and f = sinkExpr.getEnclosingFunction() and CustomFlow::flow(source, sink) select sinkExpr, "In $@ :\t $@ \tflows to \t$@.", f, f.getName(), srcExpr, srcExpr.toString(), sinkExpr, sinkExpr.toString()
And here's the result for this function : (0 correspond to nullptr and new to new TwoIntClass{1,2})
image
If the new is correctly detected then why is the nullptr still here ?
I am even more confused because in this case, only nullptr is said to flow to the sink by the request (and obviously it's not)
void ref_good(){ TwoIntClass tic = {1,2}; TwoIntClass* ptr = nullptr; TwoIntClass*& refptr = ptr; refptr = &tic; ptr->triggerSinkPtr(); }
Does this mean that Dataflow track argument references but not references ? I would've expected reference in the same scope easier to track than reference in function argument so I'm a little surprised.
Beta Was this translation helpful? Give feedback.
All reactions
Replies: 1 comment 1 reply
-
Hi @ayosten,
Thanks for raising these issues.
The problem here is that CodeQL doesn't reason about "interprocedural" barriers. That is, we don't detect that the value is overwritten in all cases when flowing through setToValid. For example, if setToValid had been implemented like:
bool condition(); void setToValid(TwoIntClass *&ref) { TwoIntClass *valid = new TwoIntClass{1, 2}; if(condition()) { ref = valid; } }
so that setToValid didn't always guarentee that ref was overwirtten upon return, then the flow from nullptr to triggerSinkPtr would be correct.
This is also why you don't get an alert from source() to sink in:
int source(); void sink(int); void test() { int x = source(); x = 0; sink(x); }
but you do get one here:
int source(); void sink(int); void set_x(int *x) { *x = 0; } void test() { int x = source(); set_x(&x); sink(x); }
since we don't try to detect that set_x always overwrites the value of x.
As for your final example:
void ref_good(){ TwoIntClass tic = {1,2}; TwoIntClass* ptr = nullptr; TwoIntClass*& refptr = ptr; refptr = &tic; ptr->triggerSinkPtr(); }
we don't realise that refptr is a reassignment to ptr because dataflow currently doesn't know about any alias relationships. This is something we want to fix eventually, but so far we haven't managed to implement it in a way that doesn't cause too many performance regressions.
I will keep a link to this issue in our internal tracking for both of these issues, though.
Beta Was this translation helpful? Give feedback.
All reactions
-
Hello @MathiasVP , thank you again for your answer !
While I do understand that tracking references is computationally expensive because it requires additional data structure that stores all references to a variable (I've wrote predicate that reproduce dataflow considering references in local scope with an implicit creation of path using recursive predicate that navigate one ControlFlowNode at a time, and it wasn't pretty even though it worked), and that it's somewhat easier to find reference argument, I'm still a little confuse to why this isn't considered as a barrier.
The fact that CodeQL found the re-assignment inside the function means that it must have reach it while navigating through the Control Flow Graph, and thus it knows (for (I suppose) barely no computational cost) that it is not inside a conditional statement (that is to say, the basic bloc which contains ref = valid; will always be executed). Thus I expected to work correctly the same way that in the following code, only ticC flows to the sink. Am I mistaken in the idea that Dataflow library explore path in the Node graph ?
ptr = &ticB; ptr = nullptr; ptr = &ticC; ptr->triggerSinkPtr();
Also, I found something quite inconsistent that I cannot explain rationnaly. As pointers and references are more or less the same thing (actually I don't know much about it) I expected to get the same result with CodeQL dataflow whether I use pointers or references. And indeed, modifying a value in local scope by creating a pointer to it isn't detected, the same way that declaring a reference in a local scope and changing it isn't detected as you explained CodeQL doesn't consider alias.
But then, I tried this :
void ptrSetMultipleValid(TwoIntClass** pptr){ *pptr = nullptr; *pptr = new TwoIntClass(1,2); *pptr = new TwoIntClass(3,4); } void adv_ptr_arg_good(){ TwoIntClass* ptr = nullptr; ptrSetMultipleValid(&ptr); ptr->triggerSinkPtr(); free(ptr); }
However :
void setToMultipleValid(TwoIntClass*& ref){ TwoIntClass* valid; valid = nullptr; valid = new TwoIntClass{1,2}; //incidental memory leak valid = new TwoIntClass{3,4}; ref = valid; } void adv_ref_arg_good(){ TwoIntClass* ptr = nullptr; setToMultipleValid(ptr); ptr->triggerSinkPtr(); free(ptr); }
When passing a pointer to a pointer in argument :
- the initially assigned
nullptris detected (which make sens from your explanation) - All assignment to the dereferenced
pptrare detected :nullptr/new TwoIntClass(1,2)/new TwoIntClass(3,4)Which I can also make sens of because indeed if there where conditional branching inptrSetMultipleValidwe aren't able to guarantee which one are executed.
However, when passing a reference argument :
- the initially assigned
nullptris detected (same as before) - however, only the last
new TwoIntClass{3,4}is detected ! And the assignmentvalid = nullptr;andvalid = new TwoIntClass{1,2}are correctly ignored
(I wondered if this was because of the intermediate ref = valid; ? I just tested and this has in fact nothing to do with the intermediate ref=valid, so this prove that reference arguments and pointer arguments are handle differently by CodeQL. As they're basically the same I don't see why they should be treated differently)
void setToMultipleValidBIS(TwoIntClass*& ref){ cout << "seting ptr to a valid value, no intermediate"<<endl; ref = nullptr; ref = new TwoIntClass{1,2}; //incidental memory leak ref = new TwoIntClass{3,4}; } void adv_ref_arg_goodBIS(){ TwoIntClass* ptr = nullptr; setToMultipleValidBIS(ptr); ptr->triggerSinkPtr(); free(ptr); }
(only the latest / guaranteed execution ref = new TwoIntClass(3,4) is detected to flow to the sink.
I will now run some test about conditional branching, any key concept I should keep in mind before trying ?
Beta Was this translation helpful? Give feedback.