-
Notifications
You must be signed in to change notification settings - Fork 1.9k
-
Hi all, I'd greatly appreciate your input here. I'm working on C code analysis and would like to use Range Analysis to detect potential out of bounds accesses in (mostly) static arrays. While I can easily use the lowerBound and upperBound predicates, those two only work as an intraprocedural analysis and thus I can't find a way to propagate the ranges to the callees. Example:
I'm using this simple query to just print the ranges of the VariableAccesses we have in the code:
import cpp import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis from VariableAccess va select lowerBound(va.getFullyConverted()), upperBound(va.getFullyConverted()), va
In the C code I need to analyse, SimpleRangeAnalysis can infer that the range is [0, 4] in the VariableAccess that happens in the first branch of function foo for the variable index, and can infer that the range is [4, 11] in the second branch for the VariableAccess that happens on the same variable index. However, when the analysis tries to resolve the ranges of the VariableAccesses for index inside the bar and bar2 functions, it can't propagate the ranges computed in the caller and thus the ranges I get are [-2,147,483,648, +2,147,483,647] in those two leaf functions. Is there anything already implemented (even experimental) to solve this simple scenario? What am I missing ? Thanks.
#include <stdio.h> #include <stdlib.h> char bar(int index, char buffer[]) { return buffer[index]; // lowerBound == -2,147,483,648 && upperBound == 2,147,483,647 } char bar2(int index, char buffer[]) { return buffer[index - 1]; // lowerBound == -2,147,483,648 && upperBound == 2,147,483,647 } char foo(int index, char buffer[]) { if (index >= 0 && index <= 4) { return bar(index, buffer); // lowerBound == 0 && upperBound == 4 } else if (index > 4 && index <= 11) { return bar2(index, buffer); // lowerBound == 5 && upperBound == 11 } return '0円'; } int main(int argc, char *argv[]) { int index = atoi(argv[1]); char buffer[10] = {'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'}; char result = foo(index, buffer); return 0; }`
Beta Was this translation helpful? Give feedback.
All reactions
Hi @elManto 👋🏻
Thanks for the question!
Indeed, SimpleRangeAnalysis only supports intra-procedural range analysis. I am afraid that we also don't have anything off-the-shelf for inter-procedural range analysis. This is partially because, e.g. in your code, it can be difficult to tell whether bar and bar2 are only ever called from foo.
If you are willing to do so, it is possible to extend SimpleRangeAnalysis with custom bounds for things by extending the SimpleRangeAnalysisExpr class. For example, see CustomAddFunctionCall in this directory. You could use this approach to model bounds for particular functions, or even to link up the ranges for arguments to a call with the parameters of a f...
Replies: 1 comment 2 replies
-
Hi @elManto 👋🏻
Thanks for the question!
Indeed, SimpleRangeAnalysis only supports intra-procedural range analysis. I am afraid that we also don't have anything off-the-shelf for inter-procedural range analysis. This is partially because, e.g. in your code, it can be difficult to tell whether bar and bar2 are only ever called from foo.
If you are willing to do so, it is possible to extend SimpleRangeAnalysis with custom bounds for things by extending the SimpleRangeAnalysisExpr class. For example, see CustomAddFunctionCall in this directory. You could use this approach to model bounds for particular functions, or even to link up the ranges for arguments to a call with the parameters of a function.
Beta Was this translation helpful? Give feedback.
All reactions
-
Thanks @mbg ! Do you think that function cloning would partially help here? I mean, if we clone bar and bar2 we would be able to link each clone to its corresponding call site and thus have a more precise range analysis. If yes, do you think I can implement this in CodeQL ? I was looking at the IR files but I don't know exactly the internals.
Anyway thanks for the other pointer you shared, greatly appreciated.
Beta Was this translation helpful? Give feedback.
All reactions
-
I mean, if we clone bar and bar2 we would be able to link each clone to its corresponding call site and thus have a more precise range analysis. If yes, do you think I can implement this in CodeQL ? I was looking at the IR files but I don't know exactly the internals.
I suspect that this would be tricky, but it's difficult for me to say without doing a bunch of work to investigate different approaches to implementing it. In any case, I would expect it to be quite an involved task to find an implementation that doesn't break some other part of the C/C++ analysis or leads to unexpected behaviours.
Even if you do get the "function cloning" to work, you may then run into performance issues outside of toy codebases since you may have potentially many "clones" of a given function, which in turn may lead to further clones of other functions, etc.
The approach based on using SimpleRangeAnalysisExpr to link up bounds from call sites to callees and taking the maximum/minimum across all call-site bounds is probably easier and more performant, though less accurate.
Beta Was this translation helpful? Give feedback.