Interprocedural Range Analysis · github/codeql · Discussion #17425

elManto
Sep 10, 2024

Hi all, I'd greatly appreciate your input here. I'm working on C code analysis and would like to use Range Analysis to detect potential out of bounds accesses in (mostly) static arrays. While I can easily use the lowerBound and upperBound predicates, those two only work as an intraprocedural analysis and thus I can't find a way to propagate the ranges to the callees. Example:

I'm using this simple query to just print the ranges of the VariableAccesses we have in the code:

import cpp
import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis
from VariableAccess va
select lowerBound(va.getFullyConverted()), upperBound(va.getFullyConverted()), va

In the C code I need to analyse, SimpleRangeAnalysis can infer that the range is [0, 4] in the VariableAccess that happens in the first branch of function foo for the variable index, and can infer that the range is [4, 11] in the second branch for the VariableAccess that happens on the same variable index. However, when the analysis tries to resolve the ranges of the VariableAccesses for index inside the bar and bar2 functions, it can't propagate the ranges computed in the caller and thus the ranges I get are [-2,147,483,648, +2,147,483,647] in those two leaf functions. Is there anything already implemented (even experimental) to solve this simple scenario? What am I missing ? Thanks.

#include <stdio.h>
#include <stdlib.h>
char bar(int index, char buffer[]) {
	return buffer[index]; // lowerBound == -2,147,483,648 && upperBound == 2,147,483,647
}
char bar2(int index, char buffer[]) {
	return buffer[index - 1]; // lowerBound == -2,147,483,648 && upperBound == 2,147,483,647
}
char foo(int index, char buffer[]) {
 if (index >= 0 && index <= 4) {
 return bar(index, buffer); // lowerBound == 0 && upperBound == 4
 }
 else if (index > 4 && index <= 11) {
	return bar2(index, buffer); // lowerBound == 5 && upperBound == 11
 }
 return '0円';
}
int main(int argc, char *argv[]) {
 int index = atoi(argv[1]);
 char buffer[10] = {'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'};
 char result = foo(index, buffer);
 return 0;
}`

Answered by mbg

Sep 10, 2024

Hi @elManto 👋🏻

Thanks for the question!

Indeed, SimpleRangeAnalysis only supports intra-procedural range analysis. I am afraid that we also don't have anything off-the-shelf for inter-procedural range analysis. This is partially because, e.g. in your code, it can be difficult to tell whether bar and bar2 are only ever called from foo.

If you are willing to do so, it is possible to extend SimpleRangeAnalysis with custom bounds for things by extending the SimpleRangeAnalysisExpr class. For example, see CustomAddFunctionCall in this directory. You could use this approach to model bounds for particular functions, or even to link up the ranges for arguments to a call with the parameters of a f...

View full answer

Replies: 1 comment 2 replies

mbg
Sep 10, 2024
Maintainer

Hi @elManto 👋🏻

Thanks for the question!

2 replies

@elManto

elManto Sep 11, 2024
Author

Thanks @mbg ! Do you think that function cloning would partially help here? I mean, if we clone bar and bar2 we would be able to link each clone to its corresponding call site and thus have a more precise range analysis. If yes, do you think I can implement this in CodeQL ? I was looking at the IR files but I don't know exactly the internals.

Anyway thanks for the other pointer you shared, greatly appreciated.

@mbg

mbg Sep 12, 2024
Maintainer

I mean, if we clone bar and bar2 we would be able to link each clone to its corresponding call site and thus have a more precise range analysis. If yes, do you think I can implement this in CodeQL ? I was looking at the IR files but I don't know exactly the internals.

I suspect that this would be tricky, but it's difficult for me to say without doing a bunch of work to investigate different approaches to implementing it. In any case, I would expect it to be quite an involved task to find an implementation that doesn't break some other part of the C/C++ analysis or leads to unexpected behaviours.

Even if you do get the "function cloning" to work, you may then run into performance issues outside of toy codebases since you may have potentially many "clones" of a given function, which in turn may lead to further clones of other functions, etc.

The approach based on using SimpleRangeAnalysisExpr to link up bounds from call sites to callees and taking the maximum/minimum across all call-site bounds is probably easier and more performant, though less accurate.

Answer selected by elManto

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Interprocedural Range Analysis #17425

Uh oh!

{{title}}

Uh oh!

elManto
Sep 10, 2024

Replies: 1 comment 2 replies

Uh oh!

{{title}}

Uh oh!

mbg
Sep 10, 2024
Maintainer

Uh oh!

{{title}}

Uh oh!

elManto Sep 11, 2024
Author

Uh oh!

{{title}}

Uh oh!

mbg Sep 12, 2024
Maintainer

Select a reply

Uh oh!

Interprocedural Range Analysis #17425

Uh oh!

elManto Sep 10, 2024

Replies: 1 comment · 2 replies

Uh oh!

mbg Sep 10, 2024 Maintainer

Uh oh!

elManto Sep 11, 2024 Author

Uh oh!

mbg Sep 12, 2024 Maintainer

elManto
Sep 10, 2024

Replies: 1 comment 2 replies

mbg
Sep 10, 2024
Maintainer

elManto Sep 11, 2024
Author

mbg Sep 12, 2024
Maintainer