OOM on Large C Project & Incomplete File Tracing #20181
-
I am using CodeQL to analyze a very large, private C project and have encountered a couple of issues. I would be grateful for any help or guidance you can provide.
Problem 1: Out-of-Memory (OOM) During Database Creation
When I attempt to create a CodeQL database from the project's root directory, the process fails with an Out-of-Memory (OOM) error. This error occurs before the actual compilation phase begins.
Observations:
- The project is extremely large.
- The machine has over 40GB of free RAM at the time of the OOM error.
- As a workaround, I can successfully create a database without an OOM error if I set --source-root to a smaller, specific subdirectory that I am interested in.
Questions:
- What pre-build operations does the CodeQL CLI perform that could consume so much memory, even when significant physical RAM is available?
- What is the recommended best practice or workflow for creating a database for a very large C project to avoid these OOM issues?
Problem 2: Incomplete Tracing for Most Compiled Files
I've noticed that most C files are not being fully analyzed, even though they are part of the compilation.
Observations:
- A specific file, let's call it example.c, is definitely being compiled.
- I have inspected build-tracer.log and can confirm that the CodeQL tracer has captured its compilation process. The log contains the complete invocation, Command, and Processed command line for example.c.
- This issue affects the majority of the files in the project. For these files, it seems that none of their internal contents (functions, expressions, variables, etc.) have been extracted.
- When I query the database, the only result related to example.c is its file-level location (example.c:0:0:0:0). A small number of other files are partially traced, where I can identify some expressions (expr), but the analysis is still far from complete.
For example, running the following query on the database highlights this issue:
from Locatable locb, Location loc where locb.getLocation() = loc select locb, loc
The query shows that for files like example.c, only the file itself is located, with no deeper elements available.
Question:
- Why might the contents of most files not be extracted into the database, even though their build commands were successfully captured by the build tracer?
-
Update:
After a closer look at build-tracer.log, I have a new finding that might be the root cause of Problem 2.
It appears that CodeQL is not tracing the llvm-ar commands that are used to archive .o object files (which are LLVM IR bitcode in my case) into static libraries (.a files).
Here's what I found in the log:
- The log shows the final linking step where the static library is used, but it also explicitly states that the archive file is being excluded:
Command: /path/to/codeql/cpp/tools/linux64/extractor -mimic /path/to/clang-15 -o example.elf ... -Wl,--whole-archive /path/to/example.a ...
excluded /path/to/build/example.a because it is an object
- I have searched the entire build-tracer.log and cannot find any trace of the llvm-ar command itself being executed.
- Furthermore, there are no log entries that contain both the archive name (example.a) and the object file name (example.o) together, which I would expect to see during an archiving step.
This leads me to believe that if the creation of static libraries isn't traced, the extractor might not know which object files are contained within them, and therefore fails to analyze the corresponding source files. Could this be the reason why the contents of example.c and other files are missing from the database?
Is this expected behavior, or is there a specific configuration required to ensure that llvm-ar (or the archiver in general) is traced correctly?
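The log search described above can be reproduced with a short script rather than by hand. This is a minimal sketch, assuming only that build-tracer.log is a line-oriented text file; the helper name find_lines_mentioning is hypothetical and not part of CodeQL. It streams the input line by line, so it should also cope with very large logs.

```python
from typing import Iterable, Iterator, Sequence, Tuple


def find_lines_mentioning(
    lines: Iterable[str], needles: Sequence[str]
) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, line) for every line that contains all needles."""
    for number, line in enumerate(lines, start=1):
        if all(needle in line for needle in needles):
            yield number, line.rstrip("\n")


# Usage sketch (the log path is a placeholder):
#
#   with open("build-tracer.log", errors="replace") as log:
#       for number, line in find_lines_mentioning(log, ["llvm-ar"]):
#           print(number, line)
#
# Passing ["example.a", "example.o"] instead looks for a line that names
# both the archive and the object file, i.e. a candidate archiving step.
```

An empty result for both searches would match the observation that the archiver invocation does not appear in the log at all.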
-
Hello, let me ask a couple of questions to better clarify your scenario:
- which command line did you use for the database creation?
- do you have any more detail on the type of the OOM error?
-
To follow up on @esteffin's questions:
Could you clarify whether this is a Java OutOfMemoryError exception or the operating system triggering an OOM condition? If the former, you can also try increasing the RAM available to the evaluator and JVM using the codeql resolve ram option.
The CLI scans all source code to count lines of code before starting the build. This scan can be suppressed by passing the --no-calculate-baseline argument to the CodeQL CLI.
If you have purchased GitHub Advanced Security, I recommend that you also reach out to support.
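As an illustration of where that flag goes, here is a minimal sketch of a Python wrapper that assembles a codeql database create command line. The database name, source root, and build command are placeholders, since the actual invocation has not been shared in this thread, and --language=cpp is an assumption based on the project being C.

```python
import shlex
from typing import List


def database_create_argv(
    database: str, source_root: str, build_command: str
) -> List[str]:
    """Build an argument list for `codeql database create` that skips
    the baseline lines-of-code scan."""
    return [
        "codeql", "database", "create", database,
        "--language=cpp",                 # assumed: C/C++ extractor
        f"--source-root={source_root}",   # placeholder path
        f"--command={build_command}",     # placeholder build command
        "--no-calculate-baseline",        # skip the pre-build source scan
    ]


# All three arguments below are hypothetical values for illustration.
argv = database_create_argv("my-db", "/path/to/project", "make")
print(shlex.join(argv))

# To actually run it: subprocess.run(argv, check=True)
```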
-
Thanks for the suggestion! Using --no-calculate-baseline did resolve the OOM error.
However, Problem 2 is still a major blocker. I've analyzed build-tracer.log again and found that it's filled with a large number of errors. For example:
[E 12:35:09 867372] Warning[extractor-c++]: In construct_text_message: "/path/to/xxx.c", line 56: error: expected a ";"
and
[E 12:35:09 867373] Warning[extractor-c++]: In construct_text_message: "/path/to/xxx.c", line 56: error: identifier "__u64" is undefined
There are many other variations as well, such as "a declaration here must declare a parameter" and "incomplete type xxx is not allowed".
These error messages seem to correspond directly to the locb results from the query:
from Locatable locb, Location loc where locb.getLocation() = loc select locb, loc
For instance, one of the results is expected a ';', file:///path/to/xxx.c:56:1:56:1.
The log file containing these extractor warnings is about 1.7 GB in size. This leads me to a question: is it possible that the CodeQL C++ extractor is failing to parse the source correctly? This is strange because my project compiles successfully using the specified build command.
Any insights on why the extractor would report so many syntax errors on a codebase that compiles cleanly would be greatly appreciated.
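With a log of that size, it may help to tally the diagnostics before reading them one by one, since a few root causes (for example a missing type such as __u64) often produce many follow-on errors. This is a minimal sketch, assuming every diagnostic follows the exact line format quoted above; the function name tally_extractor_errors is hypothetical. It streams the file, so it does not need to hold the whole log in memory.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Matches extractor diagnostics of the form quoted above, e.g.
# [E 12:35:09 867372] Warning[extractor-c++]: In construct_text_message:
#   "/path/to/xxx.c", line 56: error: expected a ";"
DIAG_RE = re.compile(
    r'Warning\[extractor-c\+\+\]: In construct_text_message: '
    r'"(?P<file>[^"]+)", line (?P<line>\d+): error: (?P<msg>.*)$'
)


def tally_extractor_errors(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count diagnostics per message text and per source file."""
    by_message: Counter = Counter()
    by_file: Counter = Counter()
    for raw in lines:
        match = DIAG_RE.search(raw.rstrip())
        if match is None:
            continue  # not an extractor diagnostic in the assumed format
        by_message[match.group("msg")] += 1
        by_file[match.group("file")] += 1
    return by_message, by_file


# Usage sketch (the log path is a placeholder):
#
#   with open("build-tracer.log", errors="replace") as log:
#       messages, files = tally_extractor_errors(log)
#   for msg, count in messages.most_common(20):
#       print(count, msg)
```

Sorting by message shows which errors dominate, and sorting by file shows whether the failures cluster around sources that include a particular header.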
-
Happy to hear that the first part got resolved.
For the second part, could you copy the command used to invoke CodeQL? You also mention the use of Clang 15. Is that the only compiler used in the project for compiling C/C++ code? To better understand what is going on, it would also be very helpful if you could create and share a test case that reproduces the problem.