How to reduce the number of times memory is written to disk in the database analyze process · github/codeql · Discussion #18378

Eathonhsu
Dec 29, 2024

We have a large C++ project that we are trying to analyze with CodeQL.

We're running into slow database analysis times.

We tracked it down: it looks like some queries (ReturnStackAllocatedMemory, RedundantNullCheckParam) are constantly writing memory to disk.

Example (similar messages appear about 16,500 times):

...
[2024-12-27 07:47:19] Pausing evaluation to evict 238.89MiB ARRAYS at sequence stamp o+13268168
[2024-12-27 07:47:19] Unpausing evaluation: 118.33MiB in memory written to disk, 238.99MiB forgotten: 120.66MiB UNREACHABLE (156 items up to o+13268158) 118.33MiB VITAL (153 items up to o+13248308)
[2024-12-27 07:47:19] Pausing evaluation to evict 238.89MiB ARRAYS at sequence stamp o+13268479
[2024-12-27 07:47:19] Unpausing evaluation: 119.11MiB in memory written to disk, 238.99MiB forgotten: 119.88MiB UNREACHABLE (155 items up to o+13268475) 119.11MiB VITAL (154 items up to o+13248615)
[2024-12-27 07:47:19] Pausing evaluation to evict 238.89MiB ARRAYS at sequence stamp o+13268787
[2024-12-27 07:47:20] Unpausing evaluation: 121.43MiB in memory written to disk, 238.99MiB forgotten: 117.56MiB UNREACHABLE (152 items up to o+13268778) 121.43MiB VITAL (157 items up to o+13248929)
[2024-12-27 07:47:20] Pausing evaluation to evict 238.89MiB ARRAYS at sequence stamp o+13269098
[2024-12-27 07:47:20] Unpausing evaluation: 118.33MiB in memory written to disk, 238.99MiB forgotten: 120.66MiB UNREACHABLE (156 items up to o+13269090) 118.33MiB VITAL (153 items up to o+13249236)
[2024-12-27 07:47:20] Pausing evaluation to evict 238.89MiB ARRAYS at sequence stamp o+13269405
[2024-12-27 07:47:20] Unpausing evaluation: 119.88MiB in memory written to disk, 238.99MiB forgotten: 119.11MiB UNREACHABLE (154 items up to o+13269397) 119.88MiB VITAL (155 items up to o+13249548)
[2024-12-27 07:47:20] Pausing evaluation to evict 238.89MiB ARRAYS at sequence stamp o+13269719
[2024-12-27 07:47:20] Unpausing evaluation: 118.33MiB in memory written to disk, 238.99MiB forgotten: 120.66MiB UNREACHABLE (156 items up to o+13269711) 118.33MiB VITAL (153 items up to o+13249855)
[2024-12-27 07:47:20] Pausing evaluation to evict 238.89MiB ARRAYS at sequence stamp o+13270031
[2024-12-27 07:47:21] Unpausing evaluation: 118.33MiB in memory written to disk, 238.99MiB forgotten: 120.66MiB UNREACHABLE (156 items up to o+13270022) 118.33MiB VITAL (153 items up to o+13250157)
[2024-12-27 07:47:21] Pausing evaluation to evict 238.89MiB ARRAYS at sequence stamp o+13270333
[2024-12-27 07:47:21] Unpausing evaluation: 122.20MiB in memory written to disk, 239.09MiB forgotten: 116.88MiB UNREACHABLE (152 items up to o+13270327) 122.20MiB VITAL (158 items up to o+13250479)
...

NOTE:
Another project runs much faster, and similar messages appear only about 200 times.
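For comparing two runs like this, the eviction messages can be counted mechanically instead of by eye. The following is a minimal sketch that assumes the log lines have the same wording as the excerpt above; the function name `summarize_evictions` and the sample list are illustrative and not part of CodeQL.

```python
import re

# Matches the amount reported as written to disk on an "Unpausing" line.
UNPAUSE = re.compile(r"Unpausing evaluation: ([\d.]+)MiB in memory written to disk")


def summarize_evictions(lines):
    """Return (pause_count, total_mib_written) for an iterable of log lines."""
    pauses = 0
    written_mib = 0.0
    for line in lines:
        if "Pausing evaluation to evict" in line:
            pauses += 1
        match = UNPAUSE.search(line)
        if match:
            written_mib += float(match.group(1))
    return pauses, written_mib


# Two pause/unpause pairs taken from the excerpt above (timestamps omitted).
SAMPLE = [
    "Pausing evaluation to evict 238.89MiB ARRAYS at sequence stamp o+13268168",
    "Unpausing evaluation: 118.33MiB in memory written to disk, 238.99MiB forgotten",
    "Pausing evaluation to evict 238.89MiB ARRAYS at sequence stamp o+13268479",
    "Unpausing evaluation: 119.11MiB in memory written to disk, 238.99MiB forgotten",
]

if __name__ == "__main__":
    count, total = summarize_evictions(SAMPLE)
    print(f"{count} pauses, {total:.2f} MiB written to disk")
```

To run it on a real log, pass an open file object instead of `SAMPLE`, for example `summarize_evictions(open(path, encoding="utf-8"))`.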

Can we tune any parameters to improve the balance between memory utilization, cache efficiency, and disk I/O? Or do you have any other suggestions?

Thank you for your valuable suggestions and guidance. 😊

Ethan

Replies: 4 comments 1 reply

smowton
Dec 30, 2024
Maintainer

Could you provide a log from the offending run so we can see what is being evaluated when we start to need to write temporary data to disk like this?

Meanwhile, could you try setting --ram to the number of megabytes you'd like CodeQL to top out at, to see if that helps?
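A minimal sketch of what passing `--ram` might look like when the analysis is driven from a script. The database path `my-cpp-db`, the output file name, and the helper `analyze_command` are assumptions made for illustration; only the `codeql database analyze` subcommand and its `--ram`, `--format`, and `--output` options come from the CodeQL CLI.

```python
import shlex


def analyze_command(database, ram_mb, queries, output="results.sarif"):
    """Build the argv for `codeql database analyze` with a RAM budget in MB."""
    return [
        "codeql", "database", "analyze", database,
        f"--ram={ram_mb}",          # budget in megabytes for the evaluator
        "--format=sarif-latest",
        f"--output={output}",
        *queries,
    ]


if __name__ == "__main__":
    # "my-cpp-db" is a hypothetical database directory.
    cmd = analyze_command("my-cpp-db", 16384, ["codeql/cpp-queries"])
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

The sketch only prints the command, so it can be inspected before anything is executed.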

0 replies

Eathonhsu
Dec 31, 2024
Author

I have uploaded the CodeQL database analysis logs for reference:
codeql-analysis-log.zip

The progress log (ReturnStackAllocatedMemory and RedundantNullCheckParam are the two queries with notably slow execution times):
[1/24 eval 15.2s] Evaluation done; writing results to codeql\cpp-queries\Best Practices\Unused Entities\UnusedStaticVariables.bqrs.
[2/24 eval 15.2s] Evaluation done; writing results to codeql\cpp-queries\Best Practices\Likely Errors\OffsetUseBeforeRangeCheck.bqrs.
[3/24 eval 15.2s] Evaluation done; writing results to codeql\cpp-queries\Likely Bugs\Arithmetic\ComparisonPrecedence.bqrs.
[4/24 eval 15.2s] Evaluation done; writing results to codeql\cpp-queries\Likely Bugs\ContinueInFalseLoop.bqrs.
[5/24 eval 15.3s] Evaluation done; writing results to codeql\cpp-queries\Likely Bugs\Arithmetic\UnsignedGEZero.bqrs.
[6/24 eval 15.3s] Evaluation done; writing results to codeql\cpp-queries\Best Practices\Unused Entities\UnusedLocals.bqrs.
[7/24 eval 15.5s] Evaluation done; writing results to codeql\cpp-queries\jsf\4.07 Header Files\AV Rule 35.bqrs.
[8/24 eval 16.1s] Evaluation done; writing results to codeql\cpp-queries\Likely Bugs\Conversion\ImplicitDowncastFromBitfield.bqrs.
[9/24 eval 1m23s] Evaluation done; writing results to codeql\cpp-queries\jsf\4.13 Functions\AV Rule 114.bqrs.
[10/24 eval 10m38s] Evaluation done; writing results to codeql\cpp-queries\Security\CWE\CWE-190\ComparisonWithWiderType.bqrs.
[11/24 eval 10m47s] Evaluation done; writing results to codeql\cpp-queries\Security\CWE\CWE-835\InfiniteLoopWithUnsatisfiableExitCondition.bqrs.
[12/24 eval 70m41s] Evaluation done; writing results to codeql\cpp-queries\Likely Bugs\Memory Management\ReturnStackAllocatedMemory.bqrs.
[13/24 eval 70m53s] Evaluation done; writing results to codeql\cpp-queries\Likely Bugs\Memory Management\PointerOverflow.bqrs.
[14/24 eval 70m53s] Evaluation done; writing results to codeql\cpp-queries\experimental\Best Practices\UselessTest.bqrs.
[15/24 eval 75m9s] Evaluation done; writing results to codeql\cpp-queries\Likely Bugs\RedundantNullCheckSimple.bqrs.
[16/24 eval 82m26s] Evaluation done; writing results to codeql\cpp-queries\Critical\OverflowStatic.bqrs.
[17/24 eval 82m26s] Evaluation done; writing results to codeql\cpp-queries\Likely Bugs\Conversion\LossyFunctionResultCast.bqrs.
[18/24 eval 82m26s] Evaluation done; writing results to codeql\cpp-queries\Critical\OverflowDestination.bqrs.
[19/24 eval 82m26s] Evaluation done; writing results to codeql\cpp-queries\Security\CWE\CWE-131\NoSpaceForZeroTerminator.bqrs.
[20/24 eval 82m26s] Evaluation done; writing results to codeql\cpp-queries\Critical\OverflowCalculated.bqrs.
[21/24 eval 82m26s] Evaluation done; writing results to codeql\cpp-queries\Likely Bugs\Likely Typos\UsingStrcpyAsBoolean.bqrs.
[22/24 eval 82m26s] Evaluation done; writing results to codeql\cpp-queries\Security\CWE\CWE-119\OverflowBuffer.bqrs.
[23/24 eval 82m26s] Evaluation done; writing results to codeql\cpp-queries\Security\CWE\CWE-120\VeryLikelyOverrunWrite.bqrs.
[24/24 eval 118m42s] Evaluation done; writing results to codeql\cpp-queries\experimental\Likely Bugs\RedundantNullCheckParam.bqrs.
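The timestamps in these progress lines are cumulative, so the slow stretches are easier to see as differences between consecutive lines. Here is a rough sketch that assumes the `[n/N eval XmYs]` format shown above; the helper names are illustrative. Because queries can share intermediate results and may be evaluated concurrently, a gap is only an approximation of the cost of the query that finishes at its end.

```python
import re

# Captures optional minutes, seconds, and the path of the .bqrs result file.
PROGRESS = re.compile(
    r"\[\d+/\d+ eval (?:(\d+)m)?([\d.]+)s\] "
    r"Evaluation done; writing results to (.+)\.bqrs"
)


def parse_progress(lines):
    """Return [(query_name, cumulative_seconds)] from progress lines."""
    entries = []
    for line in lines:
        match = PROGRESS.search(line)
        if not match:
            continue
        minutes = int(match.group(1) or 0)
        seconds = float(match.group(2))
        # Keep only the last path component as the query name.
        name = match.group(3).replace("\\", "/").rsplit("/", 1)[-1]
        entries.append((name, minutes * 60 + seconds))
    return entries


def gaps(entries):
    """Return [(query_name, seconds_since_previous_completion)]."""
    result, previous = [], 0.0
    for name, elapsed in entries:
        result.append((name, elapsed - previous))
        previous = elapsed
    return result


SAMPLE = [
    r"[11/24 eval 10m47s] Evaluation done; writing results to codeql\cpp-queries\Security\CWE\CWE-835\InfiniteLoopWithUnsatisfiableExitCondition.bqrs.",
    r"[12/24 eval 70m41s] Evaluation done; writing results to codeql\cpp-queries\Likely Bugs\Memory Management\ReturnStackAllocatedMemory.bqrs.",
]

if __name__ == "__main__":
    for name, gap in sorted(gaps(parse_progress(SAMPLE)), key=lambda e: -e[1]):
        print(f"{gap / 60:8.1f} min  {name}")
```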

execute-queries-20241227.064755.300.log
...
[2024-12-27 06:58:58] [PROGRESS] execute queries> [11/24 eval 10m47s] Evaluation done; writing results to codeql\cpp-queries\Security\CWE\CWE-835\InfiniteLoopWithUnsatisfiableExitCondition.bqrs.
[2024-12-27 06:58:58] Pausing evaluation to evict 238.87MiB ARRAYS at sequence stamp o+10871931
[2024-12-27 06:58:58] Unpausing evaluation: 69.62MiB in memory written to disk, 239.05MiB forgotten: 92.81MiB UNREACHABLE (121 items up to o+10871927) 2.37MiB UNIMPORTANT (12 items up to o+10871577) 143.87MiB VITAL (181 items up to o+10852707)
[2024-12-27 06:58:58] Pausing evaluation to evict 238.87MiB ARRAYS at sequence stamp o+10872287
[2024-12-27 06:58:58] Unpausing evaluation: 141.54MiB in memory written to disk, 239.00MiB forgotten: 72.70MiB UNREACHABLE (94 items up to o+10872284) 166.29MiB VITAL (216 items up to o+10853137)
[2024-12-27 06:58:59] Pausing evaluation to evict 238.87MiB ARRAYS at sequence stamp o+10872651
[2024-12-27 06:58:59] Unpausing evaluation: 125.30MiB in memory written to disk, 238.99MiB forgotten: 81.98MiB UNREACHABLE (106 items up to o+10872644) 157.01MiB VITAL (203 items up to o+10853564)
...
[2024-12-27 07:58:52] [PROGRESS] execute queries> [12/24 eval 70m41s] Evaluation done; writing results to codeql\cpp-queries\Likely Bugs\Memory Management\ReturnStackAllocatedMemory.bqrs.
...

As for the --ram parameter, I will try it, thanks.

0 replies

smowton
Dec 31, 2024
Maintainer

Thanks, that's useful -- it shows that your database has a very large codebase extracted (e.g., there are evidently a lot of different control-flow nodes that guard others, and a lot of variable accesses, judging by two of the predicates that cause significant stress and return hundred-million-scale results). Are you able to share any details about the code you're analysing?

One thing that could be useful is subdividing your analysis into different projects. For example, often if a repository contains millions of lines of code, actually it can be subdivided into subsets which are interesting to security-analyse together, such as programs and their dependent libraries, vs. those which don't interact in this sense, e.g. a pair of unrelated programs neither of which calls the other. If this sounds like your use case, one route to optimising your analysis could be to prepare more fine-grained databases each of which is analysed individually.

0 replies

Eathonhsu
Jan 3, 2025
Author

Many thanks for your suggestion.

Your assessment is correct - this project integrates multiple discrete functionalities (packed as a library), though not all components share direct dependencies.

BTW, according to my local CodeQL database analysis runs, increasing --ram did help, but only by about 4% (RAM increased from 8 GB to 16 GB).

We want to try skipping the analysis of specific locations to isolate the issue.
The reason is that we have another project that shares about 60% of the code but does not take nearly as long to process.

Do you know of any method for restricting the analysis to specific code paths?

Suppose the project has five file paths. I would like to isolate, say, the A and B paths in the analysis to find out which path causes the problem:
A/*.c
B/*.c
C/*.c
D/*.c
E/*.c

1 reply

smowton Jan 6, 2025
Maintainer

Usually the best way to achieve this is by using a manual build: https://docs.github.com/en/code-security/code-scanning/creating-an-advanced-setup-for-code-scanning/codeql-code-scanning-for-compiled-languages#about-specifying-build-steps-manually

Basically you supply a build command that only builds some subset of your code. The CodeQL analysis will trace the build process and only analyse that same code. For example you might build a particular executable and the libraries it depends upon, which could be as simple as make prog1 depending on how your build system is set up.
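As a rough sketch of that approach, one database can be created per build subset by giving `codeql database create` a different build command each time. The database names and the `make prog2` target below are hypothetical (only `make prog1` is mentioned above), and the helper `create_command` is illustrative; `--language`, `--source-root`, and `--command` are options of the CodeQL CLI.

```python
import shlex


def create_command(db_dir, build_command, source_root="."):
    """Build the argv for `codeql database create` tracing one build subset."""
    return [
        "codeql", "database", "create", db_dir,
        "--language=cpp",
        f"--source-root={source_root}",
        f"--command={build_command}",  # only code compiled here gets extracted
    ]


# Hypothetical mapping from database directory to the build that feeds it.
SUBSETS = {
    "db-prog1": "make prog1",
    "db-prog2": "make prog2",
}

if __name__ == "__main__":
    for db_dir, build in SUBSETS.items():
        print(shlex.join(create_command(db_dir, build)))
        # To actually run it: subprocess.run(create_command(db_dir, build), check=True)
```

Analysing each resulting database separately and comparing the run times should indicate which subset is responsible for the slowdown.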
