Large disk cache for Ruby dataset, totaling 30GB #15745


I ran an analyze command with:

codeql database analyze "/root/databases/$directory" /root/queries/Query.ql --log-to-stderr --loglevel="ALL" --max-disk-cache=2000 -M 5000 --rerun -j 0 --format=csv -o "/root/queries/output/$directory.csv"

and ended up with a disk cache like this:

root@pwned:~/databases/activerecord-7.0.8/db-ruby/default/strings/0# du . -h
1.6G ./metadata
1.1G ./buckets
28G ./pageDump
30G .

Is anyone familiar with CodeQL internals who can tell me what the disk cache is even for? I see "intermediate results" mentioned a lot, but what does that actually mean? How did it end up this large when my limit was 2000 MB? And what do the directories metadata, buckets and pageDump represent?


Update to the original question, also directly related to #15742.

A query structured like the following stalls on simple datasets:

import ruby
import codeql.ruby.DataFlow

// ParamToSinkFlow (a data-flow module) and isPublic are defined elsewhere in the query and are not shown here.
string generateOutput(DataFlow::Node param, DataFlow::Node sink) {
 result = param.toString() + "string" + sink.toString()
}

from DataFlow::Node param, DataFlow::Node sink
where ParamToSinkFlow::flow(param, sink) and isPublic(param)
select param, generateOutput(param, sink)

and I'm seeing this in the logs:

[2024-02-27 21:05:11] (159s) Pausing evaluation to evict 70.57MiB ARRAYS at sequence stamp o+7961
[2024-02-27 21:05:11] (159s) Unpausing evaluation: 32.48MiB in memory written to disk, 70.77MiB forgotten: 38.28MiB UNREACHABLE (33 items up to o+7956) 32.48MiB VITAL (28 items up to o+4124)
[2024-02-27 21:05:11] (159s) Pausing evaluation to evict 70.57MiB ARRAYS at sequence stamp o+8030
[2024-02-27 21:05:11] (159s) Unpausing evaluation: 31.32MiB in memory written to disk, 70.77MiB forgotten: 39.44MiB UNREACHABLE (34 items up to o+8026) 31.32MiB VITAL (27 items up to o+4177)
[2024-02-27 21:05:11] (159s) Pausing evaluation to evict 70.57MiB ARRAYS at sequence stamp o+8099
[2024-02-27 21:05:11] (159s) Unpausing evaluation: 31.32MiB in memory written to disk, 70.77MiB forgotten: 39.44MiB UNREACHABLE (34 items up to o+8097) 31.32MiB VITAL (27 items up to o+4235)
[2024-02-27 21:05:11] (159s) Pausing evaluation to evict 70.57MiB ARRAYS at sequence stamp o+8168
[2024-02-27 21:05:11] (159s) Unpausing evaluation: 31.32MiB in memory written to disk, 70.77MiB forgotten: 39.44MiB UNREACHABLE (34 items up to o+8166) 31.32MiB VITAL (27 items up to o+4288)
[2024-02-27 21:05:11] (159s) Pausing evaluation to evict 70.57MiB ARRAYS at sequence stamp o+8238
[2024-02-27 21:05:11] (159s) Unpausing evaluation: 29.00MiB in memory written to disk, 70.77MiB forgotten: 41.77MiB UNREACHABLE (36 items up to o+8236) 29.00MiB VITAL (25 items up to o+4337)
[2024-02-27 21:05:14] (162s) Pausing evaluation to evict 70.57MiB ARRAYS at sequence stamp o+8311
[2024-02-27 21:05:14] (162s) Unpausing evaluation: 42.93MiB in memory written to disk, 70.80MiB forgotten: 27.87MiB UNREACHABLE (29 items up to o+8295) 42.93MiB VITAL (37 items up to o+4414)
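One way to put a number on how much these repeated evictions add up to is to total the figures reported in the "Unpausing evaluation" lines. Below is a minimal sketch in Python, assuming the log line format shown above stays stable (it is not a documented CodeQL interface). The names summarize_evictions and EvictionSummary are hypothetical, and the script makes no claim about what UNREACHABLE, VITAL, or the o+n sequence stamps mean; it only adds up what the log prints.

```python
import re
from dataclasses import dataclass
from typing import Iterable

# Matches the "Unpausing evaluation" lines in the log excerpt above, e.g.
# "... Unpausing evaluation: 32.48MiB in memory written to disk, 70.77MiB forgotten: ..."
# Assumption: this line format is stable; it is not a documented CodeQL interface.
UNPAUSE_RE = re.compile(
    r"Unpausing evaluation: "
    r"(?P<written>\d+(?:\.\d+)?)MiB in memory written to disk, "
    r"(?P<forgotten>\d+(?:\.\d+)?)MiB forgotten"
)


@dataclass
class EvictionSummary:
    events: int = 0  # number of "Unpausing evaluation" lines seen
    written_mib: float = 0.0  # total MiB reported as "written to disk"
    forgotten_mib: float = 0.0  # total MiB reported as "forgotten"


def summarize_evictions(lines: Iterable[str]) -> EvictionSummary:
    """Total the MiB figures from every 'Unpausing evaluation' log line.

    Other lines (such as 'Pausing evaluation to evict ...') are ignored.
    """
    summary = EvictionSummary()
    for line in lines:
        match = UNPAUSE_RE.search(line)
        if match is None:
            continue
        summary.events += 1
        summary.written_mib += float(match.group("written"))
        summary.forgotten_mib += float(match.group("forgotten"))
    return summary
```

With --log-to-stderr, the log could be captured by appending 2> eval.log to the analyze command and then passed in as summarize_evictions(open("eval.log")). The "written to disk" total is only what these log lines report; it need not match the on-disk size of pageDump.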

This goes on continuously and keeps taking up disk space for "caching". I'm really struggling to understand what's being cached and what the o+n notation means; it's all very hard to debug. If anyone can help, I'd greatly appreciate it, ideally a CodeQL maintainer who understands this stuff, since it's all closed source.

One very important thing to note: when I remove sink.toString() from generateOutput and make it a predicate of only the param parameter, it evaluates just fine. So I'm guessing it has something to do with evaluating over two variables. Though if the variables are restricted by the where clause, the number of times generateOutput has to be evaluated shouldn't be large.

Category
Q&A
