Large disk cache for Ruby dataset, totaling 30GB #15745
I ran an analyze command with:
codeql database analyze "/root/databases/$directory" /root/queries/Query.ql --log-to-stderr --loglevel="ALL" --max-disk-cache=2000 -M 5000 --rerun -j 0 --format=csv -o "/root/queries/output/$directory.csv"
and ended up with a disk cache like this:
root@pwned:~/databases/activerecord-7.0.8/db-ruby/default/strings/0# du . -h
1.6G ./metadata
1.1G ./buckets
28G ./pageDump
30G .
Is anyone familiar with CodeQL internals who can tell me what the disk cache is actually for? I see "intermediate results" mentioned a lot, but what does that mean in practice? How did the cache end up this large when my limit was 2000 MB (--max-disk-cache=2000)? What do the directories metadata, buckets, and pageDump represent?
Replies: 1 comment
string generateOutput(DataFlow::Node param, DataFlow::Node sink) {
  result = param.toString() + "string" + sink.toString()
}

from DataFlow::Node param, DataFlow::Node sink
where ParamToSinkFlow::flow(param, sink) and isPublic(param)
select param, generateOutput(param, sink)
Update to the original question, which is also directly related to #15742.
A query structured like the one above stalls on simple datasets, and I'm seeing this in the logs:
[2024-02-27 21:05:11] (159s) Pausing evaluation to evict 70.57MiB ARRAYS at sequence stamp o+7961
[2024-02-27 21:05:11] (159s) Unpausing evaluation: 32.48MiB in memory written to disk, 70.77MiB forgotten: 38.28MiB UNREACHABLE (33 items up to o+7956) 32.48MiB VITAL (28 items up to o+4124)
[2024-02-27 21:05:11] (159s) Pausing evaluation to evict 70.57MiB ARRAYS at sequence stamp o+8030
[2024-02-27 21:05:11] (159s) Unpausing evaluation: 31.32MiB in memory written to disk, 70.77MiB forgotten: 39.44MiB UNREACHABLE (34 items up to o+8026) 31.32MiB VITAL (27 items up to o+4177)
[2024-02-27 21:05:11] (159s) Pausing evaluation to evict 70.57MiB ARRAYS at sequence stamp o+8099
[2024-02-27 21:05:11] (159s) Unpausing evaluation: 31.32MiB in memory written to disk, 70.77MiB forgotten: 39.44MiB UNREACHABLE (34 items up to o+8097) 31.32MiB VITAL (27 items up to o+4235)
[2024-02-27 21:05:11] (159s) Pausing evaluation to evict 70.57MiB ARRAYS at sequence stamp o+8168
[2024-02-27 21:05:11] (159s) Unpausing evaluation: 31.32MiB in memory written to disk, 70.77MiB forgotten: 39.44MiB UNREACHABLE (34 items up to o+8166) 31.32MiB VITAL (27 items up to o+4288)
[2024-02-27 21:05:11] (159s) Pausing evaluation to evict 70.57MiB ARRAYS at sequence stamp o+8238
[2024-02-27 21:05:11] (159s) Unpausing evaluation: 29.00MiB in memory written to disk, 70.77MiB forgotten: 41.77MiB UNREACHABLE (36 items up to o+8236) 29.00MiB VITAL (25 items up to o+4337)
[2024-02-27 21:05:14] (162s) Pausing evaluation to evict 70.57MiB ARRAYS at sequence stamp o+8311
[2024-02-27 21:05:14] (162s) Unpausing evaluation: 42.93MiB in memory written to disk, 70.80MiB forgotten: 27.87MiB UNREACHABLE (29 items up to o+8295) 42.93MiB VITAL (37 items up to o+4414)
This goes on continuously and keeps consuming disk space for "caching". I'm really struggling to understand what is being cached and what the o+n notation means, which makes all of this very hard to debug. If anyone can help, I'd greatly appreciate it, ideally a CodeQL maintainer who understands these internals, since it's all closed source. One very important thing to note: when I remove sink.toString() from generateOutput and make it a predicate of only the param parameter, the query evaluates just fine. So I'm guessing the problem has something to do with the predicate depending on two variables. That said, if both variables are restricted by the where clause, the number of times generateOutput has to be evaluated shouldn't be large.
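To put a number on how much these evictions write to disk, here is a rough Python sketch that tallies the "Unpausing evaluation" lines. The regular expression is inferred only from the log excerpt above, not from any documented log format, and the name summarize_evictions is just a hypothetical helper of my own. The two sample lines are copied from the excerpt with the timestamp prefix dropped.

```python
import re

# Pattern inferred from the log excerpt in this post; this is an assumption,
# not a documented CodeQL log format.
UNPAUSE = re.compile(
    r"Unpausing evaluation: (?P<written>[\d.]+)MiB in memory written to disk, "
    r"(?P<forgotten>[\d.]+)MiB forgotten"
)


def summarize_evictions(lines):
    """Return (eviction count, total MiB written to disk, total MiB forgotten)."""
    count = 0
    written = 0.0
    forgotten = 0.0
    for line in lines:
        match = UNPAUSE.search(line)
        if match is None:
            continue  # not an "Unpausing evaluation" line
        count += 1
        written += float(match.group("written"))
        forgotten += float(match.group("forgotten"))
    return count, round(written, 2), round(forgotten, 2)


# Two lines taken from the excerpt above (timestamps omitted).
sample = [
    "(159s) Unpausing evaluation: 32.48MiB in memory written to disk, "
    "70.77MiB forgotten: 38.28MiB UNREACHABLE (33 items up to o+7956) "
    "32.48MiB VITAL (28 items up to o+4124)",
    "(159s) Unpausing evaluation: 31.32MiB in memory written to disk, "
    "70.77MiB forgotten: 39.44MiB UNREACHABLE (34 items up to o+8026) "
    "31.32MiB VITAL (27 items up to o+4177)",
]

print(summarize_evictions(sample))
```

Since the run uses --log-to-stderr, redirecting stderr to a file and passing its lines to summarize_evictions should give a running total to compare against the size of the cache directory, assuming the rest of the log follows the same shape as the excerpt.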