Query stalling on string generation predicate, how to log actual evaluation? #15742
-
I have a query that builds the string output column for its select expression in a predicate:
string generateOutput(DataFlow::Node param, DataFlow::Node sink) {
result = namespaceType(param) +
" (" + getMethodType(param) + "): " +
getParamMethodName(param, true) +
"(" + param.toString() + ") -> " +
getSinkName(sink) + "(" +
sink.toString() +
") [location: " +
sink.getLocation().toString() + "]"
}
select param, generateOutput(param, sink)
When I run this query on certain (Ruby) databases, evaluation stalls when this predicate is reached. I'm not familiar with CodeQL internals, so a seemingly random stall inside a predicate is hard for me to debug. I used the --loglevel=ALL flag for codeql database analyze to see whether I could fix this myself, but all I can gather is that it stalls (obviously) when attempting to evaluate the predicate:
[2024-02-27 09:58:35] [DEBUG] runner B will work on #9998 evaluator DataFlowDraft::generateOutput/2#5053d725/3@a114c8th.
[2024-02-27 09:58:35] (3s) Starting to evaluate predicate DataFlowDraft::generateOutput/2#5053d725/3@a114c8th
[2024-02-27 09:58:35] [DEBUG] Putting #9998 evaluator DataFlowDraft::generateOutput/2#5053d725/3@a114c8th back in the work queue to be picked up by an additional worker.
[2024-02-27 09:58:35] [DEBUG] runner A stepping up.
[2024-02-27 09:58:35] [DEBUG] runner A will do additional work on #9998 evaluator DataFlowDraft::generateOutput/2#5053d725/3@a114c8th.
[2024-02-27 09:58:35] [DEBUG] Putting #9998 evaluator DataFlowDraft::generateOutput/2#5053d725/3@a114c8th back in the work queue to be picked up by an additional worker.
[2024-02-27 09:58:35] [DEBUG] runner D stepping up.
[2024-02-27 09:58:35] [DEBUG] runner D will do additional work on #9998 evaluator DataFlowDraft::generateOutput/2#5053d725/3@a114c8th.
[2024-02-27 09:58:35] [DEBUG] Putting #9998 evaluator DataFlowDraft::generateOutput/2#5053d725/3@a114c8th back in the work queue to be picked up by an additional worker.
[2024-02-27 09:58:35] [DEBUG] runner C stepping up.
[2024-02-27 09:58:35] [DEBUG] runner C will do additional work on #9998 evaluator DataFlowDraft::generateOutput/2#5053d725/3@a114c8th.
[2024-02-27 09:58:35] [DEBUG] Putting #9998 evaluator DataFlowDraft::generateOutput/2#5053d725/3@a114c8th back in the work queue to be picked up by an additional worker.
My first question is: how can I get even more detailed logs? There is clearly a worker/task model in which the stalled task keeps being put back in the queue for additional workers once it reaches the generateOutput predicate, but that much I already know. I'd like to introspect why it is stalling.
My second question is: how can I fix this right away? The predicates used in generateOutput do not seem exotic to me, so I'm not sure what's going on.
Any help would be greatly appreciated, thanks.
-
You seem to select arbitrary combinations of param and sink. There can be very many DataFlow::Node values in a program, so there are very many combinations. You should probably restrict which param and sink values are allowed so that you only get the combinations you are interested in. In cases like this it is often helpful to create a very small database and check whether the results found by the query are really what you meant. If, for a small database with only 100 data-flow nodes, you end up with 100*100 = 10,000 results, then you know that things will likely not scale to larger databases.
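For example, here is a minimal sketch of that kind of restriction. It assumes the Ruby data-flow library and a hypothetical reachesSink predicate standing in for whatever flow relation actually connects your params to your sinks:
import codeql.ruby.DataFlow

from DataFlow::Node param, DataFlow::Node sink
where
  // Only consider parameter nodes instead of every DataFlow::Node:
  param instanceof DataFlow::ParameterNode and
  // reachesSink is a hypothetical placeholder for the relation that connects
  // param to sink in your real query (e.g. a data-flow configuration's flow
  // predicate):
  reachesSink(param, sink)
select param, generateOutput(param, sink)
Constraining the pairs in the where clause means generateOutput is only evaluated for the combinations you actually care about, rather than for the full cross product of all data-flow nodes.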
-
To get more detailed logging you could use the structured evaluator log options:
% codeql database analyze --help -vvv
Usage: codeql database analyze [OPTIONS] -- <database> [<query|dir|suite|pack>...]
Analyze a database, producing meaningful results in the context of the source code.
Run a query suite (or some individual queries) against a CodeQL database, producing results, styled as alerts or paths, in SARIF or another interpreted format.
This command combines the effect of the codeql database run-queries and codeql database interpret-results commands. If you want to run queries whose results don't meet the requirements for being interpreted as
source-code alerts, use codeql database run-queries or codeql query run instead, and then codeql bqrs decode to convert the raw results to a readable notation.
<database> [Mandatory] Path to the CodeQL database to query.
[<query|dir|suite|pack>...]
...
Options for controlling outputting of structured evaluator logs:
--evaluator-log=<file> [Advanced] Output structured logs about evaluator performance to the given file. The format of this log file is subject to change with no notice, but will be a stream of JSON objects
separated by either two newline characters (by default) or one if the --evaluator-log-minify option is passed. Please use codeql generate log-summary <file> to produce a more stable summary
of this file, and avoid parsing the file directly. The file will be overwritten if it already exists.
--evaluator-log-minify [Advanced] If the --evaluator-log option is passed, also passing this option will minimize the size of the JSON log produced, at the expense of making it much less human readable.
--evaluator-log-level=<n>
[Wizards only!] If the --evaluator-log option is passed, also passing this option will configure the verbosity of the JSON log produced. This should be an integer between 1 and 5, with higher
values representing a higher verbosity. Currently the following additional values will be included at each level (in addition to all information already present in preceding levels):
* Level 1: Include basic information about each computation performed by the evaluator.
* Level 2: Include the dependencies of every layer that is computed.
* Level 3: Include details of where each layer is used.
* Level 4: Record an event when a cache lookup misses.
* Level 5 (default): Output the full RA of any relations that are evaluated.
--loglevel=<level> [Wizards only!] Set the logging level of the detailed logs to one of OFF, ERROR, WARN, INFO, DEBUG, TRACE, or ALL.
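For instance, the options might be combined like this (the database, query, and file names are placeholders):
codeql database analyze my-ruby-db queries/DataFlowDraft.ql \
  --format=sarif-latest --output=results.sarif \
  --evaluator-log=evaluator.jsonl --evaluator-log-level=5
# Produce a more stable, human-readable summary of the structured log:
codeql generate log-summary evaluator.jsonl evaluator-summary.txt
At level 5 the log includes the full RA of the evaluated relations, so the summary should show which relations inside generateOutput are producing the large intermediate results.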
-
Thank you for that. I did restrict the data-flow nodes, though; I just omitted those parts for simplicity.