Query stalling on string generation predicate, how to log actual evaluation? #15742
-
I have a query that builds the string output column for its select expression in a predicate:
string generateOutput(DataFlow::Node param, DataFlow::Node sink) {
result = namespaceType(param) +
" (" + getMethodType(param) + "): " +
getParamMethodName(param, true) +
"(" + param.toString() + ") -> " +
getSinkName(sink) + "(" +
sink.toString() +
") [location: " +
sink.getLocation().toString() + "]"
}
select param, generateOutput(param, sink)
When I run this query on certain (Ruby) databases, evaluation stalls when this predicate is reached. I'm not familiar with CodeQL internals, so a seemingly random stall inside a predicate is hard for me to debug. I used the --loglevel=ALL flag for codeql database analyze to see whether I could fix this myself, but all I can gather is that it stalls (obviously) when attempting to evaluate the predicate:
[2024-02-27 09:58:35] [DEBUG] runner B will work on #9998 evaluator DataFlowDraft::generateOutput/2#5053d725/3@a114c8th.
[2024-02-27 09:58:35] (3s) Starting to evaluate predicate DataFlowDraft::generateOutput/2#5053d725/3@a114c8th
[2024-02-27 09:58:35] [DEBUG] Putting #9998 evaluator DataFlowDraft::generateOutput/2#5053d725/3@a114c8th back in the work queue to be picked up by an additional worker.
[2024-02-27 09:58:35] [DEBUG] runner A stepping up.
[2024-02-27 09:58:35] [DEBUG] runner A will do additional work on #9998 evaluator DataFlowDraft::generateOutput/2#5053d725/3@a114c8th.
[2024-02-27 09:58:35] [DEBUG] Putting #9998 evaluator DataFlowDraft::generateOutput/2#5053d725/3@a114c8th back in the work queue to be picked up by an additional worker.
[2024-02-27 09:58:35] [DEBUG] runner D stepping up.
[2024-02-27 09:58:35] [DEBUG] runner D will do additional work on #9998 evaluator DataFlowDraft::generateOutput/2#5053d725/3@a114c8th.
[2024-02-27 09:58:35] [DEBUG] Putting #9998 evaluator DataFlowDraft::generateOutput/2#5053d725/3@a114c8th back in the work queue to be picked up by an additional worker.
[2024-02-27 09:58:35] [DEBUG] runner C stepping up.
[2024-02-27 09:58:35] [DEBUG] runner C will do additional work on #9998 evaluator DataFlowDraft::generateOutput/2#5053d725/3@a114c8th.
[2024-02-27 09:58:35] [DEBUG] Putting #9998 evaluator DataFlowDraft::generateOutput/2#5053d725/3@a114c8th back in the work queue to be picked up by an additional worker.
My first question is: how can I get even more detailed logs? There is clearly a worker/task model in which the stalled task keeps being put back in the queue for additional workers once it reaches the generateOutput predicate, but that much I already know. I'd like to introspect why it is stalling.
My second question is: how can I fix this right away? The predicates used in generateOutput do not seem exotic to me, so I'm not sure what's going on.
Any help would be greatly appreciated, thanks.
-
You seem to select arbitrary combinations of param and sink. There can be very many DataFlow::Node values in a program, so there are very many combinations. You should probably restrict which param and sink values are allowed so that you only get the combinations you are interested in. In cases like this it is often helpful to create a very small database and check whether the results found by the query are really what you meant. If, for a small database with only 100 data-flow nodes, you end up with 100*100 = 10,000 results, then you know that things will likely not scale to larger databases.
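For example, here is a minimal sketch of that kind of restriction. It assumes the Ruby data-flow library and a hypothetical reachesSink predicate standing in for whatever flow relation actually connects your params to your sinks:
import codeql.ruby.DataFlow

from DataFlow::Node param, DataFlow::Node sink
where
  // Only consider parameter nodes instead of every DataFlow::Node:
  param instanceof DataFlow::ParameterNode and
  // reachesSink is a hypothetical placeholder for the relation that connects
  // param to sink in your real query (e.g. a data-flow configuration's flow
  // predicate):
  reachesSink(param, sink)
select param, generateOutput(param, sink)
Constraining the pairs in the where clause means generateOutput is only evaluated for the combinations you actually care about, rather than for the full cross product of all data-flow nodes.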
-
To get more detailed logging you could use the structured evaluator log options:
% codeql database analyze --help -vvv
Usage: codeql database analyze [OPTIONS] -- <database> [<query|dir|suite|pack>...]
Analyze a database, producing meaningful results in the context of the source code.
Run a query suite (or some individual queries) against a CodeQL database, producing results, styled as alerts or paths, in SARIF or another interpreted format.
This command combines the effect of the codeql database run-queries and codeql database interpret-results commands. If you want to run queries whose results don't meet the requirements for being interpreted as
source-code alerts, use codeql database run-queries or codeql query run instead, and then codeql bqrs decode to convert the raw results to a readable notation.
<database> [Mandatory] Path to the CodeQL database to query.
[<query|dir|suite|pack>...]
...
Options for controlling outputting of structured evaluator logs:
--evaluator-log=<file> [Advanced] Output structured logs about evaluator performance to the given file. The format of this log file is subject to change with no notice, but will be a stream of JSON objects
separated by either two newline characters (by default) or one if the --evaluator-log-minify option is passed. Please use codeql generate log-summary <file> to produce a more stable summary
of this file, and avoid parsing the file directly. The file will be overwritten if it already exists.
--evaluator-log-minify [Advanced] If the --evaluator-log option is passed, also passing this option will minimize the size of the JSON log produced, at the expense of making it much less human readable.
--evaluator-log-level=<n>
[Wizards only!] If the --evaluator-log option is passed, also passing this option will configure the verbosity of the JSON log produced. This should be an integer between 1 and 5, with higher
values representing a higher verbosity. Currently the following additional values will be included at each level (in addition to all information already present in preceding levels):
* Level 1: Include basic information about each computation performed by the evaluator.
* Level 2: Include the dependencies of every layer that is computed.
* Level 3: Include details of where each layer is used.
* Level 4: Record an event when a cache lookup misses.
* Level 5 (default): Output the full RA of any relations that are evaluated.
--loglevel=<level> [Wizards only!] Set the logging level of the detailed logs to one of OFF, ERROR, WARN, INFO, DEBUG, TRACE, or ALL.
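For instance, the options might be combined like this (the database, query, and file names are placeholders):
codeql database analyze my-ruby-db queries/DataFlowDraft.ql \
  --format=sarif-latest --output=results.sarif \
  --evaluator-log=evaluator.jsonl --evaluator-log-level=5
# Produce a more stable, human-readable summary of the structured log:
codeql generate log-summary evaluator.jsonl evaluator-summary.txt
At level 5 the log includes the full RA of the evaluated relations, so the summary should show which relations inside generateOutput are producing the large intermediate results.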
-
Thank you for that. I did restrict the data-flow nodes, though; I just omitted those parts for simplicity.