analyzer can give you index recommendations by going through your
Postgres logs. It works with all languages, ORMs, and query builders!
There are a couple of assumptions we make about your CI pipeline for this to work:
- Your pipeline runs database queries against a real Postgres database. The source is not important; it could be e2e, load, or integration tests. The queries can be run in a rolled-back transaction and it will still work fine (see the sketch after this list).
- The final schema (after migrations run) is available for analyzer to introspect, and every table has at least one row in it as part of your db seed. We use the database to do extra work by testing your queries against different index configurations with your production stats, but all of that work is done in a transaction that is always rolled back. Data is never modified.
- Your `postgresql.conf` is configured with at least the following options:

```
shared_preload_libraries='auto_explain'
auto_explain.log_min_duration=0
auto_explain.log_analyze=true
auto_explain.log_verbose=true
auto_explain.log_buffers=true
auto_explain.log_format='json'
logging_collector=on
log_directory='/var/log/postgresql' # or any path you like
log_filename='postgres.log'         # or any name you like
```
- You have a production database you can pull statistics from (using a query we provide below).
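For example, tests that wrap their queries in a transaction and roll it back at the end are still fully visible to the analyzer: auto_explain logs every executed statement and its plan at execution time, regardless of whether the transaction commits. A minimal sketch, with hypothetical table names:

```sql
-- Hypothetical tables: auto_explain records each statement's plan as it runs,
-- so rolling back at the end hides nothing from the analyzer.
BEGIN;
SELECT * FROM users WHERE email = 'seed@example.com';
INSERT INTO orders (user_id, total) VALUES (1, 0);
ROLLBACK;  -- nothing is persisted, but the plans are already in postgres.log
```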
Currently we only support GitHub Actions, but it would not be difficult to add support for other CI platforms like Azure Pipelines.
Ubuntu runners on GitHub already ship with Postgres as part of the default
image, so we try to leverage that. Because GitHub workflows do not support
specifying arguments to services (actions/runner#1152), we
can't run Postgres as a container. Running Postgres in Docker directly
causes network problems, because `--network=host` also appears to be unsupported
in dind (Docker-in-Docker). So instead we apply an explicit setup to the existing
Postgres installation, which boots up 10x faster than Docker anyway.
- Copy the setup script
```yaml
jobs:
  run:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run Postgres
        run: |
          sudo tee -a /etc/postgresql/16/main/postgresql.conf <<EOF
          shared_preload_libraries = 'auto_explain'
          auto_explain.log_min_duration = 0
          auto_explain.log_analyze = true
          auto_explain.log_verbose = true
          auto_explain.log_buffers = true
          auto_explain.log_format = 'json'
          logging_collector = on
          log_directory = '/var/log/postgresql'
          log_filename = 'postgres.log'
          EOF
          sudo tee /etc/postgresql/16/main/pg_hba.conf > /dev/null <<EOF
          host all all 127.0.0.1/32 trust
          host all all ::1/128 trust
          local all all peer
          EOF
          sudo systemctl start postgresql.service
          sudo -u postgres createuser -s -d -r -w me
          sudo -u postgres createdb testing
          sudo chmod 666 /var/log/postgresql/postgres.log
```
You can change `sudo -u postgres createuser -s -d -r -w me` to create a user
with a name of your choosing, and `sudo -u postgres createdb testing` to create a
database with a different name.
- Run your migrations and seed scripts. This is just an example showing that the migrations should target the Postgres instance set up in the previous step (a minimal seed sketch follows the workflow below):
```yaml
jobs:
  run:
    runs-on: ubuntu-latest
    steps:
      # ... previous steps ...
      - name: Migrate
        run: pnpm run migrate && pnpm run seed
        env:
          POSTGRES_URL: postgres://me@localhost/testing
```
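If you do not already have a seed step, something as small as the sketch below satisfies the one-row-per-table requirement mentioned above (the table names are hypothetical):

```sql
-- Hypothetical seed: the analyzer only needs every table to contain at least one row.
INSERT INTO users (id, email) VALUES (1, 'seed@example.com') ON CONFLICT DO NOTHING;
INSERT INTO orders (id, user_id, total) VALUES (1, 1, 0) ON CONFLICT DO NOTHING;
```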
- Run your test suite against the same database. You can do this with any tool and use any query builder or ORM you like.
- Run the analyzer.
`GITHUB_TOKEN` is needed to post a comment to your PR reviewing the indexes found in your database.
```yaml
jobs:
  run:
    runs-on: ubuntu-latest
    steps:
      # ... previous steps ...
      - name: Migrate
        run: pnpm run migrate && pnpm run seed
        env:
          POSTGRES_URL: postgres://me@localhost/testing
      - name: Run integration tests
        run: pnpm run test:integration
        env:
          POSTGRES_URL: postgres://me@localhost/testing
      - name: Run query-doctor/analyzer
        uses: query-doctor/analyzer@v0
        env:
          GITHUB_TOKEN: ${{ github.token }}
          POSTGRES_URL: postgres://me@localhost/testing
```
- Add `pull-requests: write` permissions to your job to allow the analyzer to comment on your PR:
```yaml
jobs:
  run:
    permissions:
      contents: read
      pull-requests: write
    runs-on: ubuntu-latest
    # ...
```
To make sure that we can most accurately emulate your production database, we need access to its stats.
You can use the following function to dump the stats for your database. Copy and paste it into psql to create the function.

Since Postgres also keeps track of statistics like the most common values in each column, part of the statistics table contains copies of cell values from your tables. The `include_sensitive_info` parameter controls whether those values are included in the dump; pass `false` to leave them out.
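To see what kind of values this covers, you can peek at the human-readable `pg_stats` view; this query is only for illustration and is not used by the analyzer (the dump function itself follows):

```sql
-- most_common_vals holds real cell values from your tables; these are
-- exactly what include_sensitive_info = false strips from the dump.
SELECT schemaname, tablename, attname, most_common_vals
FROM pg_stats
WHERE schemaname = 'public'
LIMIT 5;
```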
```sql
CREATE OR REPLACE FUNCTION _qd_dump_stats(include_sensitive_info boolean) RETURNS jsonb AS $$
SELECT json_agg(t) FROM (
  SELECT
    c.table_name as "tableName",
    c.table_schema as "schemaName",
    cl.reltuples,
    cl.relpages,
    cl.relallvisible,
    n.nspname as "schemaName",
    json_agg(
      json_build_object(
        'columnName', c.column_name,
        'dataType', c.data_type,
        'isNullable', (c.is_nullable = 'YES')::boolean,
        'characterMaximumLength', c.character_maximum_length,
        'numericPrecision', c.numeric_precision,
        'numericScale', c.numeric_scale,
        'columnDefault', c.column_default,
        'stats', (
          select json_build_object(
            'starelid', s.starelid,
            'staattnum', s.staattnum,
            'stainherit', s.stainherit,
            'stanullfrac', s.stanullfrac,
            'stawidth', s.stawidth,
            'stadistinct', s.stadistinct,
            -- slot 1
            'stakind1', s.stakind1,
            'staop1', s.staop1,
            'stacoll1', s.stacoll1,
            'stanumbers1', s.stanumbers1,
            -- slot 2
            'stakind2', s.stakind2,
            'staop2', s.staop2,
            'stacoll2', s.stacoll2,
            'stanumbers2', s.stanumbers2,
            -- slot 3
            'stakind3', s.stakind3,
            'staop3', s.staop3,
            'stacoll3', s.stacoll3,
            'stanumbers3', s.stanumbers3,
            -- slot 4
            'stakind4', s.stakind4,
            'staop4', s.staop4,
            'stacoll4', s.stacoll4,
            'stanumbers4', s.stanumbers4,
            -- slot 5
            'stakind5', s.stakind5,
            'staop5', s.staop5,
            'stacoll5', s.stacoll5,
            'stanumbers5', s.stanumbers5,
            -- non-anonymous stats
            'stavalues1', case when $1 then s.stavalues1 else null end,
            'stavalues2', case when $1 then s.stavalues2 else null end,
            'stavalues3', case when $1 then s.stavalues3 else null end,
            'stavalues4', case when $1 then s.stavalues4 else null end,
            'stavalues5', case when $1 then s.stavalues5 else null end
          )
          from pg_statistic s
          where s.starelid = a.attrelid and s.staattnum = a.attnum
        )
      ) ORDER BY c.ordinal_position) as columns
  FROM information_schema.columns c
  JOIN pg_attribute a
    ON a.attrelid = (quote_ident(c.table_schema) || '.' || quote_ident(c.table_name))::regclass
    AND a.attname = c.column_name
  JOIN pg_class cl ON cl.relname = c.table_name
  JOIN pg_namespace n ON n.oid = cl.relnamespace
  WHERE c.table_name not like 'pg_%'
    and n.nspname <> 'information_schema'
    and c.table_name not in ('pg_stat_statements', 'pg_stat_statements_info')
  GROUP BY c.table_name, c.table_schema, cl.reltuples, cl.relpages, cl.relallvisible, n.nspname
  /* @qd_introspection */
) t;
$$ LANGUAGE sql STABLE SECURITY DEFINER;
```
Note: The function is defined with SECURITY DEFINER so that it can be called
either manually, or automatically by the analyzer if you set up stats pull
integration.
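For example, if the stats pull runs as a dedicated non-superuser role, granting it EXECUTE on the function is enough: SECURITY DEFINER makes the function read `pg_statistic` with its owner's privileges. The role name below is hypothetical:

```sql
-- Hypothetical read-only role for the stats pull; it cannot read pg_statistic
-- directly, but it can call the SECURITY DEFINER function that does.
CREATE ROLE qd_stats_reader LOGIN PASSWORD 'change-me';
GRANT EXECUTE ON FUNCTION _qd_dump_stats(boolean) TO qd_stats_reader;
```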
Then dump the stats to a file:

```sh
psql -d <yourdb> -At -F "" -c "select _qd_dump_stats(false)" > stats.json
```
This example uses cloudnative-pg, but the same approach applies to any pod that has superuser access to psql:
```sh
kubectl exec <podname> -n cnpg-system -c postgres -- psql -d <yourdb> -At -F "" -c "select _qd_dump_stats(false)" > stats.json
```