The purpose of pg_index_pilot is to provide all the tools needed to manage indexes in Postgres in the most automated fashion.

This project is in a very early stage. We start with the most boring yet extremely important task: automatic reindexing ("AR") to mitigate index bloat, supporting all index types, and will then broaden to other aspects of index health. From there, the plan is to expand into two other big areas: automated index removal ("AIR") and, finally, automated index creation and optimization ("AIC&O"). It is part of Self-driving Postgres, but can be used independently as a standalone tool.
Docs: Installation | Runbook | FAQ | Function reference | Architecture
- Automated index lifecycle management for PostgreSQL, starting with automatic reindexing to keep index bloat under control without manual work.
- Simplicity and full embedding: implemented entirely inside PostgreSQL (PL/pgSQL), with no external services required to rebuild indexes.
- Works everywhere PostgreSQL runs (including managed platforms): all logic lives in the database.
- No superuser requirement for day-to-day operations; designed to run under owner/privileged roles in the control DB and target DBs.
- Scheduling inside the database via `pg_cron`: no EC2/Lambda or other external orchestrators needed.
- Supports reindexing of all common index types (btree, hash, gin, gist, spgist); brin is currently excluded.
- The control DB orchestrates multiple target databases via `postgres_fdw`/`dblink`; reindexing is executed with `REINDEX CONCURRENTLY` to minimize locking.
See Architecture for detailed design decisions and requirements.
- Roadmap
- Automated reindexing
- Requirements
- Recommendations
- Installation
- Initial launch
- Scheduling automated maintenance
- Uninstalling pg_index_pilot
- Updating pg_index_pilot
- Monitoring and Analysis
## Roadmap

The roadmap covers three big areas:
- "AR": Automated Reindexing
- Maxim Boguk's bloat estimation formula β works with any type of index, not only btree
- original implementation (
pg_index_pilot
) β requires initial full reindex - non-superuser mode for cloud databases (AWS RDS, Google Cloud SQL, Azure)
- flexible connection management for dblink
- API for stats obtained on a clone (to avoid full reindex on prod primary)
- original implementation (
- Traditional bloat estimation (ioguix; btree only)
- Exact bloat analysis (pgstattuple; analysis on clones)
- Tested on managed services
- RDS and Aurora (see AWS specifics in Installation: docs/installation.md#aws-rds--aurora-specifics)
- CloudSQL
- Supabase
- Crunchy Bridge
- Azure
- Integration with postgres_ai monitoring
- Resource-aware scheduling, predictive maintenance windows (when will load be lowest?)
- Coordination with other ops (backups, vacuums, upgrades)
- Parallelization and throttling (adaptive)
- Predictive bloat modeling
- Learning & Feedback Loops: learning from past actions, A/B testing and "what-if" simulation (DBLab)
- Impact estimation before scheduling
- RCA of fast degraded index health (why it gets bloated fast?) and mitigation (tune autovacuum, avoid xmin horizon getting stuck)
- Self-adjusting thresholds
- Maxim Boguk's bloat estimation formula β works with any type of index, not only btree
- "AIR": Automated Index Removal
- Unused indexes
- Redundant indexes
- Invalid indexes (or, per configuration, rebuilding them)
- Advanced scoring; suboptimal / rarely used indexes cleanup; self-adjusting thresholds
- Forecasting of index usage; seasonal pattern recognition
- Impact estimation before removal; "what-if" simulation (DBLab)
- "AIC&O": Automated Index Creation & Optimization
- Index recommendations (including multi-column, expression, partial, hybrid, and covering indexes)
- Index optimization according to configured goals (latency, size, WAL, write/HOT overhead, read overhead)
- Experimentation (hypothetical with HypoPG, real with DBLab)
- Query pattern classification
- Advanced scoring; cost/benefit analysis
- Impact estimation before operations; "what-if" simulation (DBLab)
## Automated reindexing

The reindexing framework is implemented entirely inside Postgres, using:

- PL/pgSQL functions and stored procedures with transaction control
- `dblink` to execute `REINDEX CONCURRENTLY`, because it cannot run inside a transaction block
- `pg_cron` for scheduling
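To illustrate why `dblink` is needed here: a `dblink` session runs the remote statement in autocommit mode, so `REINDEX CONCURRENTLY` stays outside the calling procedure's transaction. Below is a minimal sketch of the pattern, not the project's actual code; the server name `target_your_database`, the connection name, and the procedure name are placeholders:

```sql
-- Minimal sketch: run REINDEX CONCURRENTLY from PL/pgSQL via dblink.
-- Assumes an FDW server "target_your_database" with a user mapping for current_user.
create or replace procedure demo_reindex_concurrently(index_fqn text)
language plpgsql
as $$
begin
  -- dblink opens a separate session; the remote statement is not part of
  -- this procedure's transaction, which REINDEX CONCURRENTLY requires.
  perform dblink_connect('demo_conn', 'target_your_database');
  perform dblink_exec('demo_conn', format('reindex index concurrently %s', index_fqn));
  perform dblink_disconnect('demo_conn');
end;
$$;

-- Usage (placeholder index name):
-- call demo_reindex_concurrently('public.my_table_pkey');
```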
## Requirements

- PostgreSQL version 13.0 or higher
- IMPORTANT: requires the ability to create a database (not supported on TigerData, formerly Timescale Cloud)
- A separate control database (`index_pilot_control`) to manage target databases
- `dblink` and `postgres_fdw` extensions installed in the control database
- Database owner or a user with appropriate permissions
- Works with AWS RDS, Google Cloud SQL, Azure Database for PostgreSQL (where database creation is allowed)
- Manages multiple target databases from a single control database
- Uses `REINDEX CONCURRENTLY` from the control database (avoids deadlocks)
## Recommendations

- If server resources allow, set a non-zero `max_parallel_maintenance_workers` (the exact value depends on server parameters).
- Set `wal_keep_size` to at least 80GB (the equivalent of 5000 16MB WAL segments; before PostgreSQL 13 this was `wal_keep_segments = 5000`), unless a WAL archive is used to support streaming replication.
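For example, these settings can be applied with `ALTER SYSTEM` (the values below are illustrative starting points, not benchmarked recommendations; on managed services use the provider's parameter group instead):

```sql
-- Illustrative values; tune to your hardware and replication setup.
alter system set max_parallel_maintenance_workers = 4;  -- non-zero if CPU/IO headroom allows
alter system set wal_keep_size = '80GB';                -- PostgreSQL 13+; ~5000 x 16MB segments
select pg_reload_conf();                                -- both settings are reloadable
```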
## Installation

Quick installation using the bundled `index_pilot.sh` helper:

```bash
# Clone the repository
git clone https://gitlab.com/postgres-ai/pg_index_pilot
cd pg_index_pilot

# 1) Install into control database (auto-creates DB, installs extensions/objects)
PGPASSWORD='your_password' \
./index_pilot.sh install-control \
  -H your_host -U your_user -C your_control_db_name

# 2) Register a target database via FDW (secure user mapping)
PGPASSWORD='your_password' \
./index_pilot.sh register-target \
  -H your_host -U your_user -C your_control_db_name \
  -T your_database --fdw-host your_host

# 3) Verify installation and environment
PGPASSWORD='your_password' \
./index_pilot.sh verify \
  -H your_host -U your_user -C your_control_db_name

# (Optional) Uninstall
PGPASSWORD='your_password' \
./index_pilot.sh uninstall \
  -H your_host -U your_user -C your_control_db_name --drop-servers
```
Notes:
- Use `PGPASSWORD` to avoid echoing secrets; the script won't print passwords.
- `--fdw-host` should be reachable from the database server itself (in Docker/CI it might be `postgres`, `127.0.0.1`, or the container IP).
- For self-hosted, replace the host with `127.0.0.1`. For managed services, ensure the admin user can `create database` and `create extension`.
Before a manual installation, check the prerequisites:

- PostgreSQL ≥ 13 and the ability to create databases/extensions (control DB).
- Decide: CONTROL_DB name, TARGET_DB name, TARGET_HOST (reachable from the Postgres server, not only from the client).
- If you plan to use pg_cron: ensure it is in `shared_preload_libraries` (RDS: parameter group + reboot), and `create extension pg_cron` in `cron.database_name`.
- The FDW user mapping is looked up for the `current_user` in the control DB session. Create the mapping for that user.
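A quick pre-flight check can be run in psql (illustrative; `show cron.database_name` errors out if pg_cron is not loaded):

```sql
-- Pre-flight checks
show shared_preload_libraries;  -- should include 'pg_cron' if you plan to use it
show cron.database_name;        -- database where pg_cron jobs must be created
select current_user;            -- the user the FDW user mapping must exist for
```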
Naming conventions used below:

- CONTROL_DB, TARGET_DB, TARGET_HOST, SERVER_NAME (e.g. `target_<target_db>`)
- CONTROL_USER/PASS (user running commands in the control DB)
- TARGET_USER/PASS (user in the target DB; typically an owner or a role with owner rights)
- `target_<db>`: FDW server that points to the target database. This name goes into `index_pilot.target_databases.fdw_server_name`.
- A user mapping must exist for `current_user` (in the control DB) for each `target_<db>` server you intend to use.
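To confirm the mappings are in place, you can query the standard `pg_user_mappings` view in the control DB (illustrative):

```sql
-- List FDW servers and the users they are mapped for
select srvname, usename
from pg_user_mappings
where srvname like 'target_%';
```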
CRITICAL: Never use hardcoded passwords in production. The `setup_01_user.psql` script requires a secure password to be provided via a psql variable:
```bash
# Generate a secure random password
RANDOM_PWD=$(openssl rand -base64 32)

# Use the secure setup script (recommended)
./setup_user_secure.sh

# Or run manually with the secure password
psql -f setup_01_user.psql -v index_pilot_password="$RANDOM_PWD"
echo "Generated password: $RANDOM_PWD"
```
Manual installation on AWS RDS (and similar managed services):

```bash
# Clone the repository
git clone https://gitlab.com/postgres-ai/pg_index_pilot
cd pg_index_pilot

# 1. Create control database (as admin user)
psql -h your-instance.region.rds.amazonaws.com -U postgres \
  -c "create database index_pilot_control;"

# 2. Install required extensions in control database
psql -h your-instance.region.rds.amazonaws.com -U postgres -d index_pilot_control \
  -c "create extension if not exists postgres_fdw;"
psql -h your-instance.region.rds.amazonaws.com -U postgres -d index_pilot_control \
  -c "create extension if not exists dblink;"

# 3. Install schema and functions in control database
psql -h your-instance.region.rds.amazonaws.com -U postgres -d index_pilot_control -f index_pilot_tables.sql
psql -h your-instance.region.rds.amazonaws.com -U postgres -d index_pilot_control -f index_pilot_functions.sql
psql -h your-instance.region.rds.amazonaws.com -U postgres -d index_pilot_control -f index_pilot_fdw.sql

# 4. Create FDW server and user mapping for the TARGET database
psql -h your-instance.region.rds.amazonaws.com -U postgres -d index_pilot_control <<'SQL'
create server if not exists target_your_database
  foreign data wrapper postgres_fdw
  options (host 'your-instance.region.rds.amazonaws.com', port '5432', dbname 'your_database');
-- dblink_connect(server_name) uses the current_user user mapping; create the mapping
-- for the user running commands in the control DB (often postgres or index_pilot)
create user mapping if not exists for current_user
  server target_your_database
  options (user 'remote_owner_or_role', password 'remote_password');
SQL

# 5. Register the TARGET database (links index_pilot.target_databases to your FDW server)
psql -h your-instance.region.rds.amazonaws.com -U postgres -d index_pilot_control <<'SQL'
insert into index_pilot.target_databases(database_name, host, port, fdw_server_name, enabled)
values ('your_database', 'your-instance.region.rds.amazonaws.com', 5432, 'target_your_database', true)
on conflict (database_name) do update
  set host = excluded.host,
      port = excluded.port,
      fdw_server_name = excluded.fdw_server_name,
      enabled = true;
SQL

# 6. Verify FDW and environment
psql -h your-instance.region.rds.amazonaws.com -U postgres -d index_pilot_control \
  -c "select * from index_pilot.check_fdw_security_status();"
psql -h your-instance.region.rds.amazonaws.com -U postgres -d index_pilot_control \
  -c "select * from index_pilot.check_environment();"
```
Manual installation on a self-hosted instance (as superuser):

```bash
# Clone the repository
git clone https://gitlab.com/postgres-ai/pg_index_pilot
cd pg_index_pilot

# 1. Create control database (as superuser)
psql -U postgres -c "create database index_pilot_control;"

# 2. Install required extensions in control database (as superuser)
psql -U postgres -d index_pilot_control -c "create extension if not exists postgres_fdw;"
psql -U postgres -d index_pilot_control -c "create extension if not exists dblink;"

# 3. Install schema and functions in control database (as superuser)
psql -U postgres -d index_pilot_control -f index_pilot_tables.sql
psql -U postgres -d index_pilot_control -f index_pilot_functions.sql
psql -U postgres -d index_pilot_control -f index_pilot_fdw.sql

# 4. Create FDW server and user mapping for the TARGET database
psql -U postgres -d index_pilot_control <<'SQL'
create server if not exists target_your_database
  foreign data wrapper postgres_fdw
  options (host '127.0.0.1', port '5432', dbname 'your_database');
create user mapping if not exists for current_user
  server target_your_database
  options (user 'remote_owner_or_role', password 'remote_password');
SQL

# 5. Register the TARGET database
psql -U postgres -d index_pilot_control <<'SQL'
insert into index_pilot.target_databases(database_name, host, port, fdw_server_name, enabled)
values ('your_database', '127.0.0.1', 5432, 'target_your_database', true)
on conflict (database_name) do update
  set host = excluded.host,
      port = excluded.port,
      fdw_server_name = excluded.fdw_server_name,
      enabled = true;
SQL

# 6. Verify
psql -U postgres -d index_pilot_control -c "select * from index_pilot.check_fdw_security_status();"
psql -U postgres -d index_pilot_control -c "select * from index_pilot.check_environment();"
```
## Initial launch

For a manual initial run:
```bash
# Set credentials
export PGSSLMODE=require
export PGPASSWORD='your_index_pilot_password'

# Run initial analysis and reindexing in the background
nohup psql -h your_host -U index_pilot -d your_database \
  -qXt -c "call index_pilot.periodic(true)" >> index_pilot.log 2>&1 &
```
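While the initial run is active, index rebuilds can be watched through the standard `pg_stat_progress_create_index` view (core PostgreSQL, not part of pg_index_pilot); connect to the target database and run:

```sql
-- Progress of CREATE INDEX / REINDEX CONCURRENTLY commands
select pid, datname,
       relid::regclass as table_name,
       index_relid::regclass as index_name,
       command, phase,
       round(100.0 * blocks_done / nullif(blocks_total, 0), 1) as blocks_done_pct
from pg_stat_progress_create_index;
```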
## Scheduling automated maintenance

The optimal maintenance schedule depends on your database characteristics:
Daily maintenance (recommended for):
- High-traffic databases with frequent updates
- Databases where index bloat accumulates quickly
- Systems with sufficient maintenance windows each night
- When you want to catch and fix bloat early
Weekly maintenance (recommended for):
- Stable databases with predictable workloads
- Systems where index bloat accumulates slowly
- Production systems where daily maintenance might be disruptive
- Databases with limited maintenance windows
Step 1: Check where pg_cron is installed
```sql
-- Find which database has pg_cron
show cron.database_name;
```
Step 2: Schedule jobs from the pg_cron database
```sql
-- Connect to the database shown in step 1
\c postgres_ai  -- or whatever cron.database_name shows

-- Daily maintenance at 2 AM
select cron.schedule_in_database(
  'pg_index_pilot_daily',
  '0 2 * * *',
  'call index_pilot.periodic(real_run := true);',
  'index_pilot_control'  -- run in control database
);

-- Monitoring every 6 hours (no actual reindex)
select cron.schedule_in_database(
  'pg_index_pilot_monitor',
  '0 */6 * * *',
  'call index_pilot.periodic(real_run := false);',
  'index_pilot_control'
);

-- OR weekly maintenance on Sunday at 2 AM
select cron.schedule_in_database(
  'pg_index_pilot_weekly',
  '0 2 * * 0',
  'call index_pilot.periodic(real_run := true);',
  'index_pilot_control'
);
```
Step 3: Verify and manage schedules
```sql
-- View scheduled jobs
select jobname, schedule, command, database, active
from cron.job
where jobname like 'pg_index_pilot%';

-- Disable a schedule
select cron.unschedule('pg_index_pilot_daily');

-- Change schedule time
select cron.unschedule('pg_index_pilot_daily');
select cron.schedule_in_database(
  'pg_index_pilot_daily',
  '0 3 * * *',  -- new time: 3 AM
  'call index_pilot.periodic(real_run := true);',
  'index_pilot_control'
);
```
Alternatively, without pg_cron, schedule via system cron. Create a maintenance script (e.g. `/usr/local/bin/index_maintenance.sh`):
```bash
#!/bin/bash
# Runs reindexing only on the primary (all databases)
psql -d postgres -AtqXc "select not pg_is_in_recovery()" | grep -qx t || exit
psql -d postgres -qt -c "call index_pilot.periodic(true);"
```
Add to crontab:
```bash
# Runs reindexing daily at 2 AM (only on primary)
0 2 * * * /usr/local/bin/index_maintenance.sh
```
💡 Best practices:
- Schedule during low-traffic periods
- Avoid overlapping with backup or other IO-intensive operations
- Consider hourly runs for high-write workloads
- Monitor resource usage during initial runs (especially disk IO and CPU)
## Uninstalling pg_index_pilot

To completely remove pg_index_pilot from your database:
```bash
# Uninstall the tool (this will delete all collected statistics!)
psql -h your-instance.region.rds.amazonaws.com -U postgres -d your_database -f uninstall.sql

# Check for any leftover invalid indexes from failed reindexes
psql -h your-instance.region.rds.amazonaws.com -U postgres -d your_database -c "
  select format('drop index concurrently if exists %I.%I;', n.nspname, i.relname)
  from pg_index idx
  join pg_class i on i.oid = idx.indexrelid
  join pg_namespace n on n.oid = i.relnamespace
  where i.relname ~ '_ccnew[0-9]*$'
    and not idx.indisvalid;"

# Run any drop index commands from the previous query manually
```
Note: The uninstall script will:

- Remove the `index_pilot` schema and all its objects
- Remove the FDW server configuration
- List any invalid `_ccnew*` indexes that need manual cleanup
- Preserve the `postgres_fdw` extension (it may be used by other tools)
## Updating pg_index_pilot

To update to the latest version:
```bash
cd pg_index_pilot
git pull

# Reload the updated functions (or reinstall completely)
psql -1 -d your_database -f index_pilot_functions.sql
psql -1 -d your_database -f index_pilot_fdw.sql
```
## Monitoring and Analysis

```sql
-- Show recent reindexing operations with status
select
  schemaname,
  relname,
  indexrelname,
  pg_size_pretty(indexsize_before::bigint) as size_before,
  pg_size_pretty(indexsize_after::bigint) as size_after,
  reindex_duration,
  status,
  case when error_message is not null then left(error_message, 50) else null end as error,
  entry_timestamp
from index_pilot.reindex_history
order by entry_timestamp desc
limit 20;

-- Show only failed reindexes for debugging
select
  schemaname,
  relname,
  indexrelname,
  pg_size_pretty(indexsize_before::bigint) as size_before,
  reindex_duration,
  error_message,
  entry_timestamp
from index_pilot.reindex_history
where status = 'failed'
order by entry_timestamp desc;
```
💡 Tip: Use the convenient `index_pilot.history` view for formatted output:
```sql
-- View recent operations with formatted sizes and status
select * from index_pilot.history limit 20;

-- View only failed operations
select * from index_pilot.history where status = 'failed';
```
```sql
-- Check bloat estimates for the current database
select
  indexrelname,
  pg_size_pretty(indexsize::bigint) as current_size,
  round(estimated_bloat::numeric, 1) || 'x' as bloat_now
from index_pilot.get_index_bloat_estimates(current_database())
order by estimated_bloat desc nulls last
limit 40;
```
```sql
-- Initialize baseline without reindex (sets best_ratio for large indexes)
select index_pilot.do_force_populate_index_stats('<TARGET_DB>', null, null, null);

-- List what periodic(true) would take under current thresholds
select
  schemaname,
  relname,
  indexrelname,
  pg_size_pretty(indexsize) as size,
  round(estimated_bloat::numeric, 2) as bloat_x
from index_pilot.get_index_bloat_estimates('<TARGET_DB>')
where indexsize >= pg_size_bytes(index_pilot.get_setting(datname, schemaname, relname, indexrelname, 'index_size_threshold'))
  and coalesce(index_pilot.get_setting(datname, schemaname, relname, indexrelname, 'skip')::boolean, false) = false
  and (
    estimated_bloat is null
    or estimated_bloat >= index_pilot.get_setting(datname, schemaname, relname, indexrelname, 'index_rebuild_scale_factor')::float
  )
order by estimated_bloat desc nulls first
limit 50;

-- Exclude service schemas if desired
select index_pilot.set_or_replace_setting('<TARGET_DB>', 'pg_toast', null, null, 'skip', 'true', null);
select index_pilot.set_or_replace_setting('<TARGET_DB>', '_timescaledb_internal', null, null, 'skip', 'true', null);
```
Notes:
- The baseline sets `best_ratio` to the current size/tuples ratio; immediately after, `bloat_x` ≈ 1.0 and will grow as indexes bloat.
- Small indexes (< `minimum_reliable_index_size`, default 128kB) skip `best_ratio` to avoid noise; candidates are still gated by `index_size_threshold` (default 10MB).
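For example, to inspect the effective threshold for a single index, `index_pilot.get_setting` (used in the query above) can be called directly; the database, schema, table, and index names below are placeholders:

```sql
-- Effective rebuild threshold for one index (placeholder names)
select index_pilot.get_setting(
  '<TARGET_DB>', 'public', 'my_table', 'my_table_pkey',
  'index_rebuild_scale_factor'
);
```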