I've got an older DB (Postgres 10.15) that hasn't yet been upgraded. One problematic table had a few large indexes on it, some of which were corrupt and needed reindexing. Since it's not on version 12+, I can't concurrently reindex the table, which means I need to do it non-concurrently, and that requires a table write lock. So I wanted to know how I could do some rough calculations on how long the reindex would take, so I can plan some maintenance. Most of my research ends up at "just use pg_stat_progress_create_index!" (which isn't available in 10), or at people simply saying to use CONCURRENTLY.
The table is ~200GB, and there are 7 indexes of ~14GB each (as per pg_relation_size). I can get a constant ~900MB/s read rate on the DB for this task. Is there a simple metric I can use to determine how much data will be required to be read to reindex fully?
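For reference, the sizes above come from queries along these lines (tablename is a placeholder):

-- heap size of the table itself
select pg_size_pretty(pg_relation_size('tablename'));
-- size of each index on the table
select indexrelname, pg_size_pretty(pg_relation_size(indexrelid))
from pg_stat_user_indexes
where relname = 'tablename';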
2 Answers
You could just create a new index with a different name:
create index concurrently index_new on ...
Then drop the corrupted index:
drop index concurrently index_old;
Then rename the new index to the old name:
alter index index_new rename to index_old;
The last step requires a lock, but only for a few milliseconds after the lock is acquired, so you do not need downtime due to a write lock.
The definition of the index can be obtained with pg_dump -s -t tablename --no-acl.
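Putting the steps together (a sketch; tablename, some_column and the index names are placeholders to be replaced with the real definition from pg_dump):

-- build the replacement first; cannot run inside a transaction block
create index concurrently index_new on tablename (some_column);
-- drop the corrupt index once the new one is marked valid
drop index concurrently index_old;
-- brief exclusive lock, held only for milliseconds
alter index index_new rename to index_old;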
This is exactly the same procedure that reindex concurrently performs under the hood, but reindex concurrently is a bit cheaper since it does not need a lock for the index rename phase.
The widely known pg_repack also has a feature to reindex a table via the option --only-indexes. That option is implemented as create + drop index concurrently.
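For example (a sketch, assuming the pg_repack extension is installed in the target database; tablename and dbname are placeholders):

pg_repack --table=tablename --only-indexes dbname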
Is there a simple metric I can use to determine how much data will be required to be read to reindex fully?
Well, any index creation without concurrently will read the entire table sequentially (concurrently will read the table twice). Everything else depends on the access method. B-tree will sort all live tuples; this is the most time-consuming part of create index, and for large indexes the work will be done in temporary files (remember to increase maintenance_work_mem). This part also depends on datatypes and values: text with low selectivity (e.g. some status field) will be noticeably slower to build than integer sequences.
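A minimal sketch of that session-level tuning; the value is illustrative and should be sized to the RAM you can spare:

-- larger values let more of the b-tree sort happen in memory
-- instead of spilling to temporary files
set maintenance_work_mem = '2GB';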
I have no way to estimate except one: measure the creation time of an index on some data sample:
-- copy a sample of recent rows into a scratch table
create table estimate_table as (
  select * from tablename
  where created_at > '2020-01-01'
);
-- check the size of the sample
\dt+ estimate_table
-- time the index build on the sample
\timing on
create index on estimate_table ...
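To scale the measured time up, compare the sample's size against the full table (a rough approach; note the sort phase is O(n log n), so expect the full build to run somewhat slower than a purely linear scale-up suggests):

select pg_size_pretty(pg_relation_size('estimate_table')) as sample_size,
       pg_size_pretty(pg_relation_size('tablename'))      as full_size;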
Reindex is just a special form of index creation. Hmm, and an important point: reindex table is no different from running several reindex index commands in terms of resource usage; reindex table is implemented by calling reindex_index for each individual index on the table. So a table with 5 indexes will be scanned 5 times.
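Applying that to the numbers in the question gives a rough lower bound on read volume alone: 7 indexes × one ~200 GB table scan each ≈ 1.4 TB of sequential reads. At a constant ~900 MB/s that is roughly 1,400,000 MB ÷ 900 MB/s ≈ 1,600 seconds, i.e. about 26 minutes of reading, before any sorting in temporary files and index writing (~7 × 14 GB ≈ 98 GB of new index data) is accounted for.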
Absolutely think this would get around the issue, but I'm still interested in the question I've asked, since I have some even older legacy machines which are <= version 8.1 (i.e. before CONCURRENTLY was added to CREATE INDEX), and I'd still like a way to get a guesstimate on how long a CREATE INDEX is going to take anyway. – Noxville, Dec 2, 2020 at 19:00
The only reliable estimate of how long it will take can come from restoring a physical backup to an identical machine and testing it there.
There are too many factors going into this to come up with a good estimate otherwise.
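If you do restore such a clone, you can time the worst-case operation there directly (a sketch; the index name is a placeholder):

\timing on
reindex index some_corrupt_index;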
CREATE a TABLE by SELECTing 1 in 10 of your records and do a test? We don't know your CPU, RAM and especially your disk config (HDD/SSD; with/without RAID; if with, then which RAID? 0? 1? 5? 0+1? 1+0?). What else will be going on while you reindex? It's impossible to say with the information you've given!

Reindex index will do 1 full table scan and 0 index scans, and will write all live tuples into the new index.