My pet project has one table which is totally rewritten once a day. This project uses a PostgreSQL database. I decided to use COPY to insert data from 4 CSV files (about 3 million rows).
At the same time, there may be some queries to that table from clients. This table has a composite index. I use COPY in a transaction.
- Should the index be deleted before COPY and then recreated?
- Or index recreated automatically after new data inserted?
- Maybe there is a better solution than the one I have chosen?
Queries need the index.
I don't understand whether the index will be rebuilt automatically or whether it will need to be rebuilt manually. If manually, then maybe I should delete it first and create it again after the copy. But I see that in my case I can't delete the index because there are concurrent queries.
1 Answer
Crucial bits from your question (bold emphasis mine):
one table which is totally rewritten once a day.
... about 3 million rows
... queries need index
Your questions:
1. Should the index be deleted before COPY and then recreated?
YES. Definitely.
"At the same time" in your question is used in the sense of "on the other hand", not in the sense of "concurrently with the bulk load" - which would be a nonsensical access pattern to be avoided to begin with.
The manual:
If you are loading a freshly created table, the fastest method is to create the table, bulk load the table's data using COPY, then create any indexes needed for the table. Creating an index on pre-existing data is quicker than updating it incrementally as each row is loaded.
Note the bit about "freshly created table". If wal_level is minimal, it pays to create and fill the table in a single transaction to also avoid writing WAL for the big load. The manual again:
COPY is fastest when used within the same transaction as an earlier CREATE TABLE or TRUNCATE command. In such cases no WAL needs to be written, because in case of an error, the files containing the newly loaded data will be removed anyway. However, this consideration only applies when wal_level is minimal as all commands must write WAL otherwise.
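To see whether this optimization can apply at all, one might inspect the setting and change it if needed. This is a minimal sketch, assuming superuser access and a server that uses neither streaming replication nor WAL archiving. On recent PostgreSQL versions, wal_level = minimal also requires max_wal_senders = 0, and both settings only take effect after a server restart.

```sql
-- Show the current WAL level (replica is the default since PostgreSQL 10).
SHOW wal_level;

-- Assumption: no streaming replication or WAL archiving is needed.
ALTER SYSTEM SET wal_level = 'minimal';
ALTER SYSTEM SET max_wal_senders = 0;  -- required when wal_level is minimal
-- Restart the server afterwards; neither setting can be changed by a reload.
```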
You mention "pet project". Chances are, you don't need streaming replication and wal_level can be minimal. So if you can afford to drop the table temporarily, this would be fastest and cheapest by a long shot:
BEGIN;
DROP TABLE tbl; -- also removes all indexes on this table
CREATE TABLE tbl (...); -- column definitions elided
COPY ...;
CREATE INDEX ...;
COMMIT;
VACUUM ANALYZE tbl; -- after COMMIT: VACUUM cannot run inside a transaction block
If you can't drop the table, the next best thing is TRUNCATE & DROP INDEX instead of DROP TABLE & CREATE TABLE. Otherwise the same as above. This still takes a blocking, exclusive lock, but dependent objects like views don't have to be recreated.
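The TRUNCATE variant could look roughly like this. It is a sketch only: the index name tbl_a_b_idx, the columns a and b, and the file path are hypothetical placeholders, not taken from the question.

```sql
BEGIN;
TRUNCATE tbl;                      -- exclusive lock on tbl is held until COMMIT
DROP INDEX IF EXISTS tbl_a_b_idx;  -- hypothetical name of the composite index

-- Hypothetical server-side path; repeat once per CSV file.
COPY tbl FROM '/path/to/part1.csv' WITH (FORMAT csv);

-- Hypothetical columns; build the index once, after all rows are loaded.
CREATE INDEX tbl_a_b_idx ON tbl (a, b);
COMMIT;

ANALYZE tbl;  -- refresh planner statistics for the new data
```

COPY with a file name reads the file on the database server and needs the corresponding privileges. If the CSV files live on the client machine, psql's \copy is the usual substitute.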
2. Or index recreated automatically after new data inserted?
Indexes are ...
... always kept up to date once created.
... never recreated automatically.
It makes sense to do the latter manually sometimes. Like in this case.
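Recreating an index manually on an existing table could be sketched as follows, again using the hypothetical index name tbl_a_b_idx. A plain REINDEX blocks writes to the table and any reads that would use this index while it runs. The CONCURRENTLY form, available since PostgreSQL 12, avoids most of that blocking at the cost of a slower rebuild.

```sql
-- Rebuild in place; blocks writes to tbl and reads that would use this index.
REINDEX INDEX tbl_a_b_idx;

-- Alternative (PostgreSQL 12 or later): rebuild without blocking normal queries.
-- Cannot be run inside a transaction block.
REINDEX INDEX CONCURRENTLY tbl_a_b_idx;
```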