I have a PostgreSQL 15 deployment that contains a partitioned table with tens of millions of records.
I've been playing around with index creation and I'm surprised by how little space a btree index is using.
The partition `dummy_name_partition_01` has about 13 million records in it. Not sure if it's relevant, but the records can get a little large, averaging 2.66 KiB per record (the partition is ~30 GiB, not counting indexes).
One of the columns, `record_type`, which is the one I'm experimenting with, stores a short (< 50 chars) string. Although it is of type TEXT and not an ENUM, its value is always one of ~300 possible strings.
I initially created a BRIN index on the `record_type` column to save disk space. The index takes only about 1 MiB on disk. Tiny indeed.
Now, I'm having trouble getting Postgres to actually use that BRIN index. It insists on doing sequential scans, so the BRIN index is effectively useless. I was afraid a B-tree index would be too large, but when I dropped the BRIN index and recreated it as a B-tree, its size was just 92 MiB. I was expecting at least 1 GiB!
To measure the index size, I'm querying the `information_schema.tables` view and using the functions `pg_table_size` and `pg_indexes_size`. Namely, I queried `pg_indexes_size` when there was no index, ran it again after creating the index, and took the difference as the index size. Of course I did this a few times so I could get the numbers for both BRIN and B-tree.
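For reference, the size of a single index can also be read directly instead of by difference. This is a minimal sketch, assuming the index is called `foo_bar` and the partition `dummy_name_partition_01`, as elsewhere in this question; `pg_relation_size` and `pg_size_pretty` are built-in functions.

```sql
-- Size of one index, read directly by name.
SELECT pg_size_pretty(pg_relation_size('foo_bar')) AS index_size;

-- Every index on the partition, with its access method (btree, brin, ...) and size.
SELECT c.relname  AS index_name,
       am.amname  AS access_method,
       pg_size_pretty(pg_relation_size(c.oid)) AS index_size
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
JOIN pg_am am   ON am.oid = c.relam
WHERE i.indrelid = 'dummy_name_partition_01'::regclass;
```

The second query avoids the create, measure, drop cycle when several indexes exist on the table at the same time.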
The index is as simple as `CREATE INDEX foo_bar ON dummy_name_partition_01 (record_type)` for the B-tree, and the same statement with `USING BRIN` for the BRIN index.
Now, I wonder: does Postgres somehow store a pointer to the data in the `record_type` column instead of storing duplicate strings all over the index, and is that why the index is just under a hundred MiB rather than a few gigabytes? Or what else is going on here?
1 Answer
B-tree index key deduplication was implemented in PostgreSQL 13. It is enabled by default and collapses index tuples on a leaf page that share the same key value into a single posting list tuple: one copy of the key followed by a list of TIDs. It's not surprising that it is very effective with a low-cardinality key such as yours.
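If you want to see the effect for yourself, you can build the same index with and without deduplication and compare sizes. The following is only a sketch on a scratch table: the names `dedup_demo`, `dedup_demo_on` and `dedup_demo_off` are made up for illustration, and the generated data only mimics a column with 300 distinct short strings. `deduplicate_items` is the B-tree storage parameter that controls the feature.

```sql
-- Scratch table: 1,000,000 rows with 300 distinct short text values (hypothetical data).
CREATE TEMP TABLE dedup_demo AS
SELECT 'record_type_' || (g % 300) AS record_type
FROM generate_series(1, 1000000) AS g;

-- Default B-tree index: deduplication is on unless disabled explicitly.
CREATE INDEX dedup_demo_on ON dedup_demo (record_type);

-- Same index with deduplication switched off, for comparison.
CREATE INDEX dedup_demo_off ON dedup_demo (record_type)
    WITH (deduplicate_items = off);

-- Compare the on-disk size of the two indexes.
SELECT pg_size_pretty(pg_relation_size('dedup_demo_on'))  AS with_dedup,
       pg_size_pretty(pg_relation_size('dedup_demo_off')) AS without_dedup;
```

The index built with deduplication should come out considerably smaller, since each distinct key is stored roughly once per leaf page instead of once per row. The exact ratio depends on key length and data distribution, so treat the numbers from this toy table as indicative only.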