How can indices in PostgreSQL be so space efficient?

Question

I have a PostgreSQL 15 deployment that contains a partitioned table on the order of tens of millions of records.

I've been playing around with index creation and I'm surprised by how little space a btree index is using.

The partition dummy_name_partition_01 has about 13 million records. I'm not sure if it's relevant, but the records are fairly large, averaging 2.66 KiB each (the partition is ~30 GiB excluding indices).

One of the columns, record_type, is the one I'm experimenting with indices on. It stores a short string (< 50 characters). Although it is a TEXT column and not an ENUM, its value is always one of about 300 possible strings.

I initially created a BRIN index on record_type to save disk space. The index takes only about 1 MiB on disk. Tiny indeed.
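This is a rough sanity check on why a BRIN index on this table is so small. BRIN's default minmax opclass keeps one small summary (a minimum and a maximum) per block range of pages_per_range heap pages, and the default range is 128 pages. The sketch counts the ranges under one assumption: all ~30 GiB is main-heap pages. That makes it an upper bound, because part of the size is probably TOAST, which BRIN does not cover. SUMMARY_BYTES is a hypothetical per-summary size, not a measured value.

```python
import math

PAGE_BYTES = 8192        # default PostgreSQL block size
PAGES_PER_RANGE = 128    # BRIN default pages_per_range
SUMMARY_BYTES = 40       # hypothetical: tuple header + short min/max strings + line pointer


def brin_ranges(heap_bytes: int, pages_per_range: int = PAGES_PER_RANGE) -> int:
    """Number of block ranges (one summary tuple each) covering the heap."""
    heap_pages = math.ceil(heap_bytes / PAGE_BYTES)
    return math.ceil(heap_pages / pages_per_range)


heap_bytes = 30 * 1024**3              # assumption: ~30 GiB, all of it main-heap pages
ranges = brin_ranges(heap_bytes)
approx_mib = ranges * SUMMARY_BYTES / 1024**2
print(f"{ranges} ranges, roughly {approx_mib:.1f} MiB of summaries")
```

About 30,000 summaries of a few tens of bytes each come to the low-megabyte range. That is the same order of magnitude as the ~1 MiB observed.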

However, I can't get Postgres to actually use that BRIN index: it insists on doing sequential scans, so the BRIN index is effectively useless. I was afraid a B-tree index would be too large, but when I dropped the BRIN index and recreated it as a B-tree, it came out at just 92 MiB. I was expecting at least 1 GiB!
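One plausible reason for the sequential scans assumes the rows are not physically ordered by record_type. BRIN's default minmax opclass stores only the minimum and maximum value of each block range. If every range holds a random mix of the ~300 values, almost every range's [min, max] spans whatever value is searched for. The index then cannot rule any range out, and the planner would likely prefer a sequential scan.

The toy simulation below compares an unordered heap with one sorted by record_type. It makes two hypothetical choices:
- ROWS_PER_PAGE = 3, derived from ~2.66 KiB rows in 8 KiB pages and ignoring TOAST.
- The type_NNN value names stand in for the real strings.

```python
import random

PAGES_PER_RANGE = 128     # BRIN default pages_per_range
ROWS_PER_PAGE = 3         # assumption: ~2.66 KiB rows -> about 3 per 8 KiB page
DISTINCT_VALUES = 300     # ~300 possible record_type strings


def summarize(values, rows_per_range):
    """Return (min, max) per block range, like BRIN's minmax opclass."""
    return [(min(chunk), max(chunk))
            for chunk in (values[i:i + rows_per_range]
                          for i in range(0, len(values), rows_per_range))]


def fraction_scanned(summaries, target):
    """Fraction of ranges whose [min, max] cannot exclude target (must be read)."""
    hits = sum(1 for lo, hi in summaries if lo <= target <= hi)
    return hits / len(summaries)


rng = random.Random(42)
rows_per_range = PAGES_PER_RANGE * ROWS_PER_PAGE
n_rows = rows_per_range * 1000          # a scaled-down table of 1000 ranges
# Hypothetical values; zero-padding keeps string order equal to numeric order.
values = [f"type_{rng.randrange(DISTINCT_VALUES):03d}" for _ in range(n_rows)]

target = "type_150"
unordered = fraction_scanned(summarize(values, rows_per_range), target)
clustered = fraction_scanned(summarize(sorted(values), rows_per_range), target)
print(f"unordered heap: {unordered:.1%} of ranges must be read")
print(f"sorted heap:    {clustered:.1%} of ranges must be read")
```

On the unordered heap essentially every range has to be read. On the sorted heap only a handful do. This is why BRIN pays off mainly when the column correlates with the physical row order.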

To measure the index size, I query the information_schema.tables view and call pg_table_size and pg_indexes_size. Specifically, I ran pg_indexes_size before creating the index and again afterwards, and took the difference as the index size. I repeated this a few times to get numbers for both BRIN and B-tree.

The index definition is as simple as CREATE INDEX foo_bar ON dummy_name_partition_01 (record_type) for the B-tree, and CREATE INDEX foo_bar ON dummy_name_partition_01 USING BRIN (record_type) for the BRIN index.

Now I wonder: does Postgres somehow store a pointer to the data in the record_type column instead of duplicating the string in every index entry? Is that why the index is just under 100 MiB rather than a few gigabytes? Or is something else going on here?

Accepted answer by mustaccio · 2024-11-21 20:59:45Z

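B-tree index key deduplication was introduced in PostgreSQL 13 and is enabled by default (controlled by the deduplicate_items storage parameter). On each leaf page it merges groups of index tuples that share the same key value into a single posting-list tuple. The key value is stored once, followed by a sorted list of heap TIDs. The page does not need to contain only one key value; every group of duplicates on it gets merged. Here there are only ~300 distinct values spread over ~13 million rows, roughly 43,000 rows per value. Nearly every index entry is therefore a duplicate, so it's not surprising that deduplication is so effective with a key of such low cardinality.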
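The toy Python sketch below makes the answer concrete. It is an illustration under stated assumptions, not PostgreSQL's actual code:
- dedup() groups sorted (key, TID) pairs into posting lists.
- leaf_bytes() gives a back-of-the-envelope leaf-level size with and without deduplication.

The constants approximate PostgreSQL's 8 KiB page layout. Two of them are assumptions, not values from the question:
- MAX_POSTING_BYTES, a rough cap of ~800 bytes per posting-list tuple.
- AVG_KEY_CHARS = 20.

The estimate ignores internal pages and per-value rounding.

```python
import math
from itertools import groupby

PAGE_BYTES = 8192
PAGE_OVERHEAD = 24 + 16     # page header + B-tree special space (approx.)
LINE_POINTER = 4            # ItemIdData per index tuple
TUPLE_HEADER = 8            # IndexTupleData: 6-byte TID + 2-byte t_info
TID_BYTES = 6               # one heap TID (ItemPointerData)
MAX_POSTING_BYTES = 800     # assumption: rough cap on one posting-list tuple
FILLFACTOR = 0.90           # default B-tree leaf fillfactor
AVG_KEY_CHARS = 20          # hypothetical average record_type length


def maxalign(n: int) -> int:
    """Round up to 8-byte alignment (typical 64-bit MAXALIGN)."""
    return (n + 7) // 8 * 8


def dedup(entries):
    """Group sorted (key, tid) pairs into (key, [tids]) posting lists.

    Toy version: it ignores the per-tuple size cap, so one key -> one list.
    """
    return [(k, [tid for _, tid in grp])
            for k, grp in groupby(entries, key=lambda e: e[0])]


def leaf_bytes(rows: int, key_chars: int, deduplicated: bool) -> int:
    """Rough size of the B-tree leaf level in bytes."""
    # Tuple header + 1-byte short varlena header + the text bytes, aligned.
    base = maxalign(TUPLE_HEADER + 1 + key_chars)
    if deduplicated:
        # Pack as many 6-byte TIDs as fit under the assumed posting-list cap.
        tids_per_tuple = (MAX_POSTING_BYTES - base) // TID_BYTES
        tuples = math.ceil(rows / tids_per_tuple)
        tuple_bytes = maxalign(base + tids_per_tuple * TID_BYTES)
    else:
        # One full key + TID per heap row.
        tuples = rows
        tuple_bytes = base
    usable = (PAGE_BYTES - PAGE_OVERHEAD) * FILLFACTOR
    per_page = int(usable // (tuple_bytes + LINE_POINTER))
    return math.ceil(tuples / per_page) * PAGE_BYTES


print(dedup([("a", (0, 1)), ("a", (0, 2)), ("b", (1, 1))]))
rows = 13_000_000
for dd in (False, True):
    mib = leaf_bytes(rows, AVG_KEY_CHARS, dd) / 1024**2
    print(f"deduplicated={dd}: ~{mib:.0f} MiB")
```

With these assumptions the non-deduplicated estimate is about 500 MiB and the deduplicated one about 88 MiB. The latter is in the same ballpark as the observed 92 MiB, because each extra row costs little more than its 6-byte TID. Treat the figures as orders of magnitude, not predictions.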
