
I have a PostgreSQL 15 deployment that contains a partitioned table with records on the order of tens of millions.

I've been playing around with index creation and I'm surprised by how little space a btree index is using.

The partition dummy_name_partition_01 has about 13 million records. Not sure if it's relevant, but the records are fairly large, averaging 2.66 KiB each (the partition is ~30 GiB excluding indexes).

One of the columns, record_type, is the one I'm experimenting with indexes on. It stores a short string (< 50 characters). Although it is of type TEXT rather than an ENUM, its value is always one of roughly 300 possible strings.

I initially created a BRIN index on record_type to save disk space. The index is only about 1 MiB on disk. Tiny indeed.

However, I can't get Postgres to actually use that BRIN index: it insists on doing sequential scans, so the BRIN index seems useless. I was afraid a B-tree index would be too large, but when I dropped the BRIN index and recreated it as a B-tree, its size was just 92 MiB. I was expecting at least 1 GiB!
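One plausible explanation for the sequential scans (an assumption, not something established in the post) is physical row order. A BRIN minmax index keeps only a (min, max) summary per block range (128 pages by default), so it can only skip ranges whose summary excludes the searched value, which in practice requires the column to correlate with where rows sit on disk. The toy Python model that follows is not PostgreSQL's implementation: it assumes hypothetical record_type values type_000 to type_299 and three rows per 8 KiB page, builds minmax summaries for shuffled and for sorted rows, and counts how many ranges a lookup would still have to read. It also counts the summaries a ~30 GiB heap would need, assuming all of it is main-heap pages.

```python
import random

PAGE_SIZE = 8192          # default PostgreSQL block size, in bytes
PAGES_PER_RANGE = 128     # default BRIN pages_per_range


def brin_range_count(table_bytes, pages_per_range=PAGES_PER_RANGE):
    """Number of block ranges (one summary each) covering a heap of this size."""
    pages = -(-table_bytes // PAGE_SIZE)            # ceiling division
    return -(-pages // pages_per_range)


def build_pages(rows, rows_per_page):
    """Toy heap: chop a list of column values into fixed-size pages."""
    return [rows[i:i + rows_per_page] for i in range(0, len(rows), rows_per_page)]


def minmax_summaries(pages, pages_per_range=PAGES_PER_RANGE):
    """Summarize each run of pages_per_range pages as a (min, max) pair."""
    summaries = []
    for start in range(0, len(pages), pages_per_range):
        values = [v for page in pages[start:start + pages_per_range] for v in page]
        summaries.append((min(values), max(values)))
    return summaries


def ranges_to_scan(summaries, key):
    """Ranges whose summary cannot rule out `key` and must therefore be read."""
    return sum(1 for lo, hi in summaries if lo <= key <= hi)


rng = random.Random(42)
types = [f"type_{i:03d}" for i in range(300)]       # hypothetical record_type values
rows = [rng.choice(types) for _ in range(38_400)]   # 12,800 pages at 3 rows per page

shuffled = minmax_summaries(build_pages(rows, 3))
clustered = minmax_summaries(build_pages(sorted(rows), 3))

print(brin_range_count(30 * 2**30))                 # 30720 summaries for ~30 GiB
print(ranges_to_scan(shuffled, "type_150"), "of", len(shuffled))
print(ranges_to_scan(clustered, "type_150"), "of", len(clustered))
```

With shuffled rows, every range's summary spans almost the whole value set, so a lookup for one value cannot exclude any of the 100 ranges; with sorted rows only one or two qualify. A BRIN index that cannot exclude ranges gives the planner little reason to prefer it over a sequential scan. The roughly 30,720 small summaries for a 30 GiB heap are consistent with an index of about 1 MiB.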

To measure the index size, I query information_schema.tables and use the functions pg_table_size and pg_indexes_size. Specifically, I ran pg_indexes_size before creating the index and again afterwards, and took the difference as the index size. I repeated this a few times to get numbers for both BRIN and B-tree.

The index definition is as simple as CREATE INDEX foo_bar ON dummy_name_partition_01 (record_type) for the B-tree, and the same with USING BRIN added for the BRIN index.

So I wonder: does Postgres somehow store a pointer to the record_type data instead of duplicating the string in every index entry, and is that why the index is close to 100 MiB rather than a few gigabytes? Or is something else going on?
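For comparison, a back-of-the-envelope estimate of the leaf level of a B-tree that stores one (key, TID) tuple per row, with no deduplication. This is a rough sketch under stated assumptions, not an exact model: a 64-bit build with 8-byte MAXALIGN, ASCII keys averaging either 20 or 50 bytes (the post only says < 50 characters), the default leaf fillfactor of 90, and no accounting for page headers, high keys, or internal pages.

```python
MIB = 1024 ** 2

INDEX_TUPLE_HEADER = 8   # IndexTupleData: 6-byte heap TID + 2-byte t_info
LINE_POINTER = 4         # one ItemIdData per tuple on the page
LEAF_FILLFACTOR = 0.90   # default B-tree leaf fillfactor used by CREATE INDEX


def maxalign(n, alignment=8):
    """Round up to the 8-byte MAXALIGN boundary typical of 64-bit builds."""
    return (n + alignment - 1) // alignment * alignment


def plain_btree_leaf_mib(rows, key_len):
    """Rough leaf-level size if every row had its own (key, TID) index tuple."""
    # assumption: a short text value is stored with a 1-byte varlena header
    tuple_bytes = maxalign(INDEX_TUPLE_HEADER + 1 + key_len)
    return rows * (tuple_bytes + LINE_POINTER) / LEAF_FILLFACTOR / MIB


for key_len in (20, 50):
    print(key_len, round(plain_btree_leaf_mib(13_000_000, key_len)))
```

Under these assumptions it prints about 496 MiB for 20-byte keys and 937 MiB for 50-byte keys, so an index that repeated the key in every entry would plausibly land somewhere between half a GiB and 1 GiB, well above the 92 MiB measured.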

asked Nov 21, 2024 at 19:43

1 Answer


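B-tree index deduplication was introduced in PostgreSQL 13 and is enabled by default (it is controlled by the deduplicate_items storage parameter). Rather than storing one index tuple per row, it merges tuples that share the same key value into a single "posting list" tuple that holds the key once, followed by a sorted list of heap TIDs. It is applied when the index is built and again whenever a leaf page would otherwise have to split. The key is still stored in the index, just far fewer times. With a low-cardinality key such as ~300 distinct values over 13 million rows, almost every entry ends up in a posting list, so it's not surprising that deduplication is so effective here.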
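A minimal Python sketch of the idea, assuming toy (block, offset) pairs as TIDs and hypothetical key values: `deduplicate` collapses entries sorted by key and TID into (key, [TIDs]) posting lists, and `dedup_leaf_mib` estimates the leaf size if each row costs only its 6-byte TID at the default leaf fillfactor of 90. It is a simplification: real posting-list tuples are size-capped, so a very frequent key spans several of them, and the per-tuple key and header overhead plus internal pages are ignored here.

```python
from itertools import groupby

TID_BYTES = 6            # ItemPointerData: 4-byte block number + 2-byte offset
LEAF_FILLFACTOR = 0.90   # default B-tree leaf fillfactor used by CREATE INDEX
MIB = 1024 ** 2


def deduplicate(entries):
    """Collapse (key, tid) entries, sorted by key then TID, into posting lists."""
    return [(key, [tid for _, tid in group])
            for key, group in groupby(entries, key=lambda e: e[0])]


def dedup_leaf_mib(rows):
    """Rough leaf size when nearly every row costs only its 6-byte TID."""
    # ignores posting-list size caps, per-tuple key/header overhead, internal pages
    return rows * TID_BYTES / LEAF_FILLFACTOR / MIB


entries = sorted([("invoice", (0, 1)), ("order", (0, 2)), ("invoice", (3, 7)),
                  ("invoice", (1, 4)), ("order", (2, 5))])
print(deduplicate(entries))
print(round(dedup_leaf_mib(13_000_000)))
```

The first line shows five entries collapsing into two posting lists, `[('invoice', [(0, 1), (1, 4), (3, 7)]), ('order', [(0, 2), (2, 5)])]`, and the second prints 83 (MiB), in the same ballpark as the 92 MiB measured once internal pages and overhead are added back. One way to check this on the real table would be to build a second index WITH (deduplicate_items = off) and compare the two with pg_relation_size.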

answered Nov 21, 2024 at 20:59
