
I am transitioning from SQL Server to Postgres, and one of the biggest things for me to digest is that Postgres has no "clustered key" that keeps the data physically sorted.

Can someone share their thoughts on how Postgres avoids the need for an internally sorted dataset, and how it works with large heap tables while still supplying exceptional performance?

MDCCL
asked Apr 2, 2019 at 20:24
  • SQL Server tables can be used as heaps, but the benefit of having a clustered index that automatically sorts the data far outweighs not having one. Even though Postgres does have CLUSTER, it doesn't maintain the order of the data, and that made me wonder how it performs under a significant workload. Commented Apr 2, 2019 at 20:57
  • Did you benchmark your database to see if not having a clustered index really slows things down? Commented Apr 3, 2019 at 6:00
  • I have mostly worked with multi-tenant data sets on SQL Server, and having each tenant's data stored in sorted order has allowed read-ahead to be more effective and to load more relevant/valid data into the buffer cache. Commented Apr 3, 2019 at 17:33

3 Answers


You can try the pg_repack extension to cluster online with less locking.
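
For example, a minimal sketch assuming pg_repack is installed on the host and in the database (the database, table, and column names here are hypothetical):

    -- One-time setup in the target database:
    CREATE EXTENSION pg_repack;

    -- Then, from the shell, rewrite the table ordered by a column,
    -- roughly an online equivalent of CLUSTER:
    --   pg_repack --dbname=mydb --table=orders --order-by=tenant_id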

answered Apr 3, 2019 at 19:45

PostgreSQL simply doesn't implement this feature. There is no trick to not implementing it; it is unimplemented in the straightforward, uncomplicated way of just not doing it. To use one bit of jargon, all B-tree indexes in PostgreSQL are "secondary indexes", not "primary indexes". Even the primary key's index is a "secondary index".

There are some cases where clustered keys (or index-organized tables, as another product calls them) are important, and in those cases PostgreSQL fails to "supply exceptional performance". You can argue about how common those cases are, of course, but they certainly do exist, and it is unfortunate that PostgreSQL doesn't offer a solution for them. There have been proposals to address this, but I don't think any of those efforts are currently active.

In some cases, you can ameliorate the problem by using the CLUSTER command, or by implementing partitioning, or by using covering indexes, but none of these is entirely satisfactory as an alternative to real clustering.
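
For illustration, a rough sketch of the first and last of those mitigations (the table, index, and column names are hypothetical, and INCLUDE requires PostgreSQL 11 or later):

    -- One-time physical reorder of the heap; takes an ACCESS EXCLUSIVE
    -- lock and is not maintained as new rows arrive:
    CLUSTER orders USING orders_tenant_id_idx;

    -- A covering index (PostgreSQL 11+) can answer queries via
    -- index-only scans without visiting the heap at all:
    CREATE INDEX orders_tenant_created_idx
        ON orders (tenant_id, created_at) INCLUDE (status);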

answered Apr 3, 2019 at 15:56
  • Hey @jjanes, thanks for your feedback. Using CLUSTER is definitely a possibility, but it requires an exclusive lock on the table, which is mostly a no-no. And yes, I have adopted declarative partitioning to keep the data sets more manageable. So from what you have mentioned, getting the correct indexes is probably more critical. Commented Apr 3, 2019 at 17:59

PostgreSQL doesn't do anything special to replace the "need" for a clustered index.

It just simply doesn't have that feature. (Some would say that isn't a great loss.)

You can manually perform a one-time cluster with CLUSTER or pg_repack.

There is also declarative partitioning (though it had a number of caveats before PostgreSQL 11). It isn't quite clustering, but it can be used to group rows into specified buckets.
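
A rough sketch of that grouping (PostgreSQL 10+ syntax; the table and column names are hypothetical):

    -- Rows are routed into per-range child tables, so each bucket is
    -- kept physically together even though rows within a partition
    -- remain unordered:
    CREATE TABLE measurements (
        logdate date NOT NULL,
        reading numeric
    ) PARTITION BY RANGE (logdate);

    CREATE TABLE measurements_2019 PARTITION OF measurements
        FOR VALUES FROM ('2019-01-01') TO ('2020-01-01');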

answered Sep 3, 2019 at 22:57
