Skip to main content

3 tips to manage large Postgres databases

Try these handy solutions to common problems when dealing with huge databases.
2 readers like this.
Green graph of measurements
Image by:

Internet Archive Book Images. Modified by Opensource.com. CC BY-SA 4.0

The relational database PostgreSQL (also known as Postgres) has grown increasingly popular, and enterprises and public sectors use it across the globe. With this widespread adoption, databases have become larger than ever. At Crunchy Data, we regularly work with databases north of 20TB, and our existing databases continue to grow. My colleague David Christensen and I have gathered some tips about managing a database with huge tables.

Big tables

Production databases commonly consist of many tables with varying data, sizes, and schemas. It's common to end up with a single huge and unruly database table, far larger than any other table in your database. This table often stores activity logs or time-stamped events and is necessary for your application or users.

Really large tables can cause challenges for many reasons, but a common one is locks. Regular maintenance on a table often requires locks, but locks on your large table can take down your application or cause a traffic jam and many headaches. I have a few tips for doing basic maintenance, like adding columns or indexes, while avoiding long-running locks.

Adding indexes problem: Index creation locks the table for the duration of the creation process. If you have a massive table, this can take hours.

CREATE INDEX ON customers (last_name)

Solution: Use the CREATE INDEX CONCURRENTLY feature. This approach splits up index creation into two parts, one with a brief lock to create the index that starts tracking changes immediately but minimizes application blockage, followed by a full build-out of the index, after which queries can start using it.

CREATE INDEX CONCURRENTLY ON customers (last_name)

Adding columns

Adding a column is a common request during the life of a database, but with a huge table, it can be tricky, again, due to locking.

Problem: When you add a new column with a default that calls a function, Postgres needs to rewrite the table. For big tables, this can take several hours.

Solution: Split up the operation into multiple steps with the total effect of the basic statement, but retain control of the timing of locks.

Add the column:

ALTER TABLE all_my_exes ADD COLUMN location text

Add the default:

ALTER TABLE all_my_exes ALTER COLUMN location
SET DEFAULT texas()

Use UPDATE to add the default:

UPDATE all_my_exes SET location = DEFAULT

Adding constraints

Problem: You want to add a check constraint for data validation. But if you use the straightforward approach to adding a constraint, it will lock the table while it validates all of the existing data in the table. Also, if there's an error at any point in the validation, it will roll back.

ALTER TABLE favorite_bands 
ADD CONSTRAINT name_check
CHECK (name = 'Led Zeppelin')

Solution: Tell Postgres about the constraint but don't validate it. Validate in a second step. This will take a short lock in the first step, ensuring that all new/modified rows will fit the constraint, then validate in a separate pass to confirm all existing data passes the constraint.

Tell Postgres about the constraint but do not to enforce it:

ALTER TABLE favorite_bands
ADD CONSTRAINT name_check
CHECK (name = 'Led Zeppelin') NOT VALID

Then VALIDATE it after it's created:

ALTER TABLE favorite_bands VALIDATE CONSTRAINT name_check

​​Hungry for more?

David Christensen and I will be in Pasadena, CA, at SCaLE's Postgres Days, March 9-10. Lots of great folks from the Postgres community will be there too. Join us!

What to read next
Elizabeth Garrett Christensen headshot
Elizabeth writes about Postgres at Crunchy Data, organizes the Kansas City Postgres User Group, and volunteers for the US PostgreSQL Association. Elizabeth enjoys writing about Postgres for Newbies and teaching people about databases whenever she gets the chance.

Comments are closed.

These comments are closed.
Creative Commons LicenseThis work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.

About This Site

The opinions expressed on this website are those of each author, not of the author's employer or of Red Hat.

Opensource.com aspires to publish all content under a Creative Commons license but may not be able to do so in all cases. You are responsible for ensuring that you have the necessary permission to reuse any work on this site. Red Hat and the Red Hat logo are trademarks of Red Hat, Inc., registered in the United States and other countries.

A note on advertising: Opensource.com does not sell advertising on the site or in any of its newsletters.

AltStyle によって変換されたページ (->オリジナル) /