We have a very large table, currently more than 2.2 billion rows, on Postgres 12.5. The total size of the table (including the index) stands at 500 GB. There is one query we need to run to find a set of valid rows in the data set and then update them. The query looks something like this:
select id, col4 from table where col1=1ドル and col2=2ドル and col3='f' and col4>0 order by col5 limit 10
To serve this query, there is an index on the table ON (col1, col2, col5), and the query uses this index. So far so good. The problem arises when the database needs to do a lot of disk seeks because of buffer cache misses; this leads to the queries waiting on DataFileRead.
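For reference, the existing index was created roughly like this (big_table is a placeholder for our table name):
create index on big_table (col1, col2, col5);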
Until now we were using a 16 vCPU, 128 GB machine with io1 storage and 20,000 provisioned IOPS (it's hosted on AWS RDS). We started with about 3,000 provisioned IOPS and kept increasing it, hoping that with aggressive autovacuum and data localization it would stabilize at some value. Autovacuum is configured so that it runs on this table every couple of days. We recently faced an issue where read IOPS started hitting 20,000 and the application got too slow. We upgraded to a machine of exactly double the size, since we could no longer provision more than 20,000 IOPS on the prior machine.
On the larger machine we're observing that read IOPS has fallen to ~5,000, the machine now consumes around 6,000 IOPS overall at peak times, and the query time has precisely halved. We're assuming this is due to the larger shared_buffers now available for Postgres to keep the hot, frequently referenced rows in cache.
The problem is that the machine we're now using runs at ~5% CPU load and has 184 GB of RAM still unused; all in all, this machine is heavily underutilized. We want to go back to the smaller machine by changing parameters so that this query can run within some tolerable latency limit. We tried several rounds of memory tuning on the previous machine to fully utilize the RAM, but increasing shared_buffers to more than 40% of RAM always made queries extremely slow, and we always had to revert to the previous value.
Sharing a few Postgres db parameters (currently on the bigger machine):
effective_cache_size: 130GB
shared_buffers: 66GB
work_mem: 4MB
maintenance_work_mem: 8.5GB
P.S.: The data grows by about 30 million entries per day, so this is only going to get worse. The database went live in production exactly 3 months ago. We also want suggestions on building a sustainable solution. Due to the nature of the application, we can't partition the table unless it's 6 months old or more. Sharding would be our last resort, but we want to exhaust all other options before moving to that solution.
Edit: Attaching the query plans (the first when there's no data in the buffer, the second the immediately subsequent hit of the same query). The performance looks more than acceptable since we're on the larger machine, but this was taking more than 1 second on the smaller one.
No, we don't need to update a billion rows at a time. We only need to update a few. It's very difficult to tell the number of rows affected during an update, but it will not be more than 20 for a particular transaction. I can give an idea of what we're trying to achieve here. It's like a bag with a capacity C which we're trying to fill with col4 values, taking only entries which are valid (col3 is 'f' and col4 > 0). We look at 10 entries at a time from the database, and if need be there might be a subsequent identical query to fetch the next valid entries. In this process, only col4 is updated: it is either set to zero, since it has been consumed, or to a number lower than its current value.
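Roughly, one iteration of that flow looks like the sketch below (big_table and the bind parameters are placeholders, not our exact code):
select id, col4 from big_table
where col1 = 1ドル and col2 = 2ドル and col3 = 'f' and col4 > 0
order by col5
limit 10;
-- for each entry consumed, col4 is either zeroed out or lowered:
update big_table set col4 = 2ドル where id = 1ドル;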
Looking for any thoughts or suggestions. Thanks in advance.
3 Answers
You are currently removing 855 rows via the filter on col3 and col4 in order to find 10 rows which pass that filter. So, as I feared, the things that fail that filter might be rarer than other things, but they are sitting right in the way. And the next time you need 10 more things, they will still be in the way. And the next time. Not only are you doing ~85 times more work than you need to for every execution, you are hitting ~85 times more pages. If that same thing happens for every other combination of col1 and col2, then it's no wonder you keep running out of cache space and IOPS. And of course there is no reason for it to stop there: you could have far more than 850 rows accumulate in the way if you have nothing that gets rid of them.
You could use a partial index to avoid visiting those rows each time:
create index on t (col1, col2, col5) where col3='f' and col4>0;
Alternatively, each time col3 turns true or col4 turns 0, you could just delete the row, and (possibly) insert it into some history table if you need to keep some record of it.
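A sketch of that alternative, assuming a separate history table (t_history and the way consumed rows are identified are placeholders; adapt to your schema):
with consumed as (
    delete from t
    where id = any(1ドル)  -- ids of the rows that were just used up
    returning *
)
insert into t_history
select * from consumed;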
Your WHERE clause has these elements:
- col1 = constant (equality)
- col2 = constant (equality)
- col3 = constant (equality)
- col4 > constant (range)
To satisfy this query from your existing columns as efficiently as possible, use a composite (multicolumn) B-tree index on all those columns. You can put cols 1-3 in any order you wish in the index, but col4, the one filtered by range, must come after the columns filtered by equality.
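For example (the index name is a placeholder):
create index t_lookup_idx on tbl (col1, col2, col3, col4);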
Your overall query looks like this:
SELECT id, col4 FROM tbl WHERE ... ORDER BY col5 LIMIT 10
This query can be satisfied with a covering index such as this.
CREATE INDEX covering
ON tbl (col1, col2, col3, col4) INCLUDE (col5, id)
As you can see, every column needed to satisfy your query can come from an index scan. This will save a lot of IOPS on your server, at the cost of a larger index and slower INSERT / UPDATE operations. B-tree indexes are kept in sorted order: PostgreSQL random-accesses the index to find the first eligible row, then reads it sequentially until the first ineligible row. That's as fast as it gets.
Your query contains the notorious performance antipattern ORDER BY col LIMIT constant. If the typical number of rows needing ordering in each query is small, this doesn't matter much.
If the number of rows needing ordering is large, you may require more index trickery. In that case get your covering index to work and ask another question.
Be careful: covering indexes must be carefully crafted to match their queries. If you change the query the index may need changing too.
Read this helpful book by Markus Winand: https://use-the-index-luke.com/
- col3 is a boolean column and entries are heavily skewed towards 'false'. More than 70% of the entries have values >0 in col4. We've already tried an index with all the columns in it, but Postgres doesn't seem to use that index, so in order not to compromise UPDATEs/INSERTs we settled on the index mentioned in the question. Our test database contains as much as 35 million entries (on a much smaller machine) and a covering index didn't help there either. – Rahul Sharma, Sep 2, 2021 at 12:16
- And, maybe switch to an EC2 i3 instance type like i3.4xlarge with built-in NVMe storage. – O. Jones, Sep 2, 2021 at 12:17
- @RahulSharma less than 30% of 2.2 billion might still be a very large number, and can certainly ruin your day. Especially if they are concentrated in the wrong spot (like towards the low values of col5). – jjanes, Sep 2, 2021 at 18:36
- There is nothing wrong with ORDER BY col LIMIT 10. – Laurenz Albe, Sep 2, 2021 at 19:13
- @LaurenzAlbe I don't think there is anything inherently wrong with it, but it is surely one of the constructs most likely to run into catastrophic plan choices or execution problems. – jjanes, Sep 2, 2021 at 20:58
I believe you need to increase your shared_buffers. The caveat about allocating more than 40% of RAM is described here: https://www.postgresql.org/docs/current/runtime-config-resource.html
"If you have a dedicated database server with 1GB or more of RAM, a reasonable starting value for shared_buffers is 25% of the memory in your system. There are some workloads where even large settings for shared_buffers are effective, but because PostgreSQL also relies on the operating system cache, it is unlikely that an allocation of more than 40% of RAM to shared_buffers will work better than a smaller amount. Larger settings for shared_buffers usually require a corresponding increase in checkpoint_segments, in order to spread out the process of writing large quantities of new or changed data over a longer period of time."
Here is a pretty good reference for tuning checkpoints (since PostgreSQL 9.5 the relevant setting is max_wal_size rather than checkpoint_segments): https://www.2ndquadrant.com/en/blog/basics-of-tuning-checkpoints/
Settings with default values:
checkpoint_timeout = 5min
max_wal_size = 1GB (before PostgreSQL 9.5 this was checkpoint_segments)
The suggestion is to adjust these values. Ensure max_wal_size is set high enough that it is basically never reached.
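For illustration only (the values below are placeholders, not tuned recommendations for this workload; on RDS they are set through the DB parameter group rather than postgresql.conf):
checkpoint_timeout = 30min
max_wal_size = 16GB
checkpoint_completion_target = 0.9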
In addition, I suggest putting your logs on disks with write I/O fast enough to keep up.
- That will only lead to inefficient use of RAM, not to a slow execution... – Laurenz Albe, Sep 2, 2021 at 19:12
- Which part? The IO on the disks? I am not really a DBAdmin, more of a storage admin. IMO, it depends on the workload. GL – Technoob1984, Sep 2, 2021 at 21:57
- Please share EXPLAIN (ANALYZE, BUFFERS) for this query when it is slow.