We have a very large table, currently more than 2.2 billion rows, on Postgres 12.5. The total size of the table (including the index) stands at 500 GB. There is one query we need to run to find a set of valid rows in the data set and then update them. The query looks something like this:
select id, col4 from table where col1=1ドル and col2=2ドル and col3='f' and col4>0 order by col5 limit 10
To serve this query, there is an index on the table ON (col1, col2, col5), and the query uses this index. So far so good. The problem arises when the database needs to do a lot of disk seeks because of buffer cache misses; this leads to the queries waiting on DataFileRead.
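For reference, the existing index was created roughly like this (big_table is a placeholder for our table name):
create index on big_table (col1, col2, col5);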
Until now we were using a 16 vCPU, 128 GB machine with io1 storage and 20,000 provisioned IOPS (it's hosted on AWS RDS). We started with about 3,000 provisioned IOPS and kept increasing it, hoping that with aggressive autovacuum and data localization it would stabilize at some value. Autovacuum is configured so that it runs on this table every couple of days. We recently faced an issue where read IOPS started hitting 20,000 and the application got too slow. We upgraded to a machine of exactly double the size, since we could no longer provision more than 20,000 IOPS on the prior machine.
On the larger machine we're observing that read IOPS has fallen to ~5,000, the machine now consumes around 6,000 IOPS overall at peak times, and the query time has precisely halved. We're assuming this is due to the larger shared_buffers now available for Postgres to keep the hot, frequently referenced rows in cache.
The problem is that the machine we're now using runs at ~5% CPU load and has 184 GB of RAM still unused; all in all, this machine is heavily underutilized. We want to go back to the smaller machine by changing parameters so that this query can run within some tolerable latency limit. We tried several rounds of memory tuning on the previous machine to fully utilize the RAM, but increasing shared_buffers to more than 40% of RAM always made queries extremely slow, and we always had to revert to the previous value.
Sharing a few Postgres db parameters (currently on the bigger machine):
effective_cache_size: 130GB
shared_buffers: 66GB
work_mem: 4MB
maintenance_work_mem: 8.5GB
P.S.: The data grows by about 30 million entries per day, so this is only going to get worse. The database went live in production exactly 3 months ago. We also want suggestions on building a sustainable solution. Due to the nature of the application, we can't partition the table unless it's 6 months old or more. Sharding would be our last resort, but we want to exhaust all other options before moving to that solution.
Edit: Attaching the query plans (the first when there's no data in the buffer, the second the immediately subsequent hit of the same query). The performance looks more than acceptable since we're on the larger machine, but this was taking more than 1 second on the smaller one.
No, we don't need to update a billion rows at a time. We only need to update a few. It's very difficult to tell the number of rows affected during an update, but it will not be more than 20 for a particular transaction. I can give an idea of what we're trying to achieve here. It's like a bag with a capacity C which we're trying to fill with col4 values, taking only entries which are valid (col3 is 'f' and col4 > 0). We look at 10 entries at a time from the database, and if need be there might be a subsequent identical query to fetch the next valid entries. In this process, only col4 is updated: it is either set to zero, since it has been consumed, or to a number lower than its current value.
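Roughly, one iteration of that flow looks like the sketch below (big_table and the bind parameters are placeholders, not our exact code):
select id, col4 from big_table
where col1 = 1ドル and col2 = 2ドル and col3 = 'f' and col4 > 0
order by col5
limit 10;
-- for each entry consumed, col4 is either zeroed out or lowered:
update big_table set col4 = 2ドル where id = 1ドル;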
Looking for any thoughts or suggestions. Thanks in advance.
3 Answers
You are currently removing 855 rows via the filter on col3 and col4 in order to find 10 rows which pass that filter. So, as I feared, the things that fail that filter might be rarer than other things, but they are sitting right in the way. And the next time you need 10 more things, they will still be in the way. And the next time. Not only are you doing ~85 times more work than you need to for every execution, you are hitting ~85 times more pages. If that same thing happens for every other combination of col1 and col2, then it's no wonder you keep running out of cache space and IOPS. And of course there is no reason for it to stop there: you could have far more than 850 rows accumulate in the way if you have nothing that gets rid of them.
You could use a partial index to avoid visiting those rows each time:
create index on t (col1, col2, col5) where col3='f' and col4>0;
Alternatively, each time col3 turns true or col4 turns 0, you could just delete the row, and (possibly) insert it into some history table if you need to keep some record of it.
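A sketch of that alternative, assuming a separate history table (t_history and the way consumed rows are identified are placeholders; adapt to your schema):
with consumed as (
    delete from t
    where id = any(1ドル)  -- ids of the rows that were just used up
    returning *
)
insert into t_history
select * from consumed;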
Your WHERE clause has these elements:
- col1 = constant (equality)
- col2 = constant (equality)
- col3 = constant (equality)
- col4 > constant (range)
To satisfy this query from your existing columns as efficiently as possible, use a composite (multicolumn) B-tree index on all those columns. You can put cols 1-3 in any order you wish in the index, but col4, the one filtered by range, must come after the columns filtered by equality.
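For example (the index name is a placeholder):
create index t_lookup_idx on tbl (col1, col2, col3, col4);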
Your overall query looks like this:
SELECT id, col4 FROM tbl WHERE ... ORDER BY col5 LIMIT 10
This query can be satisfied with a covering index such as this.
CREATE INDEX covering
ON tbl (col1, col2, col3, col4) INCLUDE (col5, id)
As you can see, every column needed to satisfy your query can come from an index scan. This will save a lot of IOPS on your server, at the cost of a larger index and slower INSERT / UPDATE operations. B-tree indexes are kept in sorted order: PostgreSQL random-accesses the index to find the first eligible row, then reads it sequentially until the first ineligible row. That's as fast as it gets.
Your query contains the notorious performance antipattern ORDER BY col LIMIT constant. If the typical number of rows needing ordering in each query is small, this doesn't matter much.
If the number of rows needing ordering is large, you may require more index trickery. In that case get your covering index to work and ask another question.
Be careful: covering indexes must be carefully crafted to match their queries. If you change the query the index may need changing too.
Read this helpful book by Markus Winand: https://use-the-index-luke.com/
- col3 is a boolean column and entries are heavily skewed towards 'false'. More than 70% of the entries have values >0 in col4. We've already tried an index with all the columns in it, but Postgres doesn't seem to use that index, so in order not to compromise UPDATEs/INSERTs we settled on the index mentioned in the question. Our test database contains as much as 35 million entries (on a much smaller machine) and a covering index didn't help there either. – Rahul Sharma, Sep 2, 2021 at 12:16
- And, maybe switch to an EC2 i3 instance type like i3.4xlarge with built-in NVMe storage. – O. Jones, Sep 2, 2021 at 12:17
- @RahulSharma less than 30% of 2.2 billion might still be a very large number, and can certainly ruin your day. Especially if they are concentrated in the wrong spot (like towards the low values of col5). – jjanes, Sep 2, 2021 at 18:36
- There is nothing wrong with ORDER BY col LIMIT 10. – Laurenz Albe, Sep 2, 2021 at 19:13
- @LaurenzAlbe I don't think there is anything inherently wrong with it, but it is surely one of the constructs most likely to run into catastrophic plan choices or execution problems. – jjanes, Sep 2, 2021 at 20:58
I believe you need to increase your shared_buffers. The caveat about allocating more than 40% of RAM is described here: https://www.postgresql.org/docs/current/runtime-config-resource.html
"If you have a dedicated database server with 1GB or more of RAM, a reasonable starting value for shared_buffers is 25% of the memory in your system. There are some workloads where even large settings for shared_buffers are effective, but because PostgreSQL also relies on the operating system cache, it is unlikely that an allocation of more than 40% of RAM to shared_buffers will work better than a smaller amount. Larger settings for shared_buffers usually require a corresponding increase in checkpoint_segments, in order to spread out the process of writing large quantities of new or changed data over a longer period of time."
Here is a pretty good reference for tuning checkpoints (since PostgreSQL 9.5 the relevant setting is max_wal_size rather than checkpoint_segments): https://www.2ndquadrant.com/en/blog/basics-of-tuning-checkpoints/
Settings with default values:
checkpoint_timeout = 5min
max_wal_size = 1GB (before PostgreSQL 9.5 this was checkpoint_segments)
The suggestion is to adjust these values. Ensure max_wal_size is set high enough that it is basically never reached.
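For illustration only (the values below are placeholders, not tuned recommendations for this workload; on RDS they are set through the DB parameter group rather than postgresql.conf):
checkpoint_timeout = 30min
max_wal_size = 16GB
checkpoint_completion_target = 0.9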
In addition, I suggest putting your logs on disks with write I/O fast enough to keep up.
- That will only lead to inefficient use of RAM, not to a slow execution... – Laurenz Albe, Sep 2, 2021 at 19:12
- Which part? The IO on the disks? I am not really a DBAdmin, more of a storage admin. IMO, it depends on the workload. GL – Technoob1984, Sep 2, 2021 at 21:57
- Please share EXPLAIN (ANALYZE, BUFFERS) for this query when it is slow.