I'm wondering how to optimize the RAM usage of my PostgreSQL database deployed in Kubernetes. In particular, I'm interested in configuring the shared_buffers and effective_cache_size values.
The recommendations given by various sources are typically along these lines:
If you have a system with 1GB or more of RAM, a reasonable starting value for shared_buffers is 1/4 of the memory in your system. If you have less RAM you'll have to account more carefully for how much RAM the OS is taking up; closer to 15% is more typical there. There are some workloads where even larger settings for shared_buffers are effective, but given the way PostgreSQL also relies on the operating system cache, it's unlikely you'll find using more than 40% of RAM to work better than a smaller amount. (PostgreSQL Website)
The value should be set to 15% to 25% of the machine’s total RAM (EDB website)
No more than 50% of the available RAM (EDB PostgreSQL tuning)
Since containers by design use much less background RAM, what would be a good recommendation for setting the shared_buffers value?
Right now the value is set to 25% (1GB of a 4GB total container limit) and I notice that it gets filled up quite quickly.
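For reference, this is roughly how the values are applied at the moment. The shared_buffers value is the real one; the effective_cache_size figure is only an illustrative guess ("container limit minus shared_buffers"), not something I have confirmed:

```sql
-- Sketch of the current configuration on the 4GB container.
-- shared_buffers = 1GB is the actual setting; effective_cache_size = 3GB
-- is only an illustrative assumption for this example.
ALTER SYSTEM SET shared_buffers = '1GB';
ALTER SYSTEM SET effective_cache_size = '3GB';

-- effective_cache_size can be picked up with a reload;
-- shared_buffers only takes effect after a server restart.
SELECT pg_reload_conf();
```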
- Impossible to answer because we have no idea how you're going to use the database. There is no one-size-fits-all solution. And you already found the 25% recommendation to get started. For the rest, you have to test and optimise your own environment. – Frank Heikens, Nov 25, 2021 at 18:44
- Thanks for the answer. Could you recommend any good learning resources on Postgres optimization in general? – Jacob, Nov 25, 2021 at 22:55
1 Answer
shared_buffers vs. effective_cache_size
Within PostgreSQL tuning these two parameters represent the following (a quick way to inspect the current values is shown after the list):
- shared_buffers "how much memory is dedicated to PostgreSQL to use for caching data"
- effective_cache_size "how much memory is available for disk caching by the operating system and within the database itself"
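On a running instance you can check what is currently in effect; the sketch below only reads the standard pg_settings view and uses SHOW, nothing specific to any particular setup:

```sql
-- Inspect both parameters and where their values come from.
SELECT name, setting, unit, source
FROM pg_settings
WHERE name IN ('shared_buffers', 'effective_cache_size');

-- SHOW prints the values in a human-readable form
-- (shared_buffers is otherwise reported in 8kB pages).
SHOW shared_buffers;
SHOW effective_cache_size;
```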
So repeated queries whose data is already cached will work "better" if there is a lot of shared_buffers, while reading tables from the disk underlying the database will work "better" if there is more effective_cache_size.
If the workload is so varied and non-repetitive that repeated disk reads are required, then a higher effective_cache_size is better. If there are lots of identical queries over a small data set, then more shared_buffers is better.
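One rough heuristic for telling which case you are in is the buffer hit ratio from the standard pg_stat_database view. A ratio that stays well below ~99% on a workload you expect to be cache-friendly hints that shared_buffers may be too small; treat this as a sketch of the idea rather than a hard rule:

```sql
-- Share of block requests served from shared_buffers rather than the OS cache or disk.
SELECT datname,
       round(100.0 * blks_hit / nullif(blks_hit + blks_read, 0), 2) AS hit_pct
FROM pg_stat_database
WHERE datname IS NOT NULL
ORDER BY hit_pct;
```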
It's worth remembering that on a normally working Linux system, the kernel will grab any "idle" memory for disk caching, on the basis that unexpected reads might happen and, if they do, this otherwise unused memory can speed them up.
How to work out what settings are best
It's worth mentioning that the most often used "method" is to simply not care and over-provision the memory. Hardware is cheap and the time to sort it out properly is expensive. That said...
There are two approaches:
- run the application in prod and monitor the memory pressure
- devise a test to find the limits of memory use
Monitoring memory in prod is usually done with a third-party tool such as cAdvisor or Prometheus.
Look at the graphs and compare them with your incoming requests. If there is clearly a lot of head room, then increase the parameters and see if this helps the system efficiency.
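Alongside the external graphs, you can also look from inside the database. As an addition beyond the tools named above, the pg_buffercache contrib extension (it may need to be installed in the container image) shows how much of shared_buffers is actually occupied:

```sql
-- Requires the contrib extension; it is not enabled by default.
CREATE EXTENSION IF NOT EXISTS pg_buffercache;

-- How many shared buffers currently hold a page, and how many of those are dirty.
SELECT count(*) FILTER (WHERE relfilenode IS NOT NULL) AS buffers_used,
       count(*) FILTER (WHERE isdirty)                 AS buffers_dirty,
       count(*)                                        AS buffers_total
FROM pg_buffercache;
```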
For testing, you essentially do the same thing, but using an artificial test workload.
To work out what memory sizes are best, there are automated tools like senpai. Basically, they run a test workload repeatedly under various memory settings until the workload stalls.