Commit 76f003f


Day 46

1 parent 4785bee, commit 76f003f

2 files changed (+100, -0): 0046_how_to_deal_with_bloat.md, README.md

`0046_how_to_deal_with_bloat.md`

Lines changed: 99 additions & 0 deletions

Originally from: [tweet](https://twitter.com/samokhvalov/status/1723333152847077428), [LinkedIn post]().

---

# How to deal with bloat

> I post a new PostgreSQL "howto" article every day. Join me in this
> journey: [subscribe](https://twitter.com/samokhvalov/), provide feedback, share!

## What is bloat?

Bloat is the free space inside pages that is left behind when `autovacuum` removes a large number of dead tuples.

When a row in a table is updated, Postgres doesn't overwrite the old data. Instead, it marks the old row version (tuple)
as "dead" and creates a new row version. Over time, as more rows are updated or deleted, the space taken up by these
dead tuples can accumulate. At some point, `autovacuum` (or a manual `VACUUM`) removes the dead tuples, leaving free
space inside pages that is available for reuse. But if large numbers of dead tuples accumulate, large volumes of free
space can be left behind; in the worst cases, it can occupy 99% of all table or index space, or even more.
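To make the mechanism concrete, here is a deliberately simplified toy model in Python, not a description of how Postgres actually stores data. It assumes every tuple has the same size, a page holds a hypothetical `PAGE_CAPACITY` of them, each new row version goes into the first free slot, and `VACUUM` only turns dead slots into free space. It ignores HOT updates, `fillfactor`, tuple headers, and `VACUUM`'s ability to truncate empty pages at the end of a table.

```python
from dataclasses import dataclass, field

# Hypothetical simplification: all tuples have the same size and a page holds
# PAGE_CAPACITY of them. Real heap pages are 8 KiB with variable-size tuples.
PAGE_CAPACITY = 100


@dataclass
class Page:
    live: int = 0
    dead: int = 0

    @property
    def free(self) -> int:
        return PAGE_CAPACITY - self.live - self.dead


@dataclass
class Table:
    pages: list[Page] = field(default_factory=list)

    def insert(self, n: int) -> None:
        """Store n new tuple versions: reuse free slots first, then append pages."""
        for page in self.pages:
            take = min(n, page.free)
            page.live += take
            n -= take
        while n > 0:
            take = min(n, PAGE_CAPACITY)
            self.pages.append(Page(live=take))
            n -= take

    def update_all(self) -> None:
        """UPDATE every row: old versions become dead, new versions are inserted."""
        updated = sum(p.live for p in self.pages)
        for page in self.pages:
            page.dead += page.live
            page.live = 0
        self.insert(updated)

    def vacuum(self) -> None:
        """Remove dead tuples: their slots become free space, pages stay allocated."""
        for page in self.pages:
            page.dead = 0

    def free_fraction(self) -> float:
        total = len(self.pages) * PAGE_CAPACITY
        return sum(p.free for p in self.pages) / total if total else 0.0


if __name__ == "__main__":
    t = Table()
    t.insert(1_000)     # 10 full pages
    for _ in range(3):  # three full-table UPDATEs without any vacuum in between
        t.update_all()
    t.vacuum()
    print(len(t.pages), t.free_fraction())  # 40 0.75
```

In this model, when vacuuming keeps up (one round of updates between vacuums), new row versions reuse the freed slots and the table stops growing; letting three rounds of dead tuples pile up before vacuuming leaves 75% of the space empty.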
Low levels of bloat (say, below 40%) should not be considered a problem, while high levels definitely should be, as
they lead to several negative consequences:

1. Higher disk usage
2. More I/O needed for read and write queries
3. Lower cache efficiency (both the buffer pool and the OS file cache)
4. As a result, worse query performance
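As a rough back-of-the-envelope sketch (a simplification, not a formula from Postgres): if a given share of an object is bloat, a full scan reads proportionally more bytes than a bloat-free copy would. The helper names are hypothetical, and the 40% default only mirrors the rule of thumb above.

```python
# Rule of thumb from the text; a tunable assumption, not a hard limit.
CONCERNING_BLOAT_PCT = 40.0


def bloat_pct(total_bytes: int, bloat_bytes: int) -> float:
    """Share of an object's size taken by bloat, in percent."""
    if total_bytes <= 0:
        return 0.0
    return 100.0 * bloat_bytes / total_bytes


def is_concerning(pct: float, threshold: float = CONCERNING_BLOAT_PCT) -> bool:
    return pct >= threshold


def read_amplification(pct: float) -> float:
    """How many times more bytes a full scan reads compared to a bloat-free copy."""
    if not 0.0 <= pct < 100.0:
        raise ValueError("bloat percentage must be in [0, 100)")
    return 100.0 / (100.0 - pct)
```

For example, `read_amplification(40.0)` is about 1.67, while `read_amplification(99.0)` is 100.0: at 99% bloat, a 100 GiB table holds roughly 1 GiB of useful data.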
## How to check bloat

Index and table bloat should be checked regularly. Note that the most commonly used queries are estimation-based
and prone to false positives: depending on the table structure, they can report non-existent bloat (I have seen cases
with up to 40% of phantom bloat in freshly created tables). But such queries are fast and don't require any additional
extensions to be installed. Examples:

- [Estimated table bloat](https://github.com/NikolayS/postgres_dba/blob/master/sql/b1_table_estimation.sql)
- [Estimated btree index bloat](https://github.com/NikolayS/postgres_dba/blob/master/sql/b2_btree_estimation.sql)

Recommendations for use in monitoring systems:

- There is little sense in running them often (e.g., every minute), as bloat levels don't change rapidly.
- Users should be warned that the results are estimates.
- For large databases, query execution may take a long time, up to many seconds, so the frequency of checks and
  `statement_timeout` might need to be adjusted.
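A minimal sketch of how a monitoring job might apply these recommendations, assuming a hypothetical scheduler that remembers when the check last ran. It skips runs that come too soon, caps the estimation query with `SET LOCAL statement_timeout` (which lasts only until the end of the current transaction), and labels every result as an estimate. `MIN_INTERVAL`, `CHECK_TIMEOUT`, and the function names are illustrative, not part of any tool.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical defaults; adjust to the database size and how long the query takes.
MIN_INTERVAL = timedelta(hours=1)
CHECK_TIMEOUT = "60s"


def is_due(last_run: Optional[datetime], now: datetime,
           min_interval: timedelta = MIN_INTERVAL) -> bool:
    """Bloat changes slowly, so skip the check if it ran recently."""
    return last_run is None or now - last_run >= min_interval


def wrap_with_timeout(query: str, timeout: str = CHECK_TIMEOUT) -> str:
    """Wrap an estimation query so statement_timeout applies only to this transaction."""
    body = query.strip().rstrip(";")
    return f"BEGIN;\nSET LOCAL statement_timeout = '{timeout}';\n{body};\nCOMMIT;"


def format_estimate(relation: str, est_pct: float) -> str:
    """Always tell the reader the number is an estimate and may include phantom bloat."""
    return f"{relation}: ~{est_pct:.0f}% bloat (estimate, may include phantom bloat)"
```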
Approaches to determine bloat levels more precisely:

- queries based on `pgstattuple` (the extension has to be installed)
- checking DB object sizes on a clone, running `VACUUM FULL` (heavy and blocks queries, thus not for production), and
  then checking sizes again and comparing before/after
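Both precise approaches come down to simple arithmetic once the numbers are fetched. The sketch below assumes you have already run `SELECT * FROM pgstattuple('some_table')` (the dataclass keeps only four of its output columns) or measured an object with `pg_relation_size()` on a clone before and after `VACUUM FULL`. It counts dead tuples together with free space, on the assumption that they become free space once vacuumed; the helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PgstattupleRow:
    """Subset of the columns returned by pgstattuple() for a table."""
    table_len: int       # physical relation length, bytes
    tuple_len: int       # bytes occupied by live tuples
    dead_tuple_len: int  # bytes occupied by dead tuples
    free_space: int      # free bytes inside pages


def reclaimable_pct(row: PgstattupleRow) -> float:
    """Share of the relation occupied by dead tuples plus free space, in percent."""
    if row.table_len == 0:
        return 0.0
    return 100.0 * (row.dead_tuple_len + row.free_space) / row.table_len


def bloat_from_vacuum_full(size_before: int, size_after: int) -> float:
    """Clone-based check: how much VACUUM FULL shrank the object, in percent."""
    if size_before == 0:
        return 0.0
    return 100.0 * (size_before - size_after) / size_before
```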
Periodic checks are strongly recommended to keep bloat levels under control and react when needed.

## Index bloat mitigation (reactive)

Unfortunately, in databases that experience many `UPDATE`s and `DELETE`s, index health inevitably degrades over time.
This means that indexes need to be rebuilt regularly.

Recommendations:

* Use `REINDEX CONCURRENTLY` to rebuild bloated indexes in a non-blocking fashion.
* Remember that `REINDEX CONCURRENTLY` holds the `xmin` horizon while running. This affects `autovacuum`'s ability to
  clean up freshly dead tuples in all tables and indexes. This is another reason to use partitioning: do not allow
  tables to exceed a certain size (say, 100 GiB).
* You can monitor the progress of reindexing using the approach
  from [Day 15: How to monitor CREATE INDEX / REINDEX](0015_how_to_monitor_index_operations.md).
* Prefer Postgres 14+, since PG14 significantly optimized btree indexes so that they degrade much more slowly
  under write-heavy workloads.
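A small sketch of turning a list of bloated indexes (for example, output of the btree estimation query, reduced to hypothetical `(schema, index, estimated_pct)` tuples) into `REINDEX INDEX CONCURRENTLY` statements. The `quote_ident` helper here is a simplified stand-in that always double-quotes, unlike Postgres's own `quote_ident()`, which quotes only when needed. Statements are ordered from most to least bloated and meant to be run one by one, outside an explicit transaction block, since `REINDEX CONCURRENTLY` cannot run inside one; running them one at a time also keeps each `xmin` hold shorter.

```python
def quote_ident(name: str) -> str:
    """Always double-quote an identifier, doubling embedded quotes (simplified)."""
    return '"' + name.replace('"', '""') + '"'


def reindex_statements(bloated: list[tuple[str, str, float]],
                       threshold_pct: float = 40.0) -> list[str]:
    """One REINDEX INDEX CONCURRENTLY statement per index at or above the threshold."""
    stmts = []
    for schema, index, pct in sorted(bloated, key=lambda t: t[2], reverse=True):
        if pct >= threshold_pct:
            stmts.append(
                f"REINDEX INDEX CONCURRENTLY {quote_ident(schema)}.{quote_ident(index)};"
            )
    return stmts
```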
## Table bloat mitigation (reactive)

Some table bloat is not necessarily a bad thing, because free space in pages increases the chance of optimized
`UPDATE`s: HOT (Heap-Only Tuples) `UPDATE`s.

However, if the level is concerning, consider using [pg_repack](https://github.com/reorg/pg_repack) to rebuild the table
without long-lasting exclusive locks. An alternative to `pg_repack` is
[pg_squeeze](https://github.com/cybertec-postgresql/pg_squeeze).

Normally, this process doesn't need to be scheduled and fully automated; usually, it is enough to apply it in a
controlled manner, only when high table bloat is detected.
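If you do decide to run `pg_repack`, building its command line explicitly helps keep the operation targeted. This sketch uses only the `--dbname` and `--table` options; `appdb` and `public.orders` are placeholder names. It refuses an empty table list because, given only a database, `pg_repack` processes every eligible table in it. Note that `pg_repack` also requires its extension to be installed in the target database, and the table must have a primary key or a unique index on a `NOT NULL` column.

```python
import shlex


def repack_argv(dbname: str, tables: list[str]) -> list[str]:
    """Build argv for pg_repack that rebuilds only the listed tables."""
    if not tables:
        # Without --table, pg_repack would process the whole database.
        raise ValueError("list the tables to repack explicitly")
    argv = ["pg_repack", f"--dbname={dbname}"]
    argv += [f"--table={t}" for t in tables]
    return argv


if __name__ == "__main__":
    argv = repack_argv("appdb", ["public.orders"])
    print(shlex.join(argv))  # pg_repack --dbname=appdb --table=public.orders
    # To actually run it: subprocess.run(argv, check=True)
```

Passing the list to `subprocess.run` rather than a shell string avoids quoting problems with unusual table names.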
## Proactive bloat mitigation

* Tune `autovacuum`.
* Monitor the `xmin` horizon and don't allow it to be too far in the past:
  [Day 45: How to monitor xmin horizon to prevent XID/MultiXID wraparound and high bloat](0045_how_to_monitor_xmin_horizon.md).
* Do not allow unnecessary long-running transactions (e.g., > 1h), either on the primary or on standbys with
  `hot_standby_feedback` turned on.
* If on Postgres 13 or older, consider upgrading to 14+ to benefit from btree index optimizations.
* Partition large (100+ GiB) tables.
* Use partitioning for tables with queue-like workloads, even if they are small, and use `TRUNCATE` or drop partitions
  with old data instead of using `DELETE`s; in this case, vacuuming is not needed and bloat is not an issue.
* Do not use massive `UPDATE`s and `DELETE`s; always work in batches (each lasting no more than 1-2s), and
  ensure that `autovacuum` cleans up dead tuples promptly, or run `VACUUM` manually when massive data changes need to
  happen.
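One way to follow the batching advice is a loop that deletes a limited number of rows per statement and halves the batch when a statement runs longer than the time budget. This is a sketch under assumptions: the `events` table, its `id` and `created_at` columns, and the 90-day cut-off are hypothetical, the `%s` placeholder follows the psycopg style, and `execute(sql, limit)` stands for any driver call that runs one statement in its own committed transaction (autocommit) and returns the number of affected rows. Committing each batch separately lets dead tuples become removable while the job is still running.

```python
import time
from typing import Callable

# Hypothetical table and retention rule; %s is a psycopg-style placeholder for LIMIT.
BATCH_SQL = (
    "DELETE FROM events WHERE id IN ("
    "SELECT id FROM events WHERE created_at < now() - interval '90 days' "
    "ORDER BY id LIMIT %s)"
)


def delete_in_batches(
    execute: Callable[[str, int], int],
    batch_size: int = 1000,
    max_seconds: float = 1.0,
    min_batch: int = 100,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Delete matching rows in small batches; return the total number deleted."""
    total = 0
    while True:
        start = clock()
        deleted = execute(BATCH_SQL, batch_size)  # one statement, one transaction
        elapsed = clock() - start
        total += deleted
        if deleted < batch_size:
            return total  # fewer rows than the limit: nothing left as of this statement
        if elapsed > max_seconds and batch_size > min_batch:
            batch_size = max(min_batch, batch_size // 2)  # stay within the time budget
```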
## Materials worth reading

* Postgres docs: [Routine Vacuuming](https://postgresql.org/docs/current/routine-vacuuming.html)
* [When autovacuum does not vacuum](https://2ndquadrant.com/en/blog/when-autovacuum-does-not-vacuum/)
* [How to Reduce Bloat in Large PostgreSQL Tables](https://timescale.com/learn/how-to-reduce-bloat-in-large-postgresql-tables/)

`README.md`

Lines changed: 1 addition & 0 deletions

Original file line number	Diff line number	Diff line change
`@@ -72,6 +72,7 @@ As an example, first 2 rows:`
`72`	`72`	`- 0043 [How to format SQL](./0043_how_to_format_sql.md)`
`73`	`73`	`- 0044 [How to monitor transaction ID wraparound risks](./0044_how_to_monitor_transaction_id_wraparound_risks.md)`
`74`	`74`	`- 0045 [How to monitor xmin horizon to prevent XID/MultiXID wraparound and high bloat](./0045_how_to_monitor_xmin_horizon.md)`
	`75`	`+- 0046 [How to deal with bloat](./0046_how_to_deal_with_bloat.md)`
`75`	`76`	`- ...`
`76`	`77`
`77`	`78`	`## Contributors`
