Originally from: [tweet](https://twitter.com/samokhvalov/status/1711237471743480060), [LinkedIn post](...).

---

# How to benchmark

> I post a new PostgreSQL "howto" article every day. Join me in this
> journey – [subscribe](https://twitter.com/samokhvalov/), provide feedback, share!

Benchmarking is a huge topic. Here, we cover the minimal steps required for a Postgres benchmark to be informative,
useful, and correct.

In this article, we assume that the following principles are followed:

1. **NOT A SHARED ENV:** The whole machine is under our sole use (nobody else is using it); we aim to study the behavior
   of Postgres as a whole, with all its components (vs. microbenchmarks such as studying a particular query using
   `EXPLAIN`, or focusing on underlying components such as disk and filesystem performance).

2. **HIGH QUALITY:** We aim to be honest (no "benchmarketing" goals), transparent (all details are shared), precise, and
   to fix all mistakes when/if they happen. When it makes sense, each benchmark run should be long enough to take into
   account factors like the colder state of caches or various fluctuations. There should also be multiple runs of each
   type of benchmark (usually, at least 4-5) to ensure that runs are reproducible. It makes sense to learn from existing
   experience: Brendan Gregg's excellent book ["Systems
   Performance"](https://brendangregg.com/blog/2020-07-15/systems-performance-2nd-edition.html) has a chapter about
   benchmarks; it may also be useful to learn from other fields (physics, etc.) to understand the principles and
   methodologies of [successful experiments](https://en.wikipedia.org/wiki/Experiment).

3. **READY TO LEARN:** We have enough expertise to understand the bottlenecks and limitations, or we are ready to use
   other people's help, re-making benchmarks if needed.

## Benchmark structure

A benchmark is a kind of database experiment where, in the general case, we use multiple sessions to the DBMS and study
the behavior of the system as a whole or of its particular components (e.g., buffer pool, checkpointer, replication).

Each benchmark run should have a well-defined structure. In general, it contains two big parts:

1. **INPUT:** everything we have or define before conducting the experiment – where we run the benchmark, how the system
   was configured, what DB and workload we use, what change we aim to study (to compare the behavior before and after
   the change).
2. **OUTPUT:** various observability data such as logs, errors observed, statistics, etc.

![Database Benchmark](files/0013_db_benchmark.png)

Each part should be well-documented so anyone can reproduce the experiment, understand the main metrics (latency,
throughput, etc.), understand the bottlenecks, and conduct additional, modified experiments if needed.

The description of all aspects of database benchmarking could take a whole book – here I provide only basic
recommendations that can help you avoid mistakes and improve the general quality of your benchmarks.

Of course, some of these things can be omitted if needed. But in the general case, it is recommended to automate
documentation and artifact collection for all experiments, so it is easy to study the details later. You can find
[here](https://gitlab.com/postgres-ai/postgresql-consulting/tests-and-benchmarks/-/issues) some good examples of
benchmarks performed for specific purposes (e.g., to study pathological subtransaction behavior or to measure the
benefits of enabling `wal_compression`).

## INPUT: Environment

- Make sure you're using machines of a proper size; don't run on a laptop (unless absolutely necessary). AWS Spot or GCP
  Preemptible instances, used during short periods of time, are extremely affordable and super helpful for
  experimenting. For example, spot instances for VMs with 128 Intel or AMD vCPUs and 256-1024 GiB of RAM have an hourly
  price as low as 5ドル-10 ([good comparison tool](https://instances.vantage.sh)), billed by the second – this enables very
  cost-efficient experiments on large machines.
- When you get VMs, quickly check them with microbenchmark tools such as fio and sysbench to ensure that CPU, RAM, disk,
  and network work as advertised (see the sketch after this list). If in doubt, compare to other same-type VMs and
  choose.
- Document the VM tech spec, disk, and filesystem (these two are extremely important for databases!), OS choices, and
  non-default settings.
- Document the Postgres version used and additional extensions, if any (some of them can have an "observer effect" even
  if they are just installed).
- Document non-default Postgres settings used.

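A minimal sketch of such a quick check on a fresh VM (device paths, mount points, and sizes are placeholders – adjust
them to your setup):

```bash
# CPU and memory microbenchmarks:
sysbench cpu --threads=$(nproc) run
sysbench memory --threads=$(nproc) run

# Random read IOPS on the disk that will hold PGDATA (8 KiB blocks, matching the Postgres page size):
fio --name=randread --directory=/mnt/pgdata --rw=randread --bs=8k --size=10G \
  --ioengine=libaio --iodepth=64 --direct=1 --runtime=60 --time_based --group_reporting
```

If the numbers differ noticeably from what the provider advertises (or from a sibling VM of the same type), it is
usually cheaper to discard the VM and get another one than to debug a benchmark distorted by a faulty environment.
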
## INPUT: Database

- Document what schema and data you use – ideally, in a fully reproducible form (SQL / dump); see the sketch below.

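A minimal sketch of capturing the database in a reproducible form (the database name `test` and file names are
placeholders):

```bash
# Schema definition as plain SQL, easy to review and version:
pg_dump --schema-only test > schema.sql

# Full dump in custom format, restorable with pg_restore:
pg_dump --format=custom --file=test.dump test

# Alternatively, a synthetic dataset with a documented generator and scale:
pgbench -i -s 1000 test
```
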
## INPUT: Workload

- Document all the aspects of the workload you've used – ideally, in a fully reproducible form (SQL, pgbench, sysbench,
  etc. details).
- Understand the type of your benchmark – the kind of load testing you're aiming to have: is it edge-case load testing
  (stress testing), when you aim to go "full speed", or regular load testing, in which you try to simulate a real-life
  situation where, for example, CPU usage is normally far below 100%? Note that by default, pgbench tends to give you
  "stress testing" (not limiting the number of TPS – to limit it, use option `-R`); see the sketch after this list.

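A minimal sketch of both modes with pgbench (connection details are omitted; the database name `test`, client counts,
duration, and rate are placeholders, and a pgbench dataset is assumed to be initialized already):

```bash
# Edge-case (stress) test: 16 clients going "full speed" for 5 minutes, progress reported every 10 s:
pgbench -c 16 -j 16 -T 300 -P 10 test

# Regular load test: same clients, but throttled to ~500 TPS overall with -R:
pgbench -c 16 -j 16 -T 300 -P 10 -R 500 test
```
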
## INPUT: Delta

There may be various types of "deltas" (the subject of our study that defines the difference between runs). Here are
just some examples:

- different Postgres minor or major versions
- different OS versions
- different settings such as `work_mem` (see the sketch after this list)
- different hardware
- varying scale: different number of clients working with the database or different table sizes
- different filesystems

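A minimal sketch of applying a settings "delta" between two series of runs (`64MB` is an arbitrary example value, and
`test` is a placeholder database name):

```bash
psql test -c "alter system set work_mem = '64MB'"
psql test -c "select pg_reload_conf()"
psql test -Atc "show work_mem"   # verify the new value is in effect before the next series
```
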
It is not recommended to consider schema changes or changes in SQL queries as a "delta" because:

- such workload changes usually happen at a very high pace
- full-fledged benchmarking is very expensive
- it is possible to study schema and query changes in shared environments, focusing on IO metrics (BUFFERS!), achieving
  a high level of time and cost efficiency (see [@Database_Lab](https://twitter.com/Database_Lab))

## OUTPUT: collect artifacts

It is worth collecting a lot of various artifacts and making sure they will not be lost (e.g., upload them to object
storage).

- Before each run, reset all statistics using `pg_stat_reset()`, `pg_stat_reset_shared(..)`, other standard
  `pg_stat_reset_***()` functions ([docs](https://postgresql.org/docs/current/monitoring-stats.html)),
  `pg_stat_statements_reset()`, `pg_stat_kcache_reset()`, and so on.
- After each run, dump all `pg_stat_***` views in CSV format (see the sketch after this list).
- Collect all Postgres logs and any other related logs (e.g., pgBouncer's, Patroni's, syslog).
- While Postgres, pgBouncer, or any other configs are "input", it makes sense to create a snapshot of all actually
  observed configuration values (e.g., `select * from pg_settings;`) and consider this data as artifacts as well.
- Collect the query analysis information: snapshots of `pg_stat_statements`, `pg_stat_kcache`, `pg_wait_sampling`
  / [pgsentinel](https://github.com/pgsentinel/pgsentinel), etc.
- Extract all information about errors from (a) logs, (b) `pg_stat_database` (`xact_rollback`) and similar places via
  SQL, and consider this a separate, important type of artifact for analysis. Consider using a small extension called
  [logerrors](https://github.com/munakoiso/logerrors) that registers all error codes and exposes them via SQL.
- If monitoring is used, collect charts from there. For experiments in particular, it may be convenient to use
  [Netdata](https://netdata.cloud) since it's really easy to install on a fresh machine, and it has dashboard
  export/import functions (unfortunately, they are client-side, hence manual actions are always needed; but I
  personally find them very convenient when conducting DB experiments).

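A minimal sketch of the reset/dump steps (the database name `test` and output file names are placeholders;
`pg_stat_statements` must be installed for the corresponding calls to work):

```bash
# Before the run: reset statistics.
psql test -c "select pg_stat_reset()"
psql test -c "select pg_stat_reset_shared('bgwriter')"
psql test -c "select pg_stat_statements_reset()"

# ... the benchmark run happens here ...

# After the run: dump pg_stat_* views and the actually observed configuration to CSV.
psql test -c "\copy (select * from pg_stat_database) to 'pg_stat_database.csv' csv header"
psql test -c "\copy (select * from pg_stat_statements) to 'pg_stat_statements.csv' csv header"
psql test -c "\copy (select * from pg_settings) to 'pg_settings.csv' csv header"
```
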
## Analysis

Some tips (far from being complete):

1. Always check errors. It's not uncommon to have a benchmark run, jump to some conclusion, and only later realize
   that the error count was too high, making the run not useful.
2. Understand where the bottleneck is. Very often, we are saturated, say, on disk IO, and think we observe the behavior
   of our database system, but we actually observe the behavior of, say, cloud disk throttling or filesystem
   limitations instead. In such cases, we need to think about how to tune our input to avoid such bottlenecks and
   perform useful experiments.
3. In some cases, it is, vice versa, very desirable to reach some kind of saturation – for example, if we study the
   speed of `pg_dump` or `pg_restore`, we may want to observe our disk system saturated, and we tune the input (e.g.,
   how exactly we run `pg_dump` – how many parallel workers we use, whether compression is involved, whether the
   network is involved, etc.) so the desired saturation is indeed reached, and we can demonstrate it.
4. Understand the main metrics you're going to compare between runs – latencies, throughput numbers, query analysis
   metrics (those from `pg_stat_statements`, wait event analysis), and so on; see the sketch after this list.
5. Develop a good summary format and follow it. It can include a short description of various input parts, including
   the workload delta, and the main comparison metrics. Store the summaries for all runs in this well-structured form.
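
A minimal sketch of such post-run checks (the database name `test` is a placeholder; the `pg_stat_statements` column
names are those of Postgres 13+):

```bash
# Rollback vs. commit counts – a high xact_rollback value may invalidate the run:
psql test -c "select datname, xact_commit, xact_rollback
              from pg_stat_database where datname = current_database()"

# Top queries by total execution time:
psql test -c "select queryid, calls,
                round(total_exec_time::numeric, 2) as total_exec_ms,
                round(mean_exec_time::numeric, 3) as mean_exec_ms,
                shared_blks_hit + shared_blks_read as buffers
              from pg_stat_statements
              order by total_exec_time desc limit 10"
```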