Originally from: [tweet](https://twitter.com/samokhvalov/status/1711237471743480060), [LinkedIn post](...).

---

# How to benchmark

> I post a new PostgreSQL "howto" article every day. Join me in this
> journey – [subscribe](https://twitter.com/samokhvalov/), provide feedback, share!

Benchmarking is a huge topic. Here, we cover the minimal steps required for a Postgres benchmark to be informative,
useful, and correct.

In this article, we assume that the following principles are followed:

1. **NOT A SHARED ENV:** The whole machine is under our sole use (nobody else is using it); we aim to study the behavior
   of Postgres as a whole, with all its components (vs. microbenchmarks such as studying a particular query using
   `EXPLAIN`, or focusing on underlying components such as disk and filesystem performance).

2. **HIGH QUALITY:** We aim to be honest (no "benchmarketing" goals), transparent (all details are shared), precise, and
   to fix all mistakes when/if they happen. When it makes sense, each benchmark run should be long enough to take into
   account factors like the colder state of caches or various fluctuations. There should also be multiple runs of each
   type of benchmark (usually, at least 4-5) to ensure that runs are reproducible. It makes sense to learn from existing
   experience: Brendan Gregg's excellent book ["Systems
   Performance"](https://brendangregg.com/blog/2020-07-15/systems-performance-2nd-edition.html) has a chapter about
   benchmarks; it may also be useful to learn from other fields (physics, etc.) to understand the principles and
   methodologies of [successful experiments](https://en.wikipedia.org/wiki/Experiment).

3. **READY TO LEARN:** We have enough expertise to understand the bottlenecks and limitations, or we are ready to use
   other people's help, re-making benchmarks if needed.

## Benchmark structure

A benchmark is a kind of database experiment where, in the general case, we use multiple sessions to the DBMS and study
the behavior of the system as a whole or of its particular components (e.g., buffer pool, checkpointer, replication).

Each benchmark run should have a well-defined structure. In general, it contains two big parts:

1. **INPUT:** everything we have or define before conducting the experiment – where we run the benchmark, how the system
   was configured, what DB and workload we use, what change we aim to study (to compare the behavior before and after
   the change).
2. **OUTPUT:** various observability data such as logs, errors observed, statistics, etc.

![Database Benchmark](files/0013_db_benchmark.png)

Each part should be well-documented so anyone can reproduce the experiment, understand the main metrics (latency,
throughput, etc.), understand the bottlenecks, and conduct additional, modified experiments if needed.

The description of all aspects of database benchmarking could take a whole book – here I provide only basic
recommendations that can help you avoid mistakes and improve the general quality of your benchmarks.

Of course, some of these things can be omitted if needed. But in the general case, it is recommended to automate
documentation and artifact collection for all experiments, so it is easy to study the details later. You can find
[here](https://gitlab.com/postgres-ai/postgresql-consulting/tests-and-benchmarks/-/issues) some good examples of
benchmarks performed for specific purposes (e.g., to study pathological subtransaction behavior or to measure the
benefits of enabling `wal_compression`).

## INPUT: Environment

- Make sure you're using machines of a proper size; don't run on a laptop (unless absolutely necessary). AWS Spot or GCP
  Preemptible instances, used during short periods of time, are extremely affordable and super helpful for
  experimenting. For example, spot instances for VMs with 128 Intel or AMD vCPUs and 256-1024 GiB of RAM have an hourly
  price as low as 5ドル-10 ([good comparison tool](https://instances.vantage.sh)), billed by the second – this enables very
  cost-efficient experiments on large machines.
- When you get VMs, quickly check them with microbenchmark tools such as fio and sysbench to ensure that CPU, RAM, disk,
  and network work as advertised (see the sketch after this list). If in doubt, compare to other same-type VMs and
  choose.
- Document the VM tech spec, disk, and filesystem (these two are extremely important for databases!), OS choices, and
  non-default settings.
- Document the Postgres version used and additional extensions, if any (some of them can have an "observer effect" even
  if they are just installed).
- Document non-default Postgres settings used.

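A minimal sketch of such a quick check on a fresh VM (device paths, mount points, and sizes are placeholders – adjust
them to your setup):

```bash
# CPU and memory microbenchmarks:
sysbench cpu --threads=$(nproc) run
sysbench memory --threads=$(nproc) run

# Random read IOPS on the disk that will hold PGDATA (8 KiB blocks, matching the Postgres page size):
fio --name=randread --directory=/mnt/pgdata --rw=randread --bs=8k --size=10G \
  --ioengine=libaio --iodepth=64 --direct=1 --runtime=60 --time_based --group_reporting
```

If the numbers differ noticeably from what the provider advertises (or from a sibling VM of the same type), it is
usually cheaper to discard the VM and get another one than to debug a benchmark distorted by a faulty environment.
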
## INPUT: Database

- Document what schema and data you use – ideally, in a fully reproducible form (SQL / dump); see the sketch below.

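A minimal sketch of capturing the database in a reproducible form (the database name `test` and file names are
placeholders):

```bash
# Schema definition as plain SQL, easy to review and version:
pg_dump --schema-only test > schema.sql

# Full dump in custom format, restorable with pg_restore:
pg_dump --format=custom --file=test.dump test

# Alternatively, a synthetic dataset with a documented generator and scale:
pgbench -i -s 1000 test
```
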
## INPUT: Workload

- Document all the aspects of the workload you've used – ideally, in a fully reproducible form (SQL, pgbench, sysbench,
  etc. details).
- Understand the type of your benchmark – the kind of load testing you're aiming to have: is it edge-case load testing
  (stress testing), when you aim to go "full speed", or regular load testing, in which you try to simulate a real-life
  situation where, for example, CPU usage is normally far below 100%? Note that by default, pgbench tends to give you
  "stress testing" (not limiting the number of TPS – to limit it, use option `-R`); see the sketch after this list.

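A minimal sketch of both modes with pgbench (connection details are omitted; the database name `test`, client counts,
duration, and rate are placeholders, and a pgbench dataset is assumed to be initialized already):

```bash
# Edge-case (stress) test: 16 clients going "full speed" for 5 minutes, progress reported every 10 s:
pgbench -c 16 -j 16 -T 300 -P 10 test

# Regular load test: same clients, but throttled to ~500 TPS overall with -R:
pgbench -c 16 -j 16 -T 300 -P 10 -R 500 test
```
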
## INPUT: Delta

There may be various types of "deltas" (the subject of our study that defines the difference between runs). Here are
just some examples:

- different Postgres minor or major versions
- different OS versions
- different settings such as `work_mem` (see the sketch after this list)
- different hardware
- varying scale: different number of clients working with the database or different table sizes
- different filesystems

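A minimal sketch of applying a settings "delta" between two series of runs (`64MB` is an arbitrary example value, and
`test` is a placeholder database name):

```bash
psql test -c "alter system set work_mem = '64MB'"
psql test -c "select pg_reload_conf()"
psql test -Atc "show work_mem"   # verify the new value is in effect before the next series
```
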
It is not recommended to consider schema changes or changes in SQL queries as a "delta" because:

- such workload changes usually happen at a very high pace
- full-fledged benchmarking is very expensive
- it is possible to study schema and query changes in shared environments, focusing on IO metrics (BUFFERS!), achieving
  a high level of time and cost efficiency (see [@Database_Lab](https://twitter.com/Database_Lab))

## OUTPUT: collect artifacts

It is worth collecting a lot of various artifacts and making sure they will not be lost (e.g., upload them to object
storage).

- Before each run, reset all statistics using `pg_stat_reset()`, `pg_stat_reset_shared(..)`, other standard
  `pg_stat_reset_***()` functions ([docs](https://postgresql.org/docs/current/monitoring-stats.html)),
  `pg_stat_statements_reset()`, `pg_stat_kcache_reset()`, and so on.
- After each run, dump all `pg_stat_***` views in CSV format (see the sketch after this list).
- Collect all Postgres logs and any other related logs (e.g., pgBouncer's, Patroni's, syslog).
- While Postgres, pgBouncer, or any other configs are "input", it makes sense to create a snapshot of all actually
  observed configuration values (e.g., `select * from pg_settings;`) and consider this data as artifacts as well.
- Collect the query analysis information: snapshots of `pg_stat_statements`, `pg_stat_kcache`, `pg_wait_sampling`
  / [pgsentinel](https://github.com/pgsentinel/pgsentinel), etc.
- Extract all information about errors from (a) logs, (b) `pg_stat_database` (`xact_rollback`) and similar places via
  SQL, and consider this a separate, important type of artifact for analysis. Consider using a small extension called
  [logerrors](https://github.com/munakoiso/logerrors) that registers all error codes and exposes them via SQL.
- If monitoring is used, collect charts from there. For experiments in particular, it may be convenient to use
  [Netdata](https://netdata.cloud) since it's really easy to install on a fresh machine, and it has dashboard
  export/import functions (unfortunately, they are client-side, hence manual actions are always needed; but I
  personally find them very convenient when conducting DB experiments).

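A minimal sketch of the reset/dump steps (the database name `test` and output file names are placeholders;
`pg_stat_statements` must be installed for the corresponding calls to work):

```bash
# Before the run: reset statistics.
psql test -c "select pg_stat_reset()"
psql test -c "select pg_stat_reset_shared('bgwriter')"
psql test -c "select pg_stat_statements_reset()"

# ... the benchmark run happens here ...

# After the run: dump pg_stat_* views and the actually observed configuration to CSV.
psql test -c "\copy (select * from pg_stat_database) to 'pg_stat_database.csv' csv header"
psql test -c "\copy (select * from pg_stat_statements) to 'pg_stat_statements.csv' csv header"
psql test -c "\copy (select * from pg_settings) to 'pg_settings.csv' csv header"
```
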
## Analysis

Some tips (far from being complete):

1. Always check errors. It's not uncommon to have a benchmark run, jump to some conclusion, and only later realize
   that the error count was too high, making the run not useful.
2. Understand where the bottleneck is. Very often, we are saturated, say, on disk IO, and think we observe the behavior
   of our database system, but we actually observe the behavior of, say, cloud disk throttling or filesystem
   limitations instead. In such cases, we need to think about how to tune our input to avoid such bottlenecks and
   perform useful experiments.
3. In some cases, it is, vice versa, very desirable to reach some kind of saturation – for example, if we study the
   speed of `pg_dump` or `pg_restore`, we may want to observe our disk system saturated, and we tune the input (e.g.,
   how exactly we run `pg_dump` – how many parallel workers we use, whether compression is involved, whether the
   network is involved, etc.) so the desired saturation is indeed reached, and we can demonstrate it.
4. Understand the main metrics you're going to compare between runs – latencies, throughput numbers, query analysis
   metrics (those from `pg_stat_statements`, wait event analysis), and so on; see the sketch after this list.
5. Develop a good summary format and follow it. It can include a short description of various input parts, including
   the workload delta, and the main comparison metrics. Store the summaries for all runs in this well-structured form.
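
A minimal sketch of such post-run checks (the database name `test` is a placeholder; the `pg_stat_statements` column
names are those of Postgres 13+):

```bash
# Rollback vs. commit counts – a high xact_rollback value may invalidate the run:
psql test -c "select datname, xact_commit, xact_rollback
              from pg_stat_database where datname = current_database()"

# Top queries by total execution time:
psql test -c "select queryid, calls,
                round(total_exec_time::numeric, 2) as total_exec_ms,
                round(mean_exec_time::numeric, 3) as mean_exec_ms,
                shared_blks_hit + shared_blks_read as buffers
              from pg_stat_statements
              order by total_exec_time desc limit 10"
```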