According to user comments, PostgreSQL data checksums have very minimal runtime overhead (both CPU and storage) but would allow (among other things) using pg_rewind for point-in-time recovery (PITR). However, data checksums are not enabled by default, and enabling them on an already existing HA cluster is not possible without pretty significant downtime. (If I've understood correctly, you cannot enable checksums on the hot standby only and promote it as the new master once enabling the checksums has completed on the hot standby.)
Are there some little-known issues that would surface if data checksums were enabled by default? Or is the default state (checksums disabled) just due to historical reasons, even though enabling data checksums would make much more sense in all cases?
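For reference: since PostgreSQL 12 the offline pg_checksums tool can enable checksums on an existing cluster, but the server has to stay down for the whole rewrite, which is the downtime mentioned above. A minimal sketch (the data directory path is a placeholder):

    # the cluster must be cleanly shut down; pg_checksums refuses to run on a live server
    pg_ctl -D /var/lib/postgresql/data stop
    # rewrites every data block with its checksum; runtime scales with cluster size
    pg_checksums --enable --progress -D /var/lib/postgresql/data
    pg_ctl -D /var/lib/postgresql/data start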
- Interesting question (+1) - I'd say that it's something like "enable as little as possible by default - let your users decide", i.e. make the system as lean as possible by default. BTW, this is a complete guess on my part! – Vérace, Jun 8, 2023 at 20:35
2 Answers
Enabling data checksums is not free:

- you have to calculate the checksums frequently, which costs CPU
- data checksums require that hint bits are WAL-logged, which increases the amount of WAL that needs to be written
Also, data checksums don't offer any benefit unless you are using shoddy storage, where data could change between the time you wrote a block and the time you read it again.
If you want to use pg_rewind, you don't need data checksums. It is sufficient to enable the parameter wal_log_hints.
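In case it helps, a minimal sketch (host and path are placeholders). The parameter can only change at server start:

    ALTER SYSTEM SET wal_log_hints = on;  -- requires a restart to take effect

and pg_rewind is then run with the old primary cleanly stopped:

    pg_rewind --target-pgdata=/var/lib/postgresql/data \
              --source-server='host=new-primary user=postgres dbname=postgres'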
- How much would enabling wal_log_hints typically increase the WAL size? Is it closer to +1% or +100%? – Mikko Rantalainen, Jun 9, 2023 at 14:19
- The old answer: it depends. Could be virtually zero, but if you load the data, then have a checkpoint, then query the data, it could be 100%, since all the blocks are modified by the first reader, who sets the hint bits. – Laurenz Albe, Jun 9, 2023 at 14:27
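One way to measure this for your own workload rather than guess (this comparison workflow is my own suggestion, not from the answer): capture the WAL position before and after a representative batch of work, once with wal_log_hints off and once with it on, using psql variables:

    SELECT pg_current_wal_lsn() AS start_lsn \gset
    -- ... run the representative workload here ...
    SELECT pg_size_pretty(
             pg_wal_lsn_diff(pg_current_wal_lsn(), :'start_lsn')
           ) AS wal_generated;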
I will quote some opinions from core PostgreSQL developers:
That's nice for us but I'm not sure that it's a benefit for users. I've seen little if any data to suggest that checksums actually catch enough problems to justify the extra CPU costs and the risk of false positives.
My problem is more that I'm not confident the checks are mature enough. The basebackup checks are atm not able to detect random data, and neither basebackup nor backend checks detect zeroed out files/file ranges.
I can believe that many users have shared_buffers set to its default value and that we are going to get complaints about "performance drop after upgrade to v12" if we switch data checksums to on by default.
data checksums can catch not so many actual data corruptions, are not free (especially with small sizes of shared_buffers - checksums are calculated when writing a page to disk or when reading from disk to shared_buffers) neither by CPU nor by WAL size.
No issues per se, but the general consensus so far is that there's not much benefit for end users to have it enabled by default.
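For anyone checking their own cluster, the current state is exposed as a read-only parameter:

    SHOW data_checksums;  -- 'on' or 'off', fixed at initdb time (or changed offline via pg_checksums)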
- I would have expected shared_buffers to be increased in all cases where the performance actually matters, because it's that critical. Maybe PostgreSQL should automatically adjust shared_buffers by default instead of hardcoding it to a really small default? For example, the default could be auto, and it would be interpreted as 10% of total system RAM. – Mikko Rantalainen, Jun 9, 2023 at 14:23
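For context on that comment: the shipped default is shared_buffers = 128MB, and raising it needs a server restart. A minimal sketch (the value is just an illustration; around 25% of RAM is the commonly cited starting point, not a rule):

    ALTER SYSTEM SET shared_buffers = '4GB';  -- takes effect after a restart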