The checkpointer shouldn't ignore its latch. Other backends may be
waiting for it to drain the request queue. Hopefully real systems don't
have a full queue often, but the condition is reached easily when
shared_buffers is small.

This involves defining a new wait event, which will appear in the
pg_stat_activity view often due to spread checkpoints.

Back-patch only to 14. Even though the problem exists in earlier
branches too, it's hard to hit there. In 14 we stopped using signal
handlers for latches on Linux, *BSD and macOS, which were previously
hiding this problem by interrupting the sleep (though not reliably, as
the signal could arrive before the sleep begins; precisely the problem
latches address).

Reported-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20220226213942.nb7uvb2pamyu26dj%40alap3.anarazel.de
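
To illustrate the lost-wakeup point made in the commit message, here is a
minimal sketch of a latch-like primitive built on a self-pipe and poll().
ToyLatch and the toy_latch_* functions are hypothetical names invented for
this example; this is not PostgreSQL's Latch implementation or the
WaitLatch() API. It only shows the general idea: a wakeup recorded before
the wait begins is still seen by the waiter, whereas a signal delivered
before a plain sleep starts would not shorten that sleep.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical toy latch (not PostgreSQL's Latch): a non-blocking self-pipe. */
typedef struct ToyLatch
{
	int			fds[2];			/* fds[0] = read end, fds[1] = write end */
} ToyLatch;

/* Create the pipe and make both ends non-blocking. Returns 0 on success. */
static int
toy_latch_init(ToyLatch *latch)
{
	assert(latch != NULL);
	if (pipe(latch->fds) != 0)
		return -1;
	for (int i = 0; i < 2; i++)
	{
		int			flags = fcntl(latch->fds[i], F_GETFL);

		if (flags < 0 || fcntl(latch->fds[i], F_SETFL, flags | O_NONBLOCK) < 0)
			return -1;
	}
	return 0;
}

/* Record a wakeup. It stays pending until toy_latch_reset() drains it. */
static void
toy_latch_set(ToyLatch *latch)
{
	char		byte = 0;
	ssize_t		rc = write(latch->fds[1], &byte, 1);

	(void) rc;					/* a full pipe already means "set" */
}

/* Drain all pending wakeups so the next wait can block again. */
static void
toy_latch_reset(ToyLatch *latch)
{
	char		buf[64];

	while (read(latch->fds[0], buf, sizeof(buf)) > 0)
		;
}

/*
 * Wait until the latch is set or timeout_ms elapses. Returns true if the
 * latch was set. For simplicity, an EINTR retry restarts the full timeout.
 */
static bool
toy_latch_wait(ToyLatch *latch, int timeout_ms)
{
	struct pollfd pfd = {.fd = latch->fds[0], .events = POLLIN};
	int			rc;

	do
	{
		rc = poll(&pfd, 1, timeout_ms);
	} while (rc < 0 && errno == EINTR);

	return rc > 0 && (pfd.revents & POLLIN) != 0;
}
```

Under these assumptions, calling toy_latch_set() and then
toy_latch_wait(&latch, 1000) returns true right away rather than sleeping
for the full second; after toy_latch_reset() the same wait times out.
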
index 9fb62fec8ec5822d6fc3f3669394369d7bbd902d..8620aaddc79e68b3926f08b9ee50d6e0598cff1d 100644
@@ -2235,6 +2235,10 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
<entry><literal>BaseBackupThrottle</literal></entry>
<entry>Waiting during base backup when throttling activity.</entry>
</row>
+ <row>
+ <entry><literal>CheckpointWriteDelay</literal></entry>
+ <entry>Waiting between writes while performing a checkpoint.</entry>
+ </row>
<row>
<entry><literal>PgSleep</literal></entry>
<entry>Waiting due to a call to <function>pg_sleep</function> or
index 4488e3a44357682a44928bb0bcbb5471ceb5a857..a59c3cf02015267df7f6bdcac18156dcdf943db2 100644
}
ckpt_active = false;
+
+ /* We may have received an interrupt during the checkpoint. */
+ HandleCheckpointerInterrupts();
}
/* Check for archive_timeout and switch xlog files if necessary. */
@@ -726,7 +729,10 @@ CheckpointWriteDelay(int flags, double progress)
* Checkpointer and bgwriter are no longer related so take the Big
* Sleep.
*/
- pg_usleep(100000L);
+ WaitLatch(MyLatch, WL_LATCH_SET | WL_EXIT_ON_PM_DEATH | WL_TIMEOUT,
+ 100,
+ WAIT_EVENT_CHECKPOINT_WRITE_DELAY);
+ ResetLatch(MyLatch);
}
else if (--absorb_counter <= 0)
{
index 60972c3a750babfa7fff445cf4e730b432674f52..0706e922b5369686c47de3cab9f3af0f70e6d674 100644
@@ -485,6 +485,9 @@ pgstat_get_wait_timeout(WaitEventTimeout w)
case WAIT_EVENT_BASE_BACKUP_THROTTLE:
event_name = "BaseBackupThrottle";
break;
+ case WAIT_EVENT_CHECKPOINT_WRITE_DELAY:
+ event_name = "CheckpointWriteDelay";
+ break;
case WAIT_EVENT_PG_SLEEP:
event_name = "PgSleep";
break;
index 395d325c5fe769e71d766f1db658ba9d794d8c59..d0345c6b49e85f60b3505b55f63752230e0de4e2 100644
typedef enum
{
WAIT_EVENT_BASE_BACKUP_THROTTLE = PG_WAIT_TIMEOUT,
+ WAIT_EVENT_CHECKPOINT_WRITE_DELAY,
WAIT_EVENT_PG_SLEEP,
WAIT_EVENT_RECOVERY_APPLY_DELAY,
WAIT_EVENT_RECOVERY_RETRIEVE_RETRY_INTERVAL,