9

I found answers to this problem suggesting that it might take a while to restart postgres. But now we are already waiting for over 2 hours which seems extremely long compared to the 5 minutes it usually takes. (https://stackoverflow.com/questions/54922433/postgresql-fatal-the-database-system-is-starting-up-windows-10,
https://stackoverflow.com/questions/54922433/postgresql-fatal-the-database-system-is-starting-up-windows-10 )

There are no other logs than FATAL: the database system is starting up. Furthermore, the CPU and RAM utilization is minimal. It looks like nothing is happening.

We also checked disk space which is no problem.

How long should we wait until trying something else? And do you have any idea what could have caused this and how to fix it?

Some additional information:

  • Postgres version is 13.8
  • the size of the database is something in the range of 1.5TB
  • if our monitoring works correctly the last checkpoint was around 20 minutes before the shutdown
  • wal settings:
max_wal_size = 8GB
min_wal_size = 512MB
checkpoint_completion_target = 0.9

After starting postgres with the -d 3 flag we got the following logs:

 2022年12月02日 15:17:10.471 UTC [1] LOG: starting PostgreSQL 13.8 (Ubuntu 13.8-1.pgdg22.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.2.0-19ubuntu1) 11.2.0, 64-bit
 2022年12月02日 15:17:10.472 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
 2022年12月02日 15:17:10.472 UTC [1] LOG: listening on IPv6 address "::", port 5432
 2022年12月02日 15:17:10.480 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
 2022年12月02日 15:17:10.683 UTC [14] LOG: database system was interrupted; last known up at 2022年12月02日 10:01:04 UTC
 2022年12月02日 15:17:10.683 UTC [14] DEBUG: removing all temporary WAL segments
 2022年12月02日 15:17:10.914 UTC [1] DEBUG: forked new backend, pid=15 socket=9
 2022年12月02日 15:17:10.915 UTC [15] LOG: connection received: host=10.255.0.181 port=46576
 2022年12月02日 15:17:10.918 UTC [15] FATAL: the database system is starting up
 2022年12月02日 15:17:10.918 UTC [15] DEBUG: shmem_exit(1): 0 before_shmem_exit callbacks to make
 2022年12月02日 15:17:10.918 UTC [15] DEBUG: shmem_exit(1): 0 on_shmem_exit callbacks to make
 2022年12月02日 15:17:10.918 UTC [15] DEBUG: proc_exit(1): 1 callbacks to make
 2022年12月02日 15:17:10.918 UTC [15] DEBUG: exit(1)
 2022年12月02日 15:17:10.918 UTC [15] DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make
 2022年12月02日 15:17:10.918 UTC [15] DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make
 2022年12月02日 15:17:10.918 UTC [15] DEBUG: proc_exit(-1): 0 callbacks to make
 2022年12月02日 15:17:10.920 UTC [1] DEBUG: server process (PID 15) exited with exit code 1
 2022年12月02日 15:17:11.157 UTC [1] DEBUG: forked new backend, pid=16 socket=9
 2022年12月02日 15:17:11.157 UTC [16] LOG: connection received: host=10.255.0.181 port=56142
 2022年12月02日 15:17:11.158 UTC [16] FATAL: the database system is starting up
 2022年12月02日 15:17:11.158 UTC [16] DEBUG: shmem_exit(1): 0 before_shmem_exit callbacks to make
 2022年12月02日 15:17:11.158 UTC [16] DEBUG: shmem_exit(1): 0 on_shmem_exit callbacks to make
 2022年12月02日 15:17:11.158 UTC [16] DEBUG: proc_exit(1): 1 callbacks to make
 2022年12月02日 15:17:11.158 UTC [16] DEBUG: exit(1)
 2022年12月02日 15:17:11.158 UTC [16] DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make
 2022年12月02日 15:17:11.158 UTC [16] DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make
 2022年12月02日 15:17:11.158 UTC [16] DEBUG: proc_exit(-1): 0 callbacks to make
 2022年12月02日 15:17:11.160 UTC [1] DEBUG: server process (PID 16) exited with exit code 1

This repeats indefinitely.

Using strace as Laurenz suggested gave us this:

[02.12.2022, 5:25:50,463 PM] openat(AT_FDCWD, "/usr/share/locale/C.UTF-8/LC_MESSAGES/pg_ctl-13.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
[02.12.2022, 5:25:50,463 PM] openat(AT_FDCWD, "/usr/share/locale/C.utf8/LC_MESSAGES/pg_ctl-13.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
[02.12.2022, 5:25:50,463 PM] openat(AT_FDCWD, "/usr/share/locale/C/LC_MESSAGES/pg_ctl-13.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
[02.12.2022, 5:25:50,463 PM] openat(AT_FDCWD, "/usr/share/locale-langpack/C.UTF-8/LC_MESSAGES/pg_ctl-13.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
[02.12.2022, 5:25:50,463 PM] openat(AT_FDCWD, "/usr/share/locale-langpack/C.utf8/LC_MESSAGES/pg_ctl-13.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
[02.12.2022, 5:25:50,463 PM] openat(AT_FDCWD, "/usr/share/locale-langpack/C/LC_MESSAGES/pg_ctl-13.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
[02.12.2022, 5:25:50,464 PM] newfstatat(1, "", {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0), ...}, AT_EMPTY_PATH) = 0
[02.12.2022, 5:25:50,465 PM] write(1, "waiting for server to start...", 30waiting for server to start...) = 30
[02.12.2022, 5:25:50,466 PM] openat(AT_FDCWD, "/home/postgres/pgdata/data/postmaster.pid", O_RDONLY) = -1 ENOENT (No such file or directory)
[02.12.2022, 5:25:50,466 PM] wait4(63, 0x7ffcc7ccd680, WNOHANG, NULL) = 0
[02.12.2022, 5:25:50,466 PM] write(1, ".", 1.) = 1
[02.12.2022, 5:25:50,466 PM] pselect6(0, NULL, NULL, NULL, {tv_sec=0, tv_nsec=100000000}, NULL) = 0 (Timeout)
[02.12.2022, 5:25:50,515 PM] openat(AT_FDCWD, "/home/postgres/pgdata/data/postmaster.pid", O_RDONLY) = 3
[02.12.2022, 5:25:50,515 PM] newfstatat(3, "", {st_mode=S_IFREG|0600, st_size=68, ...}, AT_EMPTY_PATH) = 0
[02.12.2022, 5:25:50,516 PM] read(3, "63\n/home/postgres/pgdata/data\n16"..., 69) = 68
[02.12.2022, 5:25:50,517 PM] close(3) = 0
[02.12.2022, 5:25:50,518 PM] wait4(63, 0x7ffcc7ccd680, WNOHANG, NULL) = 0
[02.12.2022, 5:25:50,519 PM] pselect6(0, NULL, NULL, NULL, {tv_sec=0, tv_nsec=100000000}, NULL2022-12-02 16:25:50.604 UTC [63] LOG: starting PostgreSQL 13.8 (Ubuntu 13.8-1.pgdg22.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.2.0-19ubuntu1) 11.2.0, 64-bit
[02.12.2022, 5:25:50,540 PM] 2022年12月02日 16:25:50.612 UTC [63] LOG: listening on IPv4 address "0.0.0.0", port 5432
[02.12.2022, 5:25:50,540 PM] 2022年12月02日 16:25:50.612 UTC [63] LOG: listening on IPv6 address "::", port 5432
[02.12.2022, 5:25:50,546 PM] 2022年12月02日 16:25:50.617 UTC [63] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
[02.12.2022, 5:25:50,619 PM] ) = 0 (Timeout)
[02.12.2022, 5:25:50,619 PM] openat(AT_FDCWD, "/home/postgres/pgdata/data/postmaster.pid", O_RDONLY) = 3
[02.12.2022, 5:25:50,619 PM] newfstatat(3, "", {st_mode=S_IFREG|0600, st_size=88, ...}, AT_EMPTY_PATH) = 0
[02.12.2022, 5:25:50,620 PM] read(3, "63\n/home/postgres/pgdata/data\n16"..., 89) = 88
[02.12.2022, 5:25:50,621 PM] close(3) = 0
[02.12.2022, 5:25:50,621 PM] wait4(63, 0x7ffcc7ccd680, WNOHANG, NULL) = 0
[02.12.2022, 5:25:50,621 PM] pselect6(0, NULL, NULL, NULL, {tv_sec=0, tv_nsec=100000000}, NULL) = 0 (Timeout)
[02.12.2022, 5:25:50,716 PM] openat(AT_FDCWD, "/home/postgres/pgdata/data/postmaster.pid", O_RDONLY) = 3
[02.12.2022, 5:25:50,716 PM] newfstatat(3, "", {st_mode=S_IFREG|0600, st_size=88, ...}, AT_EMPTY_PATH) = 0
[02.12.2022, 5:25:50,716 PM] read(3, "63\n/home/postgres/pgdata/data\n16"..., 89) = 88
[02.12.2022, 5:25:50,716 PM] close(3) = 0
[02.12.2022, 5:25:50,717 PM] wait4(63, 0x7ffcc7ccd680, WNOHANG, NULL) = 0
[02.12.2022, 5:25:50,717 PM] pselect6(0, NULL, NULL, NULL, {tv_sec=0, tv_nsec=100000000}, NULL) = 0 (Timeout)
[02.12.2022, 5:25:50,817 PM] openat(AT_FDCWD, "/home/postgres/pgdata/data/postmaster.pid", O_RDONLY) = 3
[02.12.2022, 5:25:50,819 PM] newfstatat(3, "", {st_mode=S_IFREG|0600, st_size=88, ...}, AT_EMPTY_PATH) = 0
[02.12.2022, 5:25:50,820 PM] read(3, "63\n/home/postgres/pgdata/data\n16"..., 89) = 88
[02.12.2022, 5:25:50,824 PM] close(3) = 0
[02.12.2022, 5:25:50,824 PM] wait4(63, 0x7ffcc7ccd680, WNOHANG, NULL) = 0
[02.12.2022, 5:25:50,824 PM] pselect6(0, NULL, NULL, NULL, {tv_sec=0, tv_nsec=100000000}, NULL) = 0 (Timeout)
[02.12.2022, 5:25:50,918 PM] openat(AT_FDCWD, "/home/postgres/pgdata/data/postmaster.pid", O_RDONLY) = 3
[02.12.2022, 5:25:50,919 PM] newfstatat(3, "", {st_mode=S_IFREG|0600, st_size=88, ...}, AT_EMPTY_PATH) = 0
[02.12.2022, 5:25:50,919 PM] read(3, "63\n/home/postgres/pgdata/data\n16"..., 89) = 88
[02.12.2022, 5:25:50,919 PM] close(3) = 0
[02.12.2022, 5:25:50,919 PM] wait4(63, 0x7ffcc7ccd680, WNOHANG, NULL) = 0
[02.12.2022, 5:25:50,919 PM] pselect6(0, NULL, NULL, NULL, {tv_sec=0, tv_nsec=100000000}, NULL) = 0 (Timeout)
[02.12.2022, 5:25:51,019 PM] openat(AT_FDCWD, "/home/postgres/pgdata/data/postmaster.pid", O_RDONLY) = 3
[02.12.2022, 5:25:51,019 PM] newfstatat(3, "", {st_mode=S_IFREG|0600, st_size=88, ...}, AT_EMPTY_PATH) = 0
[02.12.2022, 5:25:51,019 PM] read(3, "63\n/home/postgres/pgdata/data\n16"..., 89) = 88
[02.12.2022, 5:25:51,020 PM] close(3) = 0
[02.12.2022, 5:25:51,020 PM] wait4(63, 0x7ffcc7ccd680, WNOHANG, NULL) = 0
[02.12.2022, 5:25:51,020 PM] pselect6(0, NULL, NULL, NULL, {tv_sec=0, tv_nsec=100000000}, NULL) = 0 (Timeout)
[02.12.2022, 5:25:51,120 PM] openat(AT_FDCWD, "/home/postgres/pgdata/data/postmaster.pid", O_RDONLY) = 3
[02.12.2022, 5:25:51,120 PM] newfstatat(3, "", {st_mode=S_IFREG|0600, st_size=88, ...}, AT_EMPTY_PATH) = 0
[02.12.2022, 5:25:51,120 PM] read(3, "63\n/home/postgres/pgdata/data\n16"..., 89) = 88

So this is repeating

[02.12.2022, 5:25:50,919 PM] pselect6(0, NULL, NULL, NULL, {tv_sec=0, tv_nsec=100000000}, NULL) = 0 (Timeout)
[02.12.2022, 5:25:51,019 PM] openat(AT_FDCWD, "/home/postgres/pgdata/data/postmaster.pid", O_RDONLY) = 3
[02.12.2022, 5:25:51,019 PM] newfstatat(3, "", {st_mode=S_IFREG|0600, st_size=88, ...}, AT_EMPTY_PATH) = 0
[02.12.2022, 5:25:51,019 PM] read(3, "63\n/home/postgres/pgdata/data\n16"..., 89) = 88
[02.12.2022, 5:25:51,020 PM] close(3) 
asked Dec 2, 2022 at 13:34
7
  • Seems fishy. What do you get in the log file? Commented Dec 2, 2022 at 15:15
  • I added logs to the question Commented Dec 2, 2022 at 15:24
  • How exactly do you start postgres? pg_ctl? Please show the full command. Commented Dec 2, 2022 at 15:26
  • We are using this docker image hub.docker.com/layers/timescale/timescaledb-ha/… which uses this command pg_ctl -D "$PGDATA" -o "$(printf '%q ' "$@")" -w start Commented Dec 2, 2022 at 15:36
  • I would run an strace on the startup process and see what it is doing. Also, taking a stack trace of the startup process would be useful. Commented Dec 2, 2022 at 16:07

2 Answers 2

5

The database is running again. We are not entirely sure why but that is what we did:

We figured out that while Postgres was starting we got a lot of

2022年12月02日 15:17:10.918 UTC [15] FATAL: the database system is starting up

which were error messages responding to a lot of client requests trying to reach the database. We started the database on another port so that it can start without all the clients trying to reach it. For some reason this fixed it (at least we think so). The database started normally and after shutting it down again and restarting it on the original port it is back in operation.

A scary experience ...

answered Dec 2, 2022 at 17:26
2
  • Each message with "the database is starting up" is new time postgres tried to start again. So if you get more than one of those you're probably in a loop and there's no point waiting any longer to try something new. Commented Dec 2, 2022 at 23:01
  • 1
    @JoelCoehoorn this is actually logged any time a new connection to the database is made. That's what postgres tells the client, and then it aborts the connection. Commented Dec 2, 2022 at 23:40
-1

hot_standby is set to off It's a normal behaviour in this case when hot_standby is set to on, you can log in and the database is no longer "starting up"

answered Sep 4, 2023 at 11:36

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.