3

My hard disk ran out of space while PostgreSQL was running. I had to kill postgres and free up some space. Now I cannot restart postgres; it fails with the following error message:

LOG: listening on IPv4 address "0.0.0.0", port 5434
LOG: listening on IPv6 address "::", port 5434
LOG: listening on Unix socket "/tmp/.s.PGSQL.5434"
LOG: database system was interrupted; last known up at 2018-04-16 05:20:46 EDT
PANIC: could not read file "pg_logical/replorigin_checkpoint": Success
LOG: startup process (PID 97490) was terminated by signal 6: Aborted
LOG: aborting startup due to startup process failure
LOG: database system is shut down

In data/pg_logical there is a file named replorigin_checkpoint, but it is empty.

I already copied the data directory as a backup, as suggested here, but I am not really sure what to do next. amcheck looks like it only works on a running postgres.

./postgres -V
postgres (PostgreSQL) 11devel

Ubuntu 16.04.4 LTS

It may be worth noting that the file system is mounted on that machine over NFS.

Questions
1. What is supposed to be inside replorigin_checkpoint?
2. Is it possible to restart from an earlier checkpoint?
3. What kinds of corruption repair tools are out there (not only corruption detection)?

asked Apr 16, 2018 at 14:29

3 Answers

2

Maybe just try to rename the file? If it's empty anyway, there should not be any loss of data.
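
A minimal sketch of that rename, to be run only while the server is stopped. The function name `move_aside_if_empty` and the backup suffix are made up for this illustration, and leaving the renamed file inside pg_logical is an assumption; moving it somewhere outside the data directory may be the safer choice. The file is only touched if it really is zero bytes long.

```python
import os
import time


def move_aside_if_empty(data_dir):
    """Rename an empty pg_logical/replorigin_checkpoint out of the way.

    Returns the new path if the file was renamed, or None if the file is
    missing or not empty (in which case nothing is changed).
    Hypothetical helper, not part of PostgreSQL.
    """
    path = os.path.join(data_dir, "pg_logical", "replorigin_checkpoint")
    if not os.path.isfile(path):
        return None
    if os.path.getsize(path) != 0:
        # The file has content; do not touch it.
        return None
    # Timestamped suffix so repeated runs never overwrite an older backup.
    backup = path + ".empty." + time.strftime("%Y%m%d%H%M%S")
    os.rename(path, backup)
    return backup
```

For example, `move_aside_if_empty("/path/to/data")` would be called with the data directory of the stopped server, after taking the backup copy mentioned in the question.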

answered Apr 17, 2018 at 8:43
1

If pg_logical is the culprit here, can you try to start your server without activating it? You would need to change some parameters in your postgresql.conf file, such as wal_level and shared_preload_libraries. If that works, you could try to re-enable pg_logical afterwards. To find out what's inside the replorigin_checkpoint file, you can have a look at its GitHub repository.

Are the file permissions OK on this file, and can you see what's inside as the postgres user? I'm also not sure pg_logical is guaranteed to work with the upcoming 11devel version.

You might also be able to save yourself a lot of pain by not using NFS. Many PostgreSQL DBAs report failures on it that are hard to understand and repair. You should easily find literature on this subject.

answered Apr 17, 2018 at 7:27
1

If you don't use logical replication, then you can just remove the file and restart. If you do use logical replication, you can try that, but I don't know how well it will work. I would at least worry that the subscriptions have gotten out of sync and lost the information needed to bring them back in sync, so they would need to be rebuilt.

This shouldn't happen, of course, not even after the disk runs out of space. I don't know if it is a bug in logical replication, or if it is because you are using NFS, which is not recommended for PostgreSQL and has a reputation for corrupting data. The checkpoint file is written to a temporary file in the same directory, then renamed into place. This method should not be able to result in a zero-length file on most file systems (if the write fails, the rename is not attempted), but I don't know about NFS.
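
To illustrate the write-then-rename pattern described above, here is a minimal Python sketch. It is not PostgreSQL's code; the function name `write_atomically` is invented for this example, and the fsync calls are an addition that is commonly paired with this pattern on POSIX systems. The point is that the rename only happens after the write has succeeded, so readers see either the old file or the complete new one.

```python
import os
import tempfile


def write_atomically(path, data):
    """Write bytes to `path` via a temporary file in the same directory.

    Illustrative helper only (assumes a POSIX system). If writing fails,
    the temporary file is removed and the existing `path` is left untouched.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory, so the final rename stays within one file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push file contents to stable storage
        os.replace(tmp_path, path)  # atomic rename on POSIX file systems
    except BaseException:
        # Clean up the temporary file; the rename was not completed.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Also sync the directory so the rename itself is durable.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

On a local file system, a failed write (for example, disk full) raises before `os.replace` is reached, which is why an empty target file is unexpected. Whether the same guarantees hold over NFS depends on the client and server configuration.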

answered Apr 17, 2018 at 14:40
