We have a geographically distributed database setup with seven servers. There are 62 master tables set up for logical replication with six subscribers.
This worked fine for a long time with PostgreSQL 15. Recently we migrated to PostgreSQL 17.5. Now replication as a whole fails frequently, and the following error message is captured in the log files of all the servers:
"ERROR: invalid memory alloc request size 1294438032"
The error is captured in the publisher first, and then all the subscribers go into catchup mode and never recover.
There are no network related issues and all the server configuration parameters are fine.
On a side note, all servers have streaming replication set up for one standby server each, and that is working quite fine.
In all the servers, memory utilization even at peak hours is less than 30%.
Google Gemini indicates this could be a PostgreSQL bug, but I can't find any related discussions online.
Can you point me in the right direction to solve this please?
I can share the values of any configuration parameters if it helps.
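For context on the error text: PostgreSQL caps any single ordinary memory allocation at `MaxAllocSize` (0x3fffffff bytes, one byte short of 1 GiB). The message therefore reports a per-request limit being exceeded rather than the machine running out of RAM, which would be consistent with the low memory utilization noted above. A minimal Python sketch of that arithmetic (the function name is purely illustrative, not part of any PostgreSQL tooling):

```python
# MaxAllocSize as defined in PostgreSQL's memutils.h: 1 GiB minus one byte.
MAX_ALLOC_SIZE = 0x3FFFFFFF


def exceeds_max_alloc(request_bytes: int) -> bool:
    """Return True if a single allocation request is above the palloc cap."""
    return request_bytes > MAX_ALLOC_SIZE


# The size reported in the error message from the logs.
requested = 1294438032
overshoot = requested - MAX_ALLOC_SIZE

print(f"limit     : {MAX_ALLOC_SIZE} bytes")
print(f"requested : {requested} bytes ({requested / 2**30:.2f} GiB)")
print(f"over limit: {exceeds_max_alloc(requested)} (by {overshoot} bytes)")
```

This only shows why the request is rejected; it says nothing about which code path asked for that much memory.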
- I had about 62 tables in my publication. I split them into two publications and it seems to have somehow solved the issue. But I am still interested in finding out the root cause of the error for academic purposes, and to be ready in case the error resurfaces. Any related knowledge you can share is welcome. – Thadeus Anand, Jun 18, 2025 at 6:12
- The replication went on successfully for about two days and then the postgresql process itself was killed by the OOM killer, which I suspect could be due to the underlying issue of memory allocation. I am still searching for answers. – Thadeus Anand, Jun 19, 2025 at 7:45
- Based on many discussions on PostgreSQL forums, it is clear that 17.5 is a buggy release, and for this issue, downgrading to 17.4 could be a solution if that is possible. In my case, I will need to wait until 17.6 is released on August 14, 2025. Until then, I am replicating my data programmatically, which is tedious. – Thadeus Anand, Jul 1, 2025 at 5:51
- After upgrading to 17.6, my memory allocation issue has gone away. My logical replication is working fine now, but the replication slots keep getting bigger in spite of it. But that is a different bug/issue altogether, so I will close this question. – Thadeus Anand, Aug 23, 2025 at 10:45
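The interim workaround from the first comment, splitting the 62 tables across two publications, could be scripted roughly as follows. This is only a sketch: the publication prefix `pub_part` and the table names `master_01` ... `master_62` are hypothetical, the function assumes plain identifiers that need no quoting, and each subscriber would still have to be pointed at the new publications separately.

```python
def split_publication_sql(tables, parts, prefix="pub_part"):
    """Build CREATE PUBLICATION statements that spread tables over `parts` publications."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    size = -(-len(tables) // parts)  # ceiling division: tables per publication
    statements = []
    for i in range(parts):
        chunk = tables[i * size:(i + 1) * size]
        if not chunk:
            break  # fewer tables than publications requested
        table_list = ", ".join(chunk)
        statements.append(f"CREATE PUBLICATION {prefix}_{i + 1} FOR TABLE {table_list};")
    return statements


# Hypothetical table names standing in for the 62 master tables.
tables = [f"master_{n:02d}" for n in range(1, 63)]
for statement in split_publication_sql(tables, 2):
    print(statement)
```

The generated statements would be run on the publisher. Whether splitting actually avoids the error, rather than just delaying it, is not established by this thread.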
1 Answer
After upgrading to 17.6, my memory allocation issue has gone away. My logical replication is working fine now, but the replication slots keep getting bigger in spite of it. But that is a different bug/issue altogether, so I will close this question.
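To put a number on how large a slot has grown, one common approach is to compare the server's current WAL position with the slot's `restart_lsn` from `pg_replication_slots` (on the server itself, `pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)` does this). The Python sketch below does the same arithmetic offline on LSN strings copied from those columns; the helper names and the sample LSN values are made up for illustration.

```python
def parse_lsn(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '16/B374D848' to a byte position.

    An LSN is printed as two hexadecimal numbers: the high 32 bits,
    a slash, then the low 32 bits.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def retained_bytes(current_lsn: str, restart_lsn: str) -> int:
    """WAL bytes between a slot's restart_lsn and the current WAL position."""
    return parse_lsn(current_lsn) - parse_lsn(restart_lsn)


# Sample values only; real ones would come from the publisher.
current = "16/B374D848"
restart = "16/A0000000"
print(f"retained WAL: {retained_bytes(current, restart) / 2**20:.1f} MiB")
```

Tracking this figure over time per slot would show whether a subscriber is merely lagging or has stopped advancing altogether.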