Realtime subscribe and reliable Change Data Capture #311

sampok started this conversation in General

Being able to reliably sync every PostgreSQL insert, update, and delete into another system (another datastore, Elasticsearch, etc.) is a useful pattern for many services. Apparently the most reliable approach currently is to use Debezium and a Kafka cluster, which is great but overkill for some services. It's not that hard to build a custom sync using a logical replication slot, wal2json, and SQL queries with the functions pg_logical_slot_peek_changes and pg_replication_slot_advance. However, large transactions will cause out-of-memory errors unless you use the right wal2json settings and cursor-based consumption of slot changes. The LSN advance logic is also complicated unless you really learn all the concepts (such as what happens if you advance to an LSN in the middle of a transaction). And processing in batches is not as efficient as streaming.
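To make the LSN-advance pitfall concrete, here is a minimal sketch of picking an LSN that is safe to pass to pg_replication_slot_advance. It assumes wal2json format-version 2 output, where each peeked row carries an action field ('B' begin, 'I'/'U'/'D' changes, 'C' commit); the sample LSNs and table names are illustrative only:

```javascript
// Given rows peeked with pg_logical_slot_peek_changes, where `data`
// is a wal2json format-version 2 payload, return the highest LSN that
// is safe to advance the slot to: the LSN of the last complete commit.
// Advancing into the middle of a transaction can split it across
// restarts, replaying or dropping part of its changes.
function safeAdvanceLsn(rows) {
  let safe = null
  for (const { lsn, data } of rows) {
    if (JSON.parse(data).action === 'C') safe = lsn // commit record
  }
  return safe
}

// Example: one complete transaction followed by an unfinished one.
const peeked = [
  { lsn: '0/16B3710', data: '{"action":"B"}' },
  { lsn: '0/16B3748', data: '{"action":"I","schema":"public","table":"users"}' },
  { lsn: '0/16B3790', data: '{"action":"C"}' },
  { lsn: '0/16B37D0', data: '{"action":"B"}' },
  { lsn: '0/16B3808', data: '{"action":"U","schema":"public","table":"users"}' },
]
console.log(safeAdvanceLsn(peeked)) // → '0/16B3790'
```

The second transaction's rows stay in the slot and are peeked again on the next batch, which is exactly the at-least-once behavior a crash-safe consumer wants.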

The postgres library now supports subscribe, which seems great as it produces a stream of change events. However, for a reliable sync process it seems it would need a way to manually acknowledge processed entries, so that the consumer can crash and restart safely without missing any updates. As it stands, it seems great for realtime updates if you can afford to miss some. Is this a correct understanding of the current functionality?
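One way to see why server-side acknowledgement matters: without it, the best a consumer can do is keep its own checkpoint of the last LSN it fully processed. That protects against duplicates after a reconnect, but not against events emitted while the consumer was offline. A hypothetical sketch (this helper is not part of the postgres library; LSNs are compared as 64-bit integers):

```javascript
// Hypothetical client-side checkpoint: remembers the last fully
// processed LSN so duplicates delivered after a reconnect can be
// skipped. It cannot recover events the server emitted while the
// consumer was down — for that, the replication slot itself must hold
// the position, i.e. the server must receive acknowledgements.
function makeCheckpoint(initial = -1n) {
  let last = initial
  return {
    shouldProcess: (lsn) => lsn > last, // skip already-seen positions
    commit: (lsn) => { if (lsn > last) last = lsn },
  }
}

const cp = makeCheckpoint()
cp.shouldProcess(100n) // true — not seen yet
cp.commit(100n)
cp.shouldProcess(100n) // false — duplicate after reconnect
```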

Would it be feasible to improve the realtime subscribe to support reliable processing? Or does the subscription protocol effectively make this hard to accomplish?


Replies: 2 comments 1 reply


@sampok That is really interesting!

You are correct that my idea for the use cases now is just having change events, and not worrying if some are missed.

I definitely think we could explore expanding the current functionality to allow defining the names of permanent replication slots to connect to, and the LSN. I think the code in subscribe.js is fairly straightforward, and you should be able to take it almost verbatim into your own setup to explore defining a permanent replication slot and LSN. I don't have the time to dig into that right now, but I would love to answer any questions you might have to help you along the way.


Thanks @porsager!

I have just completed a working, tested sync with logical replication sql functions and wal2json. It's not the most efficient, but it works, so I don't have an immediate need to rewrite it (unless some issues come up). However, if I have time, I could peek into it!

The first thing for me (or anyone else taking a stab at this) would be to understand how the server–client ack process and LSNs are supposed to work. I haven't found much documentation on it in the otherwise excellent Postgres docs. The server expects only 'replay progress messages', which can carry three different last-WAL-byte locations. Is the second one (flushed to disk) the one that advances the replication slot's LSN on the server? Another thing that would be good to understand is what exactly LSNs mean. If I remember right, a DML message and a transaction end can sometimes share the same LSN. And can the LSN be advanced only to the beginning/end of each transaction?

Do you know of any good docs on the acks & LSN advancement?
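For reference, the 'replay progress messages' discussed above are the Standby Status Update ('r') messages of the streaming replication protocol. A rough sketch of their byte layout follows the field order in the protocol docs; as I understand it, the flushed position is the one that advances a logical slot's confirmed_flush_lsn, but treat that as an assumption to verify:

```javascript
// Sketch of a Standby Status Update ('r') message per the PostgreSQL
// streaming replication protocol: one type byte, three Int64 WAL
// positions, an Int64 timestamp, and a reply-requested flag.
function standbyStatusUpdate(written, flushed, applied, timestampMicros, replyRequested = false) {
  const buf = Buffer.alloc(1 + 8 * 4 + 1)
  buf.write('r')                            // message type
  buf.writeBigUInt64BE(written, 1)          // last WAL byte + 1 received/written
  buf.writeBigUInt64BE(flushed, 9)          // last WAL byte + 1 flushed to disk
  buf.writeBigUInt64BE(applied, 17)         // last WAL byte + 1 applied
  buf.writeBigInt64BE(timestampMicros, 25)  // µs since midnight 2000-01-01
  buf[33] = replyRequested ? 1 : 0          // 1 asks the server to reply
  return buf
}
```

A consumer that wants at-least-once semantics would send this with the flushed position set only after its side effects are durable.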


Hey @sampok ... Sorry for the late response.

Yes, the documentation for logical replication and the protocol is a bit hidden; I don't know if you found this and its subsections.

The LSN (Log Sequence Number) is a number representing a byte position in the WAL (Write-Ahead Log).
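Concretely, the textual form 'X/Y' splits that 64-bit byte position into two hexadecimal halves, which is useful to know when comparing or ordering LSNs; a small sketch:

```javascript
// Parse a textual LSN like '0/16B3790' into its 64-bit byte position:
// the part before the slash is the high 32 bits, the part after is
// the low 32 bits, both in hex.
function parseLsn(lsn) {
  const [hi, lo] = lsn.split('/')
  return (BigInt(`0x${hi}`) << 32n) | BigInt(`0x${lo}`)
}

console.log(parseLsn('0/16B3790')) // → 23803792n
console.log(parseLsn('1/0'))       // → 4294967296n (start of the next 4 GiB segment of positions)
```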

The LSNs supplied in the logical replication protocol will represent various boundaries in the WAL, such as BEGIN, COMMIT, etc. The one Postgres.js uses is the last consistent_point returned when starting the temporary replication slot. After that it will reply on pong with the latest BEGIN LSN received.

I hope the docs pointers above can help you get further :)
