I have a data-pulling job that, every 5 seconds, grabs all modified rows from a Postgres table based on its modified_timestamp column. It works the following way:
1. SELECT * FROM my_table WHERE modified_timestamp > _some_persisted_timestamp
2. _some_persisted_timestamp = CURRENT_TIMESTAMP
3. Process the data received in step 1
4. Sleep for 5 s
5. Go to step 1
modified_timestamp is maintained by a trigger: after any row update, modified_timestamp is set to CURRENT_TIMESTAMP.
It worked fine until I noticed that CURRENT_TIMESTAMP in Postgres is in fact the transaction start timestamp, and some of the updates are lost. Why are they lost? That's quite simple: at the moment I execute SELECT * FROM my_table WHERE modified_timestamp > _some_persisted_timestamp, some of the changes have already occurred, but their modified_timestamp is earlier than the updated _some_persisted_timestamp because their transaction is still in progress, so those rows only become visible after the poll has moved on.
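A hypothetical timeline of two sessions (all timestamps illustrative, table and variable names from the description above) shows the race:

```sql
-- Writer session                               Poller session
-- BEGIN;                    -- 10:00:00.0
-- UPDATE my_table SET ...;  -- trigger sets modified_timestamp = 10:00:00.0
--                                              SELECT ... WHERE modified_timestamp > ...;
--                                              _some_persisted_timestamp := CURRENT_TIMESTAMP;  -- 10:00:01.0
-- COMMIT;                   -- 10:00:02.0: row becomes visible,
--                           -- but its modified_timestamp (10:00:00.0) < 10:00:01.0,
--                           -- so the poller never picks this update up
```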
This problem could easily be solved by assigning, in step 2, the timestamp at which an update becomes visible to other transactions (the transaction commit timestamp, in other words) instead of CURRENT_TIMESTAMP or clock_timestamp().
I read the documentation but found nothing related to the transaction commit timestamp. Could you kindly suggest something?
Btw, I'm aware of logical decoding, and I know that mechanism suits my needs better in theory, but certain practical problems prevent me from using it.
- You are linking to the Postgres 9.4 manual. Doesn't mean you are on 9.4, does it? (Would be a dealbreaker.) – Erwin Brandstetter, Mar 15, 2019
- @ErwinBrandstetter I'm on 9.5.4 and can update if the required version is available in Amazon RDS. – bsiamionau, Mar 18, 2019
- Upgrade at least to the latest point release, 9.5.16 at the time of writing. See recommendations here. – Erwin Brandstetter, Mar 18, 2019
1 Answer
This problem could easily be solved by assigning, in step 2, the timestamp at which an update becomes visible to other transactions (the transaction commit timestamp, in other words) instead of CURRENT_TIMESTAMP or clock_timestamp().
This is logically impossible. Postgres writes new row versions before the final commit that makes them visible. Writing a future commit timestamp that is still unknown at write time would require prophetic capabilities.
However, you can get commit timestamps from a different source: since Postgres 9.5 there is a GUC setting, track_commit_timestamp, to start logging commit timestamps globally. You can then retrieve commit timestamps with the utility function pg_xact_commit_timestamp(xid). Your query could look like:
SELECT * FROM my_table t
WHERE pg_xact_commit_timestamp(t.xmin) > _some_persisted_timestamp;
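track_commit_timestamp is off by default and can only be changed at server start; a minimal sketch of enabling it on a self-managed server (on Amazon RDS you would set it in the DB parameter group instead):

```sql
ALTER SYSTEM SET track_commit_timestamp = on;  -- writes postgresql.auto.conf; needs superuser
-- ... restart the server, then verify:
SHOW track_commit_timestamp;
```

Note that only transactions committed after the setting takes effect get a recorded timestamp; pg_xact_commit_timestamp() returns NULL for older ones.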
Be aware that commit timestamps are not kept around forever. After two billion transactions (2^31), transaction IDs are "frozen". Freezing does not delete the commit timestamp right away, but after four billion transactions the information is gone for certain. That's a big number of transactions, and only very busy databases burn that many over a lifetime. But there can be programming errors burning through transaction numbers more quickly than expected ...
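Not part of the answer, but a rough way to see how far each database has progressed toward that transaction-ID budget is to check the age of its frozen xid:

```sql
-- age() counts transactions consumed since datfrozenxid; routine
-- freezing by autovacuum resets it.
SELECT datname, age(datfrozenxid) AS xid_age
FROM   pg_database
ORDER  BY xid_age DESC;
```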
Your steps 2 and 3 trade positions, and you record the commit timestamp instead of CURRENT_TIMESTAMP. Alternatively, record the xmin of any freshly retrieved row and derive the commit timestamp from it with pg_xact_commit_timestamp() once more.
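Putting it together, the revised loop might look like this sketch (table and variable names carried over from the question):

```sql
-- Step 1: pull rows committed since the last persisted timestamp,
--         capturing each row's commit timestamp along the way.
SELECT t.*, pg_xact_commit_timestamp(t.xmin) AS committed_at
FROM   my_table t
WHERE  pg_xact_commit_timestamp(t.xmin) > _some_persisted_timestamp;
-- Step 2: process the rows.
-- Step 3: persist max(committed_at) as the new _some_persisted_timestamp.
-- Step 4: sleep 5 s and go back to step 1.
```

Persisting the maximum commit timestamp actually seen (rather than the poller's own clock) is what closes the original race.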
More:
- How do I write a Postgres SQL command based on metadata of the tables themselves?
- https://wiki.postgresql.org/wiki/What%27s_new_in_PostgreSQL_9.5#Commit_timestamp_tracking
About xmin:
But I am not completely sure I understand your task. Maybe you need a queuing tool, or to process rows one by one as discussed here:
- Thank you so much for the detailed response! To understand more about my task: docs.confluent.io/current/connect/kafka-connect-jdbc/… The problem occurs while using timestamp mode - the connector works exactly the same way as I described in the question, and it loses updates. The solution you suggested looks awesome - I need to test it before accepting this as the right answer. Thanks again! – bsiamionau, Mar 18, 2019
- Somewhat related: stackoverflow.com/a/56961372/5320906 – snakecharmerb, Dec 16, 2020
- Is t.xmin only generated when track_commit_timestamp is enabled? – RonJohn, Jun 26, 2023
- @RonJohn: No, xmin is a system column that's part of the tuple header and is always there for any regular, unlogged, or temp table. But transaction timestamps are only recorded after track_commit_timestamp is turned on. – Erwin Brandstetter, Jun 26, 2023
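A quick way to see that xmin is always present, assuming a table like my_table from the question:

```sql
-- xmin lives in every tuple header, regardless of track_commit_timestamp:
SELECT xmin, * FROM my_table LIMIT 3;
-- pg_xact_commit_timestamp(xmin) additionally needs track_commit_timestamp = on,
-- and returns NULL for transactions committed before it was enabled.
```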
- Will WHERE pg_xact_commit_timestamp(t.xmin) > 1ドル need to do a full table scan, or is there some sort of indexing in Postgres that enables it to look only at the last inserted tuples? – Bergi, Nov 7, 2024