-
Notifications
You must be signed in to change notification settings - Fork 209
fix(copy_data): add retries for copy_data command#916
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@levkk could you please check this. Did I get the idea right about the original intention?
This comment has been minimized.
This comment has been minimized.
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should do this for two reasons:
COPYis atomic and transactional: if it fails, none of the rows will be saved in the table; it will be empty when we retryTRUNCATEis a scary command to run for now. We should add a bunch more tests and conditions that prevent this from accidentally being called on the source DB. For now, let's have the user truncate manually if this retry logic fails for some reason.
There is a chance for a race condition where the table copy completes and we get an error somewhere below, e.g., while running COMMIT, but the odds of that are slim. We should definitely account for this (and truncate), but after we implement a few "this is definitely the destination" checks. I'll write up a separate issue for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, dropped the truncate, left the comment for future
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
send_one is actually not free! The ParallelConnection gave us I think a 30-40% copy speed boost. Have you benchmarked this? Also curious what made you refactor this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved to separate pr to verify and test it #920
This reverts commit 5560276.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We test this code somewhere in CI right? Just double checking. This is the replica/primary promoter, so we need to be sure its tested before tweaking it. At the very least, test it locally (with role = "auto") making sure it still works. I think this change is fine, although probably a no-op since the code below won't take any action if data provided by lsn_check_interval-controlled loop isn't provided.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a bad idea to track retries, but can add that as a follow-up.
levkk
commented
Apr 22, 2026
The mirror test is flaky. The rest looks good to me!
This comment has been minimized.
This comment has been minimized.
@levkk
levkk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be good to go!
Uh oh!
There was an error while loading. Please reload this page.
fixes #897