Since I am working in a multi-instance microservice environment, I came across the problem of ensuring that certain operations are performed by at most one running instance at a time. Our solution so far has been a mutex lock stored in the database, which prevents other instances from doing the job while the lock is held.

In most cases this was enough for us, but the problem with database locks is that we have to assume some timeout in case the lock holder dies before releasing it. This works well if the operation can finish within the timeout, but when we don't know how long the process will take, the lock can be released too early.
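
To make the current setup concrete, here is a minimal sketch of a timeout-based lock, not our actual code. It assumes a hypothetical `locks` table with `name`, `owner` and `expires_at` columns, and uses SQLite only so that the example is self-contained. The `now` argument is passed in explicitly to make the timing visible.

```python
import sqlite3


def init(conn):
    # Hypothetical schema: one row per named lock.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS locks ("
        "name TEXT PRIMARY KEY, owner TEXT NOT NULL, expires_at REAL NOT NULL)"
    )
    conn.commit()


def try_acquire(conn, name, owner, ttl, now):
    """Take the lock if it is free or if the previous holder's timeout has passed."""
    with conn:  # commits on success, rolls back on an unhandled error
        # Take over an existing row only when its timeout has passed.
        cur = conn.execute(
            "UPDATE locks SET owner = ?, expires_at = ? "
            "WHERE name = ? AND expires_at <= ?",
            (owner, now + ttl, name, now),
        )
        if cur.rowcount == 1:
            return True
        try:
            # No expired row was found: either the lock has never been taken
            # (insert succeeds) or it is still held (primary key conflict).
            conn.execute(
                "INSERT INTO locks (name, owner, expires_at) VALUES (?, ?, ?)",
                (name, owner, now + ttl),
            )
            return True
        except sqlite3.IntegrityError:
            return False
```

With a 30 second timeout, instance A acquiring at `now=0` blocks instance B at `now=10`, but B succeeds at `now=31` even if A is still working. That is the early release described above.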

I was thinking of adding a background job to each microservice instance that would keep track, in memory, of the locks acquired by that instance and run a DB query to extend them every X seconds. Any problem that occurred while extending a lock (e.g. some edge case in which another instance was still able to acquire it) would cause the long-running process to be aborted.
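
A rough sketch of the idea, under the same assumptions as before: a hypothetical `locks` table with `name`, `owner` and `expires_at` columns, and SQLite standing in for the real database. The names `extend`, `keep_alive`, `lost` and `done` are made up for illustration. The background thread extends the lock every `interval` seconds and sets the `lost` event as soon as an extension fails, so the long-running process can check it and abort.

```python
import sqlite3
import threading


def extend(conn, name, owner, ttl, now):
    """Push the expiry forward only if this owner still holds an unexpired lock."""
    with conn:
        cur = conn.execute(
            "UPDATE locks SET expires_at = ? "
            "WHERE name = ? AND owner = ? AND expires_at > ?",
            (now + ttl, name, owner, now),
        )
    # Zero rows updated means the lock expired or another instance owns it.
    return cur.rowcount == 1


def keep_alive(connect, name, owner, ttl, interval, lost, done, clock):
    """Background loop: extend the lock until `done` is set or an extension fails.

    `connect` is a factory that opens a connection for this thread,
    `clock` returns the current time in seconds.
    """
    conn = connect()
    try:
        # Event.wait returns False on timeout, so the body runs every `interval`.
        while not done.wait(interval):
            if not extend(conn, name, owner, ttl, clock()):
                lost.set()  # tell the worker to abort
                return
    except sqlite3.Error:
        lost.set()  # a failed query is treated like a lost lock
    finally:
        conn.close()
```

The worker would start `keep_alive` in a `threading.Thread` after acquiring the lock, check `lost.is_set()` between steps of the long-running process, and set `done` when it finishes. The interval has to be comfortably shorter than the timeout, otherwise the lock can still expire between two extensions.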

Has anyone tried such an approach? Was it successful?

asked Jun 13, 2021 at 9:38
  • 3
    It seems you're creating a system with an extensible lease, which is a common pattern in distributed systems. It can be a very good design! Alternatively, your problem can also be viewed as an election problem where only the elected instance performs the task. I think it would be good to clarify for yourself whether mutual exclusion is more important (no instance can perform the task while another might still be working on it), or progress (there might be duplication, but eventually every task will be done at least once). Commented Jun 13, 2021 at 10:08
  • 1
    Ha, 'extensible lease' is probably the term that I was missing. I'll read up on that, thanks :) In our case, duplication is not a problem, as the logic is idempotent. We just need a way to prevent it from being executed concurrently by multiple instances, as this might lead to unexpected results in some edge cases. We'll also review whether there is any way either to rewrite the logic to prevent that or to use some cron / queue instead of executing it directly, but extensible locking is the easiest thing we can do now, and it should also work. Thanks! Commented Jun 13, 2021 at 13:56
