
Hello people smarter than me! I've created a sort-of-a-queue table system, but it seems too simple to be safe from race conditions. Am I missing something, or is the following actually safe from race conditions?

The Schema

I have a table, let's call it ProductQueue:

CREATE TABLE dbo.ProductQueue
(
 SerialId BIGINT PRIMARY KEY,
 QueuedDateTime DATETIME NOT NULL -- Only using this for reference, no functionality is tied to it
);

I have a procedure for adding to the queue, called AddToProductQueue:

CREATE PROCEDURE dbo.AddToProductQueue (@SerialId BIGINT)
AS
BEGIN
 INSERT INTO dbo.ProductQueue (SerialId, QueuedDateTime)
 OUTPUT Inserted.SerialId
 SELECT @SerialId, GETDATE();
END

I also have a procedure for removing from the queue called RemoveFromProductQueue:

CREATE PROCEDURE dbo.RemoveFromProductQueue (@SerialId BIGINT)
AS
BEGIN
 DELETE FROM dbo.ProductQueue
 OUTPUT Deleted.SerialId
 WHERE SerialId = @SerialId;
END

Note: SerialId is globally unique for a Product in the source database/system, i.e. no two instances of a Product can ever have the same SerialId. That's the extent of it on the database side.

The Workflow

  • I have an application process that runs hourly.
  • That process gets a variable list of SerialIds from the source system.
  • It iteratively calls the AddToProductQueue procedure on each SerialId in its list.
  • If the procedure tries to insert a SerialId that exists in the ProductQueue table already, it throws a primary key violation error, and the application process catches that error and skips that SerialId.
  • Otherwise, the procedure successfully adds that SerialId to the ProductQueue table and returns it back to the application process.
  • The application process then adds that successfully queued SerialId to a separate list.
  • After the application process finishes iterating its list of all candidate SerialIds to enqueue, it then iterates its new list of successfully queued SerialIds and does external work on them, in a separate thread per SerialId. (This work is all unrelated to the database.)
  • Finally, as each thread finishes its external work, the last step in that asynchronous thread is to remove that SerialId from the ProductQueue table by calling the RemoveFromProductQueue procedure. (Note that a new database context object is instantiated and a new connection is created for each asynchronous call to this procedure, so that it is thread-safe on the application side.)
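For clarity, the workflow above can be sketched in application code. This is a minimal, illustrative sketch, not the real application: an in-memory set stands in for the table and its primary key, and the names (ProductQueue, run_hourly, the callback) are made up for the example.

```python
import threading

class ProductQueue:
    """In-memory stand-in for dbo.ProductQueue: the set plays the role of
    the PRIMARY KEY, and the lock mimics the atomicity of a single INSERT."""
    def __init__(self):
        self._rows = set()
        self._lock = threading.Lock()

    def add(self, serial_id):
        # Mirrors AddToProductQueue: returns the id on success, raises on a
        # duplicate (the PK violation the application catches and skips).
        with self._lock:
            if serial_id in self._rows:
                raise KeyError(serial_id)
            self._rows.add(serial_id)
            return serial_id

    def remove(self, serial_id):
        # Mirrors RemoveFromProductQueue.
        with self._lock:
            self._rows.discard(serial_id)

def run_hourly(queue, candidate_ids, do_external_work):
    """One execution of the hourly process described above."""
    claimed = []
    for sid in candidate_ids:
        try:
            claimed.append(queue.add(sid))    # atomic enqueue
        except KeyError:
            pass                              # already queued: skip it
    threads = []
    for sid in claimed:
        def worker(sid=sid):
            do_external_work(sid)             # work unrelated to the database
            queue.remove(sid)                 # last step of each thread
        t = threading.Thread(target=worker)
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    return claimed
```

The key property is that `add` either claims an id or raises, with nothing in between, which is exactly what the PK on the real table gives you.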

Additional Information

  • There aren't any indexes on the ProductQueue table beyond the one backing the primary key, and it'll never have more than 1,000 rows in it at one time. (Actually, most of the time it'll literally only have a couple of rows.)
  • The same SerialId can become a candidate again to be re-added to the queue table on a future execution of the application process.
  • There are no safeguards preventing a second instance of the application process from running concurrently, whether by accident or because the first instance took more than 1 hour to run, etc. (This is the concurrency case I'm most concerned about.)
  • The transaction isolation level of the database (and connection being made) where the queue table and procedures live is the default isolation level of Read Committed.

Potential Problems

  • The running instance of the application process crashes in an unhandled way, leaving SerialIds stuck in the queue table. This is acceptable for the business needs, and we plan to have exception reports to help us manually remediate this case.
  • The application process gets executed multiple times concurrently and grabs some of the same SerialIds between instances in their initial source lists. I can't think of any negative ramifications of this case yet, since the enqueuing procedure is atomic, and the actual list of SerialIds that the application process will work on should be self-contained due to that atomic enqueuing procedure. We don't care which instance of the application process actually processes each SerialId as long as the same SerialId isn't processed concurrently by both process instances.
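To convince myself about that last bullet, here is a small simulation. It's an assumption-laden sketch, not the real system: an in-memory set guarded by a lock stands in for the primary key, and the counters only exist to check the claim that the same SerialId is never worked on by two instances at once (it may still be processed twice sequentially, which is acceptable).

```python
import threading
from collections import Counter

pk = set()                    # stands in for the PRIMARY KEY on ProductQueue
pk_lock = threading.Lock()    # a single INSERT/DELETE is atomic, so guard the set
processed = Counter()         # how many times each SerialId got worked on
in_flight = set()             # SerialIds currently being worked on
overlap = []                  # records any concurrent processing (should stay empty)
state_lock = threading.Lock()

def try_enqueue(sid):
    """Atomic enqueue: succeeds for exactly one open claim per SerialId."""
    with pk_lock:
        if sid in pk:
            return False      # PK violation -> this instance skips the id
        pk.add(sid)
        return True

def instance(candidates):
    """One run of the hourly process; two runs may overlap in time."""
    mine = [sid for sid in candidates if try_enqueue(sid)]
    for sid in mine:
        with state_lock:
            if sid in in_flight:
                overlap.append(sid)   # would mean concurrent double-processing
            in_flight.add(sid)
        processed[sid] += 1           # the external work
        with state_lock:
            in_flight.discard(sid)
        with pk_lock:
            pk.discard(sid)           # RemoveFromProductQueue

# Two overlapping instances with partially shared candidate lists.
a = threading.Thread(target=instance, args=([1, 2, 3, 4],))
b = threading.Thread(target=instance, args=([3, 4, 5, 6],))
a.start(); b.start(); a.join(); b.join()
```

Because a SerialId is only removed from `pk` after it leaves `in_flight`, no second claim can start while the first is still being worked, so `overlap` stays empty.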
asked Mar 13, 2024 at 21:46

4 Answers

The only thing you're asking the database engine to do in this scenario is to enforce the PRIMARY KEY. It will do that under all conditions and isolation levels, of course.

The potential race conditions are all external to the database, considerations that aren't really on-topic here.

That said, the only way I can think of for the database to be involved in a race condition is if the process of adding candidate serial IDs were wrapped in a database transaction, but you didn't mention anything about that.

Perhaps it's possible for the processes to use a database transaction in an unintended way, for example if you're using an ORM that does helpful things by magic without being explicitly asked, or if you are using implicit transactions.

In that scenario, app instance A would start adding its (long list of) serial IDs all within a single database transaction. Meanwhile, app instance B (with an equally long list, including at least one present in A's list) would become blocked on the insert of an overlapping serial ID (because instance A hasn't committed its transaction yet).

In an unfortunate sequence of events, instance A would finish asynchronously processing a duplicate serial ID (toward the start of its list) before blocked instance B can perform its insert (toward the end of its list).

In that case, both A and B would successfully process the same serial ID.

This seems unlikely given the process outline you have described, but not impossible.
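To make the unfortunate sequence concrete, here is a scripted toy interleaving. This is pure illustration, not SQL Server internals: a set stands in for the table's rows, a dict stands in for the key locks an open transaction holds, and the step order is forced rather than timed.

```python
rows = set()       # committed-and-uncommitted rows of ProductQueue
locked = {}        # serial_id -> transaction holding the key lock until commit
processed = []     # (instance, serial_id) pairs that got worked on

def try_insert(sid, txn):
    """Returns 'ok', 'duplicate', or 'blocked' (waiting on another open txn)."""
    holder = locked.get(sid)
    if holder is not None and holder != txn:
        return "blocked"
    if sid in rows:
        return "duplicate"
    rows.add(sid)
    locked[sid] = txn          # key lock held until the transaction commits
    return "ok"

def commit(txn):
    for sid in [s for s, t in locked.items() if t == txn]:
        del locked[sid]

# A inserts id 7 inside a still-open transaction and processes it straight away.
assert try_insert(7, "A") == "ok"
processed.append(("A", 7))
rows.discard(7)                # A's delete after processing (same transaction)

# B attempts the same id while A's transaction is still open: blocked.
assert try_insert(7, "B") == "blocked"

commit("A")                    # A commits; its key lock on id 7 is released.

# B's insert now goes through -- the row is gone -- and B processes 7 too.
assert try_insert(7, "B") == "ok"
processed.append(("B", 7))
```

The end state is exactly the failure mode described: both A and B processed serial ID 7.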

answered Mar 14, 2024 at 9:09

I'm not sure why the name of your table has the word "queue" in it, but it looks to me that what you really want is a locking mechanism.

To answer your question directly: if you don't care about stale locks, and as long as the INSERT transaction, the processing, and the DELETE transaction happen strictly in that order (each step starting only after the previous one reports success), I don't see any potential for double-processing.


That said, one way to address both issues that you list under "potential problems" in an automated way would be to implement the single-instance variant of the Redlock locking algorithm.

  1. Each instance has its own unique ID (say, a GUID or something you generate every time you run your processing application).

  2. Whenever the instance grabs a SerialId, it tries to obtain a lock on it:

    CREATE TABLE ItemLock
    (
     serialId BIGINT NOT NULL PRIMARY KEY,
     instance UNIQUEIDENTIFIER NOT NULL,
     expires DATETIME NOT NULL
    );
    
    WITH source (serialId, instance, expires) AS
     (
     SELECT @serialId, @instance, @expires
     )
    MERGE
    INTO ItemLock target
    USING source
    ON target.serialId = source.serialId
    WHEN NOT MATCHED BY TARGET THEN
    INSERT (serialId, instance, expires)
    VALUES (source.serialId, source.instance, source.expires)
    WHEN MATCHED AND (target.instance = source.instance OR target.expires <= @now) THEN
    UPDATE
    SET instance = source.instance,
     expires = source.expires
    OUTPUT INSERTED.serialId; -- a MERGE statement must be terminated with a semicolon
    

    The output of the query, if any, will be the serial id you have a lock on. If there is no output, you don't have the lock.

    Pass the current timestamp in the variable @now, and a timestamp five minutes in the future in the variable @expires.

  3. Run a keep-alive thread that re-submits this query every minute (or however often you need), extending @expires to, say, five minutes past the current time on each run. You can amend the query to extend, in a single batch, all the locks you haven't finished processing yet. Make sure to commit right away.

  4. When you're done processing, delete the lock only if it's yours:

    DELETE
    FROM ItemLock
    WHERE serialId = @serialId
     AND instance = @instance
    

This way:

  1. If your application dies, the lock will die by itself in 5 minutes.
  2. No concurrent processing will happen, as long as you successfully obtain the lock and keep it alive.
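In application terms, the MERGE and the instance-guarded DELETE boil down to the following semantics. This is a rough in-memory model with made-up names: a dict replaces the ItemLock table and plain numbers replace DATETIME timestamps.

```python
class ItemLockTable:
    """In-memory model of the ItemLock table; each method corresponds to one
    atomic statement (the MERGE upsert and the instance-guarded DELETE)."""
    def __init__(self):
        self._rows = {}  # serialId -> (instance, expires)

    def try_lock(self, serial_id, instance, now, ttl=300.0):
        # Models the MERGE: insert if absent; update (i.e. extend or steal)
        # if the lock is already ours or has expired. Returns True iff the
        # caller now holds the lock -- the "OUTPUT INSERTED" case.
        row = self._rows.get(serial_id)
        if row is None or row[0] == instance or row[1] <= now:
            self._rows[serial_id] = (instance, now + ttl)
            return True
        return False

    def unlock(self, serial_id, instance):
        # Models DELETE ... WHERE serialId = @serialId AND instance = @instance:
        # only the current owner's delete has any effect.
        row = self._rows.get(serial_id)
        if row is not None and row[0] == instance:
            del self._rows[serial_id]
```

Calling `try_lock` again from the keep-alive thread hits the same branch as the MERGE's `WHEN MATCHED AND target.instance = source.instance`, which is what pushes the expiry forward.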

Of course, you can just install a real Redis instance and use a ready-made implementation of Redlock in your application, of which there are plenty.

answered Mar 14, 2024 at 18:41
  • Hey, thanks for your answer! A couple of things: "not sure why the name of your table has the word "queue" in it, but it looks to me that what you really want is a locking mechanism." - Bingo! That's why my question introduces it as "sort-of-a-queue". But yes, it's really a self-isolating locking system. Secondly, I appreciate the effort of an atomic upsert, but I'd rather avoid the risk of bugs with the MERGE statement and just hold the proper locks against the table within a transaction so I can run multiple statements as needed. Though I don't think I'd ever want to do an UPDATE with... Commented Mar 14, 2024 at 21:12
  • ...my current use cases. The goal is to ensure once a specific SerialId is locked by an instance of the application process, only that same process can work on that SerialId so long as that instance of the SerialId row exists in the table. Finally, "If your application dies, the lock will die by itself in 5 minutes." - It seems like the business rules would prefer to let the lock persist in such a scenario so that manual intervention can be had. Ultimately this application process is syncing data (in a non-transactional way) to a 3rd party system with limited controls / access. Commented Mar 14, 2024 at 21:13

I found Service Broker to be completely unsatisfactory; it imposes a lot of overhead, and its conversations require cleanup. You are better off with your table.

I would also recommend allowing row locks on the index behind your primary key.

answered Mar 15, 2024 at 21:55

Nothing wrong with what you are doing, but SQL Server already has a built-in service, Service Broker, that handles and manages data queues. I would humbly suggest you invest your time in that.

mustaccio
answered Mar 15, 2024 at 9:33
  • Hey, I didn't downvote you, but I'd be happy to upvote you if you wanted to link to what you're referring to and expand on your answer a little bit by describing it further (which you can probably pull from the docs). 🙂 Commented Mar 15, 2024 at 12:53
