
I wrote a very simple FastAPI/Celery/Redis/Flower program to start understanding their working. Repo is https://github.com/rjalexa/fastapi-redis if you care to see.

The FastAPI route looks for a string in the Redis cache; if it finds it, it returns a hash/dict with code=200.

If it does not, it passes the string to Celery, which triggers a long-running computation (not in this dummy repo, but in real life it could take up to ten seconds), and returns code=202. When processing is done, the results are added to the Redis cache.

What I want to avoid: while a string is queued for processing (which could take many seconds), if I receive another request for the same string, I want to just return code=202 without queueing a new task for that same string.
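A minimal sketch of the flow described above, with a plain dict standing in for the Redis cache and a list standing in for the Celery queue (all names here are hypothetical, not from the repo). It deliberately shows the problem: a second request for a pending key enqueues the task again.

```python
# Hypothetical stand-ins: a dict for the Redis cache, a list for the Celery queue.
cache = {}
queued = []

def lookup(key):
    """Return (status_code, payload): 200 on a cache hit, 202 otherwise."""
    if key in cache:
        return 200, cache[key]
    queued.append(key)  # in the real app: long_task.delay(key)
    return 202, None

def on_task_done(key, result):
    """Worker-side callback: store the computed result in the cache."""
    cache[key] = result
```

Note that two calls to `lookup("a")` before the task finishes leave the key in the queue twice, which is exactly the duplicate-enqueue behaviour the question wants to avoid.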

Thanks for any clarification.

asked Nov 5, 2023 at 16:52
  • Why don't you use a second table for signaling that you're preparing that data? Or already add the entry in the redis database with an empty value? Commented Nov 5, 2023 at 21:19
  • Thank you @Isabi. The second idea would mean I get a cache hit with no data, and in that case I should still return a 202. Did I understand your suggestion correctly? The first idea I need to understand better, since I've never heard the "table" concept applied to Redis. Commented Nov 6, 2023 at 6:24

1 Answer


I'm answering here instead of using the comments, since the answer is long.

Premise

I have never used Redis, so take what I say with a grain of salt. I use the term "table" because I come from a SQL background, so it may be the wrong word here.

Idea 1

The underlying idea here is to divide the data in two: the cache and the preparation/pre-cache. The cache simply holds the finished data, while the pre-cache is a space to park the keys whose values are being prepared. Once prepared, they are moved from the pre-cache to the actual cache. A miss on the cache then triggers a lookup on the pre-cache; on a hit there, you can tell the user that the data is not ready yet.
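A sketch of Idea 1, using a dict and a set as stand-ins for the two Redis keyspaces (the names `cache`, `precache`, and `enqueued` are hypothetical, for illustration only):

```python
cache = {}       # finished data
precache = set() # keys whose values are being prepared
enqueued = []    # records what got queued, to show no duplicates occur

def lookup(key):
    """200 on a cache hit; 202 otherwise, queueing work only once per key."""
    if key in cache:
        return 200, cache[key]
    if key in precache:
        return 202, None        # already being prepared; do not enqueue again
    precache.add(key)
    enqueued.append(key)        # in the real app: long_task.delay(key)
    return 202, None

def on_task_done(key, result):
    """Move the finished value from the pre-cache to the cache."""
    cache[key] = result
    precache.discard(key)
```

With real Redis the pre-cache check and insert should be a single atomic operation (e.g. one `SADD`), otherwise two concurrent requests could both miss the pre-cache and both enqueue.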

Idea 2

This idea combines the two tables of Idea 1 into a single one. If a request's key has an entry in the cache with a non-null value, return the value. On a cache miss, the data has to be computed: first create an (empty) entry in the cache, then start the computation. This way, if a second request arrives for the same data, the program finds the empty value and recognizes that the computation has started but is not yet finished; it can then behave however you decide for that edge case.
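A sketch of Idea 2, with one dict playing the single table and a sentinel marking "started but not finished" (names are hypothetical). In real Redis, claiming the key would be one atomic `SET key "" NX` (redis-py: `r.set(key, "", nx=True)`), which also closes the race between two concurrent misses:

```python
PENDING = object()  # sentinel: computation started, result not ready yet
cache = {}
enqueued = []       # records what got queued, to show no duplicates occur

def lookup(key):
    if key not in cache:
        cache[key] = PENDING    # in Redis: SET key "" NX, atomically
        enqueued.append(key)    # in the real app: long_task.delay(key)
        return 202, None
    value = cache[key]
    if value is PENDING:
        return 202, None        # computation in flight; nothing re-queued
    return 200, value

def on_task_done(key, result):
    """Replace the sentinel with the computed value."""
    cache[key] = result
```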

answered Nov 6, 2023 at 21:25

3 Comments

Thanks a lot, that makes sense. My current Python application-level solution seems much simpler :) If I have a cache miss, I place the key in a Python set which acts as my "queue". The beauty is that inserting an already-present key into the set is simply ignored. Simpler, but of course not as powerful... Take care
But upon further investigation, maybe the right solution is SADD (adding the key to a Redis set) :)
Didn't know it was called SADD but yes, that's more or less the idea that I had in mind
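For the record, `SADD` returns 1 only when the member is new and 0 when it already exists, so it doubles as an atomic "have I already queued this?" check. A sketch with a tiny fake Redis class (hypothetical; the real redis-py client exposes the same `sadd`/`srem` methods):

```python
class FakeRedis:
    """Mimics just the SADD/SREM behaviour needed for this pattern."""
    def __init__(self):
        self._sets = {}

    def sadd(self, name, member):
        s = self._sets.setdefault(name, set())
        if member in s:
            return 0            # already a member, nothing added
        s.add(member)
        return 1                # newly added

    def srem(self, name, member):
        self._sets.get(name, set()).discard(member)

r = FakeRedis()  # in the real app: redis.Redis(...)

def enqueue_once(key):
    """Queue the computation only if this key isn't already pending."""
    if r.sadd("pending", key) == 1:
        # long_task.delay(key) in the real app; the worker would
        # call r.srem("pending", key) when it finishes
        return True
    return False
```

Because `SADD` is a single Redis command, the check-and-add is atomic even across multiple FastAPI workers, which the in-process Python set is not.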
