I have been asked to integrate my Python server application (Flask behind waitress) with Kafka.
That means it should poll a certain Kafka topic frequently (on the order of minutes) and process all new events. As the application is deployed in the cloud, I am looking for a scalable solution so that different events could, in theory, be processed by different pods.
My question is: Are there known patterns for this situation? Which architecture would you recommend?
Current ideas:
- Use a cronjob to trigger a REST endpoint which then runs the Kafka consumer logic (sketched after this list)
  - Pro: Reuses the Flask server's ability to handle multiple requests
  - Contra: Extra tests are needed to verify that the cronjob is still running, and local development differs from the deployed server
- Use a scheduler inside the Python process (sketched after this list)
  - Pro: Conceptually easy to understand, and local development stays close to the deployed server
  - Contra: Threading has to be implemented if several events are to be processed in parallel
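For the first idea, a rough sketch of what the triggered endpoint could look like; the route name and the `drain_topic()` helper (which would hold the actual consumer logic) are just placeholders:

```python
# app.py -- sketch only; route name and drain_topic() are placeholders
from flask import Flask, jsonify

from consumer import drain_topic  # hypothetical module containing the Kafka consumer logic

app = Flask(__name__)

@app.route("/poll-kafka", methods=["POST"])
def poll_kafka():
    # Called by an external cronjob (e.g. a Kubernetes CronJob that POSTs to this URL).
    # Each call drains whatever accumulated on the topic since the last run.
    processed = drain_topic()
    return jsonify({"processed": processed})
```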
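For the second idea, a sketch using APScheduler (my choice; any in-process scheduler would do), again with the placeholder `drain_topic()`:

```python
# scheduled_app.py -- sketch only; APScheduler, the interval and drain_topic() are placeholders
from apscheduler.schedulers.background import BackgroundScheduler
from flask import Flask

from consumer import drain_topic  # hypothetical module containing the Kafka consumer logic

app = Flask(__name__)

# Poll the topic every five minutes in a background thread of the same process.
scheduler = BackgroundScheduler()
scheduler.add_job(drain_topic, "interval", minutes=5)
scheduler.start()
```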
1 Answer
I've done this in the past using Celery. I also used Celery Flower for this app as it's great for monitoring the Celery tasks.
You basically create a scheduled Celery task to consume your desired topic.
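A rough sketch of that setup, assuming a Redis broker and a five-minute interval (both are my choices for the example):

```python
# tasks.py -- sketch only; broker URL, interval and the task body are assumptions
from celery import Celery

app = Celery("tasks", broker="redis://localhost:6379/0")

# Celery beat schedule: run the consumer task every five minutes.
app.conf.beat_schedule = {
    "poll-kafka-topic": {
        "task": "tasks.consume_topic",
        "schedule": 300.0,  # seconds
    },
}

@app.task
def consume_topic():
    # Pull and process everything that arrived on the topic since the last run;
    # a possible consumer loop is sketched below.
    ...
```

For development you can run worker and beat together with `celery -A tasks worker --beat --loglevel=info` (in production, beat usually runs as its own process), and start Flower with `celery -A tasks flower`.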
I think I used this Kafka client as well.
https://docs.confluent.io/kafka-clients/python/current/overview.html
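For reference, a minimal polling loop with that client; the broker address, group id, topic name and `handle_event()` are placeholders:

```python
# consumer.py -- sketch only; connection settings, topic and handle_event() are placeholders
from confluent_kafka import Consumer

def drain_topic():
    consumer = Consumer({
        "bootstrap.servers": "localhost:9092",
        "group.id": "flask-poller",     # same group id on every pod, so each event goes to one pod
        "auto.offset.reset": "earliest",
        "enable.auto.commit": False,    # commit only after successful processing
    })
    consumer.subscribe(["my-topic"])
    processed = 0
    try:
        while True:
            msg = consumer.poll(timeout=1.0)
            if msg is None:
                break  # nothing new; the topic is drained for now
            if msg.error():
                continue  # handle/log broker errors as appropriate
            handle_event(msg.value())   # placeholder for the actual processing
            consumer.commit(message=msg)
            processed += 1
    finally:
        consumer.close()
    return processed
```

Running the same consumer group on several pods is what lets Kafka spread partitions, and therefore events, across them.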
I used this amazing blog post by Miguel Grinberg: https://blog.miguelgrinberg.com/post/using-celery-with-flask
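From memory, the core of that post is creating the Celery instance from the Flask app so that tasks share its configuration and application context; roughly (the config key and broker URL are my assumptions here):

```python
# flask_celery.py -- rough sketch of the Flask/Celery wiring; config values are assumptions
from celery import Celery
from flask import Flask

def make_celery(app):
    celery = Celery(app.import_name, broker=app.config["CELERY_BROKER_URL"])
    celery.conf.update(app.config)

    class ContextTask(celery.Task):
        def __call__(self, *args, **kwargs):
            # Run every task inside the Flask application context.
            with app.app_context():
                return self.run(*args, **kwargs)

    celery.Task = ContextTask
    return celery

flask_app = Flask(__name__)
flask_app.config["CELERY_BROKER_URL"] = "redis://localhost:6379/0"
celery = make_celery(flask_app)
```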