I was asked to help determine whether Drools is capable of high-volume processing. In return, I asked what we mean by "high volume." They gave me the following:
20,000 files
5 different rulesets expected
roughly 80 rules per set
500 GB of data in total
Drools will be running on an AWS t3.2xlarge. Machine specs: 32 GB RAM, 8 vCPUs, 120 GB storage. Because it's hosted on AWS, the machine can grow if required. We are not currently looking to use AWS Flink because of other considerations.
Basic rules are one of two types (a sketch of both in DRL follows this list):
Does this data have a name tag with a value? Return true if it does.
Is the name NOT one of these values: [a, b, c, d]? Return true if it isn't.
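For concreteness, here is a minimal sketch of what those two rule shapes might look like in DRL, wrapped in a small Java class that compiles them with Drools' KieHelper. The Record fact type, its name field, and the rule names are assumptions invented for illustration, not the actual rules.

```java
import org.kie.api.KieBase;
import org.kie.api.io.ResourceType;
import org.kie.internal.utils.KieHelper;

// Self-contained sketch: the two rule shapes, declared and compiled in one class.
public class RuleShapes {

    // DRL held in a Java text block (Java 15+). "Record" is declared inside
    // the DRL so the sketch needs no external fact class.
    static final String DRL = """
            package rules;

            declare Record
                name : String
            end

            // Type 1: does this data have a name tag with a value?
            rule "name tag present"
            when
                Record( name != null )
            then
                // validation passed; routing/reporting would go here
            end

            // Type 2: is the name NOT one of [a, b, c, d]?
            rule "name not excluded"
            when
                Record( name not in ("a", "b", "c", "d") )
            then
                // validation passed; routing/reporting would go here
            end
            """;

    public static void main(String[] args) {
        // Compiling the DRL throws at build time if a rule is malformed.
        KieBase kieBase = new KieHelper().addContent(DRL, ResourceType.DRL).build();
        System.out.println("Compiled " + kieBase.getKiePackages().size() + " package(s)");
    }
}
```

Declaring Record inside the DRL keeps the sketch self-contained; in practice the facts would presumably be Java classes parsed from the incoming files.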
The rules are mainly for validation, but when validation fails, the routing of the data changes and the failure is reported.
With this information, can anyone suggest how to show, at least theoretically, whether Drools can handle this workload?
Earlier engines using RETE seemed to wilt under large datasets, but I could not find performance-related info on Drools 8.x. Full disclosure: I only started learning about Drools two days ago.
FWIW, I did look over existing articles and didn't see any that offered a format or template to follow for such an evaluation.
-
This isn't answerable as asked; it depends on how your rules are designed. About 8 years ago I was maintaining just under half a million rules with sub-second SLAs for thousands of requests per minute, on roughly the same hardware you're proposing. Another team was supporting around 10,000 rules with > 128 GB RAM and 32 CPUs; it was constantly deadlocking and falling over, sometimes took minutes to respond, and could handle fewer than 100 tx per minute. It's all in how your rules are designed and how they interact with each other. – Roddy of the Frozen Peas, Jun 9, 2025 at 20:39
-
Thank you for your answer. Our rules are going to be pretty straightforward (required values, range values, enumerated values) without a lot of interaction between them. I was thinking of loading each of the rulesets just to measure how much memory the rules themselves occupy, then matching the flow rate expected in production and monitoring performance. Is that, in general, a good approach given your experience? – JavaJd, Jun 12, 2025 at 22:20
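For illustration, that measurement idea could look roughly like the sketch below: load one ruleset, compare heap use before and after, then insert facts and time fireAllRules(). The path rules/ruleset1.drl and the rules.Record fact type (from the earlier sketch) are assumptions, and Runtime-based heap deltas are only indicative; a profiler or JMH would give more trustworthy numbers.

```java
import org.kie.api.KieBase;
import org.kie.api.definition.type.FactType;
import org.kie.api.io.ResourceType;
import org.kie.api.runtime.KieSession;
import org.kie.internal.utils.KieHelper;

import java.nio.file.Files;
import java.nio.file.Path;

// Rough sketch: one ruleset's heap footprint plus a crude throughput check.
public class RulesetFootprint {

    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        System.gc();  // best effort only; GC is not guaranteed to run
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) throws Exception {
        long before = usedHeapBytes();

        // Assumed path; point this at each ruleset in turn.
        String drl = Files.readString(Path.of("rules/ruleset1.drl"));
        KieBase kieBase = new KieHelper().addContent(drl, ResourceType.DRL).build();

        long after = usedHeapBytes();
        System.out.printf("Approximate ruleset footprint: %.1f MB%n",
                (after - before) / (1024.0 * 1024.0));

        // Throughput: insert a batch of facts and time fireAllRules().
        // Assumes the DRL declares a "Record" type in package "rules".
        FactType recordType = kieBase.getFactType("rules", "Record");
        KieSession session = kieBase.newKieSession();
        try {
            long start = System.nanoTime();
            for (int i = 0; i < 100_000; i++) {
                Object fact = recordType.newInstance();
                recordType.set(fact, "name", i % 2 == 0 ? "a" : "x");
                session.insert(fact);
            }
            int fired = session.fireAllRules();
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.printf("Fired %d activations for 100k facts in %d ms%n",
                    fired, elapsedMs);
        } finally {
            session.dispose();
        }
    }
}
```

Run per ruleset, this gives a first-order read on the memory and throughput questions; scaling the fact count and batching toward the 500 GB total would be the next step, since that volume clearly cannot sit in a 32 GB heap at once.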
-
Refer to this older answer, which was about performance: stackoverflow.com/questions/65621089/… . It's less about the in-memory footprint of the rules than about the footprint of your execution (inputs, any side effects, etc.). – Roddy of the Frozen Peas, Jun 12, 2025 at 22:35
-
Tyvm Roddy, will definitely give that a spin! – JavaJd, Jun 12, 2025 at 22:43