
I was asked to help determine whether Drools is capable of high-volume processing. In return, I asked what we mean by high volume, and was given the following:


• 20,000 files
• 5 different rulesets expected
• 80 rules per set (approximately)
• 500 GB of data in total


Drools will be running on an AWS t3.2xlarge. Machine specs: 32 GB RAM, 8 vCPUs, 120 GB storage. Because it is on AWS, the host machine can grow if required; we are not currently looking to use AWS Flink because of other considerations.

Basic rules are one of two types:

• Does this data have a name tag with a value? Return true if it does.

• Is the name NOT one of these values: [a, b, c, d]? Return true if it is not.

The rules are mainly on the validation side of things, but when validation fails, the routing of the data changes and the failure is reported.
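For concreteness, here is a rough sketch of what those two rule types could look like in Drools 8, with the DRL embedded in a Java build step. The fact class DataRecord, its fields, and the package names are illustrative assumptions rather than anything from the real system, and the checks are written to flag failures (instead of returning true) because failed validation is what drives the routing and reporting:

```java
// DataRecord.java -- hypothetical fact class; real field names depend on the file format.
package com.example.rules;

public class DataRecord {
    private String name;           // value of the "name" tag, null if the tag is absent
    private boolean valid = true;  // flipped to false by a failed validation
    private String failureReason;  // used for routing / reporting on failure

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public boolean isValid() { return valid; }
    public void setValid(boolean valid) { this.valid = valid; }
    public String getFailureReason() { return failureReason; }
    public void setFailureReason(String failureReason) { this.failureReason = failureReason; }
}
```

```java
// RuleSketch.java -- builds the two rule types from an inline DRL string and fires them once.
package com.example.rules;

import org.kie.api.KieServices;
import org.kie.api.builder.KieFileSystem;
import org.kie.api.runtime.KieContainer;
import org.kie.api.runtime.KieSession;

public class RuleSketch {

    // The two rule types from the question, inverted to flag failures.
    static final String DRL = """
            package com.example.rules;
            import com.example.rules.DataRecord;

            rule "name tag must have a value"
            when
                $r : DataRecord( name == null || name == "" )
            then
                $r.setValid( false );
                $r.setFailureReason( "missing name tag" );
            end

            rule "name must not be one of the excluded values"
            when
                $r : DataRecord( name in ( "a", "b", "c", "d" ) )
            then
                $r.setValid( false );
                $r.setFailureReason( "name is in the excluded list" );
            end
            """;

    public static void main(String[] args) {
        KieServices ks = KieServices.Factory.get();
        KieFileSystem kfs = ks.newKieFileSystem()
                .write("src/main/resources/com/example/rules/validation.drl", DRL);
        ks.newKieBuilder(kfs).buildAll();   // a real build should check the Results for errors
        KieContainer container = ks.newKieContainer(ks.getRepository().getDefaultReleaseId());

        KieSession session = container.newKieSession();
        DataRecord record = new DataRecord();
        record.setName("b");                // in the excluded set, so validation should fail
        session.insert(record);
        session.fireAllRules();
        session.dispose();

        System.out.println("valid = " + record.isValid() + ", reason = " + record.getFailureReason());
    }
}
```

Rules of this shape evaluate each fact on its own, with no joins between facts, which is the low-interaction case described in the comments below.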

With this information, can anyone suggest how to show, at least theoretically, whether Drools can or cannot handle this workload?

Earlier RETE-based engines seemed to wilt under large datasets, but I could not find performance-related information on Drools 8.x. Full disclosure: I only started learning about Drools two days ago.

FWIW, I did look over existing articles and didn't see any that offered a format or template to follow for this kind of evaluation.

asked May 29, 2025 at 21:07
  • This isn't answerable as asked; it depends on how your rules are designed. About 8 years ago I was maintaining just under half a million rules with sub-second SLAs for thousands of requests per minute on roughly the same hardware you're proposing. Another team was supporting around 10,000 rules with more than 128 GB of RAM and 32 CPUs; their system was constantly deadlocking and falling over, sometimes took minutes to respond, and could handle fewer than 100 transactions per minute. It's all in how your rules are designed and how they interact with each other. Commented Jun 9, 2025 at 20:39
  • Thank you for your answer. Our rules are going to be pretty straightforward in terms of required values, range values, and enumerated values, without a lot of interaction between those rules. I was thinking of loading each of the rulesets just to measure how much memory the rules themselves occupy, then matching the flow rate expected in production and monitoring performance (a rough harness along those lines is sketched after these comments). Is that in general a good approach, given your experience? Commented Jun 12, 2025 at 22:20
  • Refer to this older answer, which was about performance: stackoverflow.com/questions/65621089/… . It's less about the rules-in-memory footprint and more about the footprint of your execution (inputs, any side effects, etc.). Commented Jun 12, 2025 at 22:35
  • Tyvm Roddy, will definitely give that a spin!!!! Commented Jun 12, 2025 at 22:43
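For anyone who wants to try the measurement approach from the comment above, a minimal single-threaded harness might look like the sketch below. The rulesets/set1 directory, the file count, and the reuse of the hypothetical DataRecord fact class are assumptions for illustration; the heap-delta measurement is only a crude first pass, and a proper benchmark would use JMH (or at least repeated, warmed-up runs) with production-shaped input data:

```java
// DroolsSizingHarness.java -- rough sketch: load one ruleset, estimate its heap footprint,
// then push a batch of facts through short-lived sessions to get a first throughput number.
package com.example.rules;

import org.kie.api.KieBase;
import org.kie.api.KieServices;
import org.kie.api.builder.KieFileSystem;
import org.kie.api.runtime.KieContainer;
import org.kie.api.runtime.KieSession;

import java.nio.file.Files;
import java.nio.file.Path;

public class DroolsSizingHarness {

    public static void main(String[] args) throws Exception {
        KieServices ks = KieServices.Factory.get();
        KieFileSystem kfs = ks.newKieFileSystem();

        // Load every .drl of one ruleset from a local directory (the path is an assumption).
        try (var paths = Files.list(Path.of("rulesets/set1"))) {
            paths.filter(p -> p.toString().endsWith(".drl"))
                 .forEach(p -> kfs.write("src/main/resources/" + p.getFileName(), readQuietly(p)));
        }

        long before = usedHeap();
        ks.newKieBuilder(kfs).buildAll();                   // check Results for errors in real code
        KieContainer container = ks.newKieContainer(ks.getRepository().getDefaultReleaseId());
        KieBase kieBase = container.getKieBase();           // forces the KieBase to be built
        long after = usedHeap();
        System.out.printf("Approx. heap used by the compiled ruleset: %,d KB%n", (after - before) / 1024);

        // Throughput check: replay a production-like number of files through short-lived sessions.
        int files = 1_000;                    // scale toward the real 20,000 once this looks sane
        long start = System.nanoTime();
        for (int i = 0; i < files; i++) {
            KieSession session = kieBase.newKieSession();
            session.insert(new DataRecord()); // replace with facts parsed from a real input file
            session.fireAllRules();
            session.dispose();
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("Processed %d files in %.1f s (%.0f files/s)%n", files, seconds, files / seconds);
    }

    private static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        rt.gc();    // best-effort hint only; heap deltas measured this way are approximate
        return rt.totalMemory() - rt.freeMemory();
    }

    private static String readQuietly(Path p) {
        try { return Files.readString(p); } catch (Exception e) { throw new RuntimeException(e); }
    }
}
```

Running this once per ruleset gives the rules-in-memory number, and scaling the loop toward the full 20,000 files gives a first throughput figure to compare against the production flow rate.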
