
I'm developing a microservice-based application that processes a high volume of messages. Each message must be handled according to the user’s personal settings and some tenant-specific (customer) properties — for example, formatting rules or footer content.

In our B2B environment, every user belongs to a tenant, which we refer to as a "customer" in the application.

Current Setup

Each microservice handles messages independently and does the following:

  1. Lazily fetches user and customer objects the first time they're needed.
  2. Caches them in memory (using a HashMap) for future use.
  3. When a user or customer updates their data, the main API sends update notifications to the microservices, which then update their in-memory cache accordingly.
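
For context, a minimal sketch of what those three steps look like in one service (the class and member names here are illustrative, not our actual code; it uses a ConcurrentHashMap rather than a plain HashMap since several message handlers may touch it at once):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Function;

    // Rough sketch of the current per-service cache. "User" and the fetch
    // function are stand-ins for the real domain classes and API client.
    public class UserCache {
        public record User(String id, String displayName) {}

        private final Map<String, User> usersById = new ConcurrentHashMap<>();
        private final Function<String, User> fetchFromApi; // calls the central API

        public UserCache(Function<String, User> fetchFromApi) {
            this.fetchFromApi = fetchFromApi;
        }

        // Steps 1 + 2: fetch lazily on first access, then serve from memory.
        public User getUser(String userId) {
            return usersById.computeIfAbsent(userId, fetchFromApi);
        }

        // Step 3: invoked when the main API sends an update notification.
        public void onUserUpdated(User updated) {
            usersById.put(updated.id(), updated);
        }
    }

An equivalent map is kept for customer objects.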

Why Not Just Call the API Every Time?

Messages arrive very frequently, and fetching user/customer info from the API on every message would:

  • Significantly increase network traffic
  • Introduce latency for each message due to the API call
  • Potentially overload the central user/customer API service

That’s why I opted for this local caching approach.

Concerns

While this strategy has improved performance and reduced load on the central API, I'm beginning to worry about scalability:

The in-memory HashMap is growing large as we support more customers and users. This results in higher RAM usage in each service. I'm also questioning whether the notification/invalidation mechanism adds unnecessary complexity.

What I’m Looking For

I’m trying to find a better balance between:

  • Low latency: Message handling should remain fast.
  • Low memory usage: I want to avoid unbounded growth in memory consumption.
  • Simplicity and reliability: The solution shouldn’t introduce unnecessary operational complexity.

Questions

  1. Is this cache + notification approach reasonable for a high-throughput environment?
  2. Are there standard patterns or best practices for managing this type of per-user and per-tenant data in microservice architectures?

Edit

  • Customer objects: about 100 - 200
  • Users per customer: 10 - 30
  • Size of customer object: ~ 2 KB
  • Size of user object: ~ 1.5 KB
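
(Roughly, taking the upper bounds: 200 customers × 2 KB + 200 × 30 users × 1.5 KB ≈ 0.4 MB + 9 MB ≈ 9.4 MB per service if fully populated.)
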
asked May 28 at 12:20
  • Customer objects about 100 - 200; Users per customer 10 - 30 => min 1000; Size of customer object: ~2 KB; Size of user object: ~1.5 KB; All services only use a fraction of the data available in a customer and user object. What do you mean exactly by "expected bounds"? The available RAM? Commented May 28 at 13:43
  • To start with, you could ensure each service only caches the data it actually needs. If you have, say, 100 B per user per service, 1M users in total, and 100 services, that is 10 GB of memory needed, or roughly $100 in cost. If you increase the number of users you will eventually need more computers, each handling a separate set of users. So is it really a problem? How much would it cost to rewrite? Commented May 28 at 14:03
  • It's not clear to me what "Users per customer 10 - 30 => min 1000" means, exactly; can you clarify? What are your data freshness requirements? That is, how long after an update do you expect all clients to see the change? Also, what's the latency on the API calls? Commented May 28 at 14:16
  • A customer is a tenant. Each tenant has about 10 - 30 users. The updates need to be reflected within a minute. I currently don't have any metrics on the latency of the API calls sorry. Commented May 28 at 14:18
  • @GeekChap That part I get. How does "=> min 1000" relate to that? Is that the total clients across all customers? Commented May 28 at 14:27

1 Answer


I don't think there is an easy "industry standard" solution to this problem. It comes down to making tradeoffs between RAM usage and system complexity.

First, you can always throw more hardware at the problem: add RAM to services as needed. This could eventually become untenable, since memory requirements grow linearly with the number of micro services, but it could suffice for quite a while. Monitor your services to spot problems before they arise. You might find that increasing RAM is fine for some services, but doesn't work well for others.

However:

Each microservice ... Lazily fetches user and customer objects the first time they're needed.

This ends up being your bottleneck. The trouble with the current approach is that the solution to the performance problem is scattered all over your ecosystem. This is the part that doesn't scale well. Instead, push the caching into the micro services that are busiest. This plays into the "independently scalable" principle for micro services. If fetching user and customer data happens frequently, having those services cache their own data can speed up requests.
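
If memory growth is the main worry, one option is to bound the cache instead of letting it grow without limit, for example with LRU eviction (and optionally a TTL so stale entries get re-fetched). A rough sketch using only the JDK follows; in practice a caching library such as Caffeine or Guava would be the more idiomatic choice, and the size below is an arbitrary example, not a recommendation:

    import java.util.Collections;
    import java.util.LinkedHashMap;
    import java.util.Map;

    // Sketch of a size-bounded, LRU-evicting cache using only the JDK.
    public class BoundedCache<K, V> extends LinkedHashMap<K, V> {
        private static final int MAX_ENTRIES = 10_000; // assumption; tune per service

        public BoundedCache() {
            // initial capacity, load factor, accessOrder=true -> least-recently-used ordering
            super(16, 0.75f, true);
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > MAX_ENTRIES; // drop the least recently used entry on overflow
        }

        // LinkedHashMap is not thread-safe; wrap it before sharing across handlers.
        public static <K, V> Map<K, V> threadSafe() {
            return Collections.synchronizedMap(new BoundedCache<>());
        }
    }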

Still, one thing nags at me. Micro services should be independently scalable and deployable. The fact these other services must get user and customer info repeatedly hints at some design issues with the micro service ecosystem itself. You might have defined services that are too small. Consolidating services might be a better solution overall, even if you make each service bigger.


Aside: A common misconception with micro services is that they should be micro, as in small. They need to be independently deployable and scalable, otherwise you end up in precisely the spot you're in right now. Aim for "smaller than a monolith" while retaining independence.


Network usage can still be a concern if you cache user and customer information in the services that own that data. Again, don't optimize this until you observe a performance problem on the network — a problem, not just a difference in performance. Ownership of data is a major concern in micro services. Who is the single source of truth for what constitutes a user or customer? This doesn't mean user and customer data can only ever come from those dedicated services.

Other micro services which need user or customer information might only need a small subset. You could consider copying some user or customer info to each service instead of the full entity; just the bits relevant to a particular service. This adds complexity, though. Every update to a user or customer entity needs to be broadcast to the ecosystem so relevant changes can be propagated downstream. The tradeoff is independence. Each micro service can operate when the others are down, running slow, or unavailable.
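
As a sketch of that "subset" idea (the event shape and field names below are assumptions for illustration, not an existing contract), a consuming service could keep only the fields it needs and refresh them from the broadcast updates:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Per-service projection: only the fields this hypothetical formatting
    // service cares about, refreshed from broadcast "customer updated" events.
    public class CustomerFormattingProjection {
        public record FormattingSettings(String customerId, String footerText, String dateFormat) {}

        private final Map<String, FormattingSettings> byCustomerId = new ConcurrentHashMap<>();

        // Called by the message-bus listener when an update event arrives.
        public void onCustomerUpdated(FormattingSettings event) {
            byCustomerId.put(event.customerId(), event);
        }

        public FormattingSettings settingsFor(String customerId) {
            return byCustomerId.get(customerId); // null until the first event or an initial load arrives
        }
    }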

There is no silver bullet here. Your existing solution is not a bad place to start, but I think it indicates some design problems in general. Services are not fully independent. Many micro services are calling out to get user and customer information, which could be causing the performance problem to begin with. Solutions exist, but each has its benefits and drawbacks:

  • Cache users and customers in the services that own that data. The ecosystem speeds up at the cost of increased network traffic.
  • Propagate a relevant subset of user and customer info to all services that need it. You gain independence and reduced network traffic at the expense of added complexity to sync user and customer info across the ecosystem.

I cannot tell you which option would work best. You'll need to assess these for yourself given the architecture of your ecosystem, the policies and procedures your organization has established, and the people doing the work. When in doubt: communicate. Engage the other teams to work on a solution.

answered May 29 at 13:54
