I have an application made of several microservices with independent databases.
So let's consider two microservices: Users and Documents.
The Users microservice manages the users and handles everything related to them; the Documents microservice contains the document info, properties, file, etc.
Now each document belongs to a user, and in the Documents database we have the user_id.
I know you should usually make a call to Users to get the user data, but imagine this scenario:
I have an API call that gets all the documents shared with a user. Each document belongs to a user, so if I fetch a thousand documents, I have to make a thousand additional calls to get the user data.
I thought of several options:
- have "common" tables like users replicated to all databases through a queue system (kafka for example) -> this will need to be maintained but gives the flexibility of having joins and foreign keys which makes it more performant
- Make an API route that takes an array of IDs and sends back the data, and handle this each time in the Documents API to rebuild the object (a minimal sketch follows after this list). This takes additional work each time and performs worse, but it is the "clean" way to work with microservices.
- Link each microservice to a "common" database in addition to its own database for easy access. This is a shortcut to replicating tables, but it won't solve the issue, and we can't join on it using ORMs.
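For the second option, the batch route might look like the minimal Express sketch below. The /users/batch path, the UserSummary shape, and the getUsersByIds helper are illustrative assumptions, not an established API:

```typescript
import express from "express";

// Hypothetical shape of the user data the Documents service needs.
interface UserSummary {
  id: string;
  fullName: string;
  profilePicUrl: string;
}

// Stub for illustration; in practice this would be a single
// SELECT ... WHERE id IN (...) against the Users database.
async function getUsersByIds(ids: string[]): Promise<UserSummary[]> {
  return ids.map((id) => ({ id, fullName: "…", profilePicUrl: "…" }));
}

const app = express();
app.use(express.json());

// Batch endpoint: one call returns many users instead of one call per user.
// POST /users/batch  with body { "ids": ["u1", "u2", ...] }
app.post("/users/batch", async (req, res) => {
  const ids: string[] = req.body.ids ?? [];
  res.json(await getUsersByIds(ids));
});

app.listen(3000);
```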
I imagine it's a common case, but I can't find anywhere what the best practice is for this use case.
- This sounds like a pretty good illustration of why microservices are an anti-pattern unless you really need them. How many thousands of requests per second are you looking to serve? If the answer is less than one, just put everything back on the one database and do a "normal" application. – pjc50, Sep 7, 2021 at 9:59
- Yes, that was my idea, but it's too late for that now; the application is already deployed this way and I don't want to change the whole architecture. – Dany Y, Sep 7, 2021 at 12:23
- Why do you need to hit the Users service a thousand times if you fetch the documents? What is the reason for this? What is the use case? – Andy, Sep 7, 2021 at 14:06
- Well, let's say each document belongs to a user; the front end will need to show the document and some details of the user (his full name, his profile pic, ...). – Dany Y, Sep 7, 2021 at 15:50
3 Answers
You need to analyse this from the point of view of use cases, not just "API calls".
I have an API call that gets all the documents shared with a user. Each document belongs to a user, so if I fetch a thousand documents, I have to make a thousand additional calls to get the user data.
Ok, but what is the use case where this API call is required?
Is this part of a batch process which runs at night and does something with those documents? So the performance requirements are not that strict, and extra calls are not an issue? Maybe that batch process does not even need more than the user IDs?
Is it an interactive feature for a user to check which of their documents are in use by others? Is "a thousand documents" a realistic assumption in this kind of scenario? A user who issues an API call that returns "a thousand documents" may not expect the system to return the result immediately, and that many documents are not easy for a human to manage in a user interface, so how many documents does a user really look at at once?
Which user information is really required for this use case, and why/when? Maybe not the full "user profile" information stored by the Users microservice; maybe only the names of those users, and maybe only when one requests the details. Maybe the information can be retrieved in a background thread whilst the documents are already shown.
Of course, when all this user information is really required quickly, an array-based approach is probably the way to go (see the sketch below), and your idea of a local user cache can also be viable. However, to make such a decision, one needs to look at the real-world scenario and the real numbers, which give the missing context for a meaningful decision.
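On the caller's side, the array-based approach boils down to deduplicating the owner IDs of the fetched documents and issuing a single batched request. A minimal sketch, assuming the hypothetical /users/batch endpoint from the question's second option:

```typescript
interface Doc { id: string; userId: string; title: string }
interface UserSummary { id: string; fullName: string; profilePicUrl: string }

// Enrich fetched documents with owner data via one batched call
// instead of one call per document.
async function enrichWithOwners(docs: Doc[]): Promise<(Doc & { owner?: UserSummary })[]> {
  // Deduplicate: a thousand documents rarely have a thousand distinct owners.
  const ids = [...new Set(docs.map((d) => d.userId))];

  // One network call to the (assumed) batch endpoint of the Users service.
  const res = await fetch("http://users-service/users/batch", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ ids }),
  });
  const users: UserSummary[] = await res.json();

  const byId = new Map(users.map((u) => [u.id, u] as const));
  return docs.map((d) => ({ ...d, owner: byId.get(d.userId) }));
}
```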
- To explain the scenario briefly (because this is just an example and I have several cases): performance is important, because this happens while using the web application. More importantly, my concern is simplicity of development; I don't want to add too much overhead to what should be a given: when you're getting a document, it should have info about the owner of that doc, for example. – Dany Y, Sep 7, 2021 at 13:45
- @DanyY: microservices make certain things simpler in development, and others not; this is a trade-off. Finding the right balance usually requires an in-depth analysis of your specific use cases; don't expect an easy answer for this. – Doc Brown, Sep 7, 2021 at 14:53
Microservices aim to be independently deployable. In this regard, it all depends on how interleaved the documents and users are.
The user example is, moreover, a delicate one, since users may also be related to authentication and authorization, and might therefore be used in your scenario for different purposes:
From a very general point of view, the user ID is sufficient to identify the user and the ownership. The Documents microservice does not need to know more about the user, and this is a great design from the GDPR point of view (data minimization).
Your document service might need to check that a user ID is valid (e.g., if it is not obtained via a trusted JWT or SAML token). Moreover, it might need to manage access rights or quotas that are specific to the document service. In this case, you may want to replicate the list of user IDs from the user service and manage minimized, service-specific user-ID-related data locally in the document service.
In some cases, your document service might have to know a lot more about a user, because of the requirements. For the sake of independence, it may then have a local understanding of what a user is to fulfill its own requirements, and feed its data from the user events, at the cost of redundancy. But be careful: it's not just replication. If the user management service evolves, the document management service may still work with its own view of what a user is; that's the principle of independently deployable services.
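To make that "local understanding" concrete, here is a minimal sketch in which the Documents service keeps only the user fields it needs and updates them from user events. The event names (UserRegistered, UserRenamed, UserDeleted) and fields are invented for illustration, not a prescribed schema:

```typescript
// The Documents service's own, minimal notion of a user: just what it
// needs for ownership and display, not the full Users profile.
interface DocumentServiceUser {
  id: string;
  displayName: string;
}

type UserEvent =
  | { type: "UserRegistered"; id: string; displayName: string }
  | { type: "UserRenamed"; id: string; displayName: string }
  | { type: "UserDeleted"; id: string };

// In-memory stand-in for a local table in the Documents database.
const localUsers = new Map<string, DocumentServiceUser>();

function applyUserEvent(event: UserEvent): void {
  switch (event.type) {
    case "UserRegistered":
    case "UserRenamed":
      localUsers.set(event.id, { id: event.id, displayName: event.displayName });
      break;
    case "UserDeleted":
      // The Documents service decides what deletion means for it, e.g.
      // keep the ID for ownership but drop personal data (data minimization).
      localUsers.delete(event.id);
      break;
  }
}
```

Note that the Documents service consumes only the events it cares about; if the Users service later adds fields or event types, nothing here has to change.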
Your description of this very common microservice data-architecture problem is very well put; it truly illustrates the issue with a very simple example.
Essentially: if you want to display a simple web page grid of documents, and that grid needs to display some amount of user data (which could even be customizable by the end user), how can you do that without making 1,000 calls to the Users microservice? This makes total sense.
Anyone who tries to design microservices faces this same data architecture challenge.
You have identified valid solutions above, and if they seem ugly, that's because they are ugly. This is the life of microservices and what it really takes to architect them.
First, the users data would be stored in a database accessible only to the Users microservice. From there, you have several design options.
You can write all your microservice data, including users and documents, to a data lake. The data lake is in charge of relating the data for reporting capabilities and downstream systems. You then render your grid from the related data-lake data, and not from your microservices, which are in charge of application functionality only, not reporting.
You can take the common user data and, on every single change, replicate it down to every microservice that uses it, using something like SQL transactional replication. Every microservice then has its own copy of the data, and you could use foreign keys in this manner. This is likely very high maintenance and fully reliant on replication never failing, which I personally wouldn't trust for mission-critical services.
You can write every single insert/update made to the users data to a queue, then consume that queued data and sync it down to every microservice that needs it. So if the Documents microservice needs user data, the queue syncs the user data down to the Documents microservice, ensuring that every microservice gets its own copy. Most likely you would not use something like SQL Server here; you would more likely use something like MongoDB or Cosmos DB. This is probably the architecture you would go with and the one that would work out best (see the sketch below).
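A minimal consumer sketch for that queue-based sync, assuming kafkajs and the official MongoDB driver. The user-changes topic, broker address, database names, and event fields are placeholders for illustration:

```typescript
import { Kafka } from "kafkajs";
import { MongoClient } from "mongodb";

interface UserChange {
  id: string;
  fullName: string;
  profilePicUrl: string;
}

// Consume user change events and upsert them into the Documents
// service's own copy of the user data.
async function syncUsers(): Promise<void> {
  const kafka = new Kafka({ clientId: "documents-service", brokers: ["kafka:9092"] });
  const consumer = kafka.consumer({ groupId: "documents-user-sync" });

  const mongo = new MongoClient("mongodb://documents-db:27017");
  await mongo.connect();
  const users = mongo
    .db("documents")
    .collection<UserChange & { _id: string }>("users");

  await consumer.connect();
  await consumer.subscribe({ topic: "user-changes", fromBeginning: true });

  await consumer.run({
    eachMessage: async ({ message }) => {
      if (!message.value) return;
      const event: UserChange = JSON.parse(message.value.toString());
      // Upsert keeps the local copy idempotent even if events are redelivered.
      await users.updateOne(
        { _id: event.id },
        { $set: { fullName: event.fullName, profilePicUrl: event.profilePicUrl } },
        { upsert: true }
      );
    },
  });
}

syncUsers().catch(console.error);
```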
As you can see, you need a lot of infrastructure to support microservice architectures.
It's pretty difficult for a small software company to build this out without some decent cloud and database infrastructure.
Hope that helps.