I'm building a new, multitenant application using microservices and Postgres RDS in AWS. Each tenant will have their own database within the RDS instance (additional RDS instances/servers will be added when needed). There will be three microservices that will need access to each tenant database. One microservice will serve UI activity and the other two will handle background transactions, not bound to the UI.
I'm currently using C3p0 for connection pooling for each microservice attached to each tenant database (using a map to manage the tenant connection pools). When a call is made, the tenant id will be passed as part of the payload, determine which connection pool to use then invoke the sql statement. For reference, each tenant will have 20-2000gb of database data.
My question is around postgress max connections. Is the max connections for each database or the database server? I know Postgres can handle thousands of databases but how do you maintain connections to all of these databases?
Suppose I have 100 tenant databases on one database server. Each microservice has a connection pool set to have 5 connections. That would be 15 connections per tenant, or 1500 connections to the database instance/server. Now, I have to have redundancy on the microservices, so I'll have at least 2 of each microservice running, which would double the total database connections to 3000. Add more microservice instances for scalability and my connection count to the database server again increases. To keep it simple for this conversation, I'd scale all the microservices together as a group when there is additional load, so 3 sets of microservices requires 4500 connections and so on.
Will this even work? Am I thinking about connection pooling and Postgres connections correctly? The plan is to use separate databases per tenant for various reasons, unless that is, there is a better way to deploy that makes backup/restores simple, provides data isolation at the SQL level, etc.
I've read through numerous articles with performance issues with one database and a schema per tenant so I think that option is ruled out. The other alternative would be one database with shared tables for all tenants and using the tenant id as part of the primary key, but given the need to keep data isolated, i'm pulling that off the table for consideration.
-
Max connections is for the entire instance, not just one database. With that many connections you have to think about pooling. Either in your application (JavaEE application servers have a connection pool built in) or through a proxy like pgBouncer or pgPooluser1822– user18222018年03月05日 08:35:41 +00:00Commented Mar 5, 2018 at 8:35
-
Ok, thanks for clarification on max connections. So I am using connection pooling (Java C3P0) but connection pooling, even with pgBouncer is done per database. If I have 100 db's, and 5 connections per connection pool, doesn't that still give me 500 open connections to the entire instance?felix53– felix532018年03月05日 14:21:26 +00:00Commented Mar 5, 2018 at 14:21
1 Answer 1
max_connections
from postgresql.conf
is for the entire server, but CONNECTION LIMIT
from CREATE|ALTER DATABASE
command is for that specific database, so you have your choice.
You might barely get away with 4500 connections, but only if the vast majority of them don't do anything the vast majority of the time. In which, case, why have them? Why does each microservice need a connection pool of 5?
I've read through numerous articles with performance issues with one database and a schema per tenant so I think that option is ruled out.
There are lots of bad articles out there, or ones based on 15 year old experiences. A lot has changed in 15 years. Which articles are they? And did they actually say that a database per tenant was better than a schema per tenant, or just that a schema per tenant was bad with no comparison to the alternative?
-
1I checked the dates on the articles and yes, they were a few years old. I noticed that "Apartment" Rails gem now includes a blurb about Postgres and performance improvements when using separate schemas. Much of what I read was that as you increase the number of schemas in a single databases, performance decreases as compared to the same number of separate databases.felix53– felix532018年03月05日 21:13:53 +00:00Commented Mar 5, 2018 at 21:13
-
As far as a connection pool of 5 for each ms, it's an arbitrary value I picked just to make the math easy. Tenant users will range from 10 to 1000+; some tenants will have little activity while others will have heavy requests throughout their operating hours. No matter the value of the connection pool, I need to be able to scale up more application servers when the load increases, which would amplify the number of connection pools / db connections. Going to load test with single database, separate schemas and a shared connection pool across all tenants per microservice.felix53– felix532018年03月05日 21:20:19 +00:00Commented Mar 5, 2018 at 21:20
-
@jjanes How do I know what connection limit is set for a existent database?Luciano Andress Martini– Luciano Andress Martini2023年01月13日 13:46:17 +00:00Commented Jan 13, 2023 at 13:46
-
@LucianoAndressMartini you can find it in the datconnlimit column of pg_database.jjanes– jjanes2023年01月13日 17:22:43 +00:00Commented Jan 13, 2023 at 17:22
-
I can query the current value of
max_connections
withSHOW
but how to know what is the currentCONNECTION LIMIT
for a certain db?Rafs– Rafs2024年10月01日 11:42:12 +00:00Commented Oct 1, 2024 at 11:42