Best configuration to tackle failover #2019
-
As the title says: does anyone here have ideas on how to configure options to handle failover?
Replies: 1 comment 5 replies
-
The way I handle this in a production environment with go-pg is to put something like pgbouncer in front of the actual database instances. If failover is fast enough, this can be sufficient to make failovers nearly invisible to end users.
Another approach is to use two instances of go-pg: one connected to a read-only pool of PostgreSQL instances with pgbouncer in front of them, and another connected to the primary instance (also through pgbouncer). That way, in the event of a failover, writes may experience a brief hiccup while pgbouncer transitions to the new primary, but read-only workloads should be relatively uninterrupted.
-
Ok, so coming from the MySQL world, it's very overwhelming to understand Postgres. I spent some time understanding the architecture: PostgreSQL is process-based, so now I understand the point of pgbouncer. I checked the code, and the URL uses port 5432, so I'd assume it is not using pgbouncer, just vanilla go-pg for pooling and managing connections. Without changing much, how do I refresh the connections to handle a failover? RDS Aurora resolves the writer endpoint with a 5s DNS TTL, but go-pg seems to ignore that. Do you think maxConnAge would work in this scenario? The doc says it is "useful with proxies like PgBouncer and HAProxy", but I'm not sure how to take the word "useful" in this context; the explanation is very vague.
-
Just confirmed the URL; it uses the RDS Aurora writer endpoint.
-
Hey @elliotcourant, apologies for pinging you like this. This thing is stressing me out trying to understand it.
-
So, a few things here, but I can try to help. I'm less familiar with the nuances introduced by managed PostgreSQL services such as Aurora; I'm more familiar with running PostgreSQL on bare metal. It is possible that the "write" address Aurora provides is already highly available, or will handle some form of failover if you have that configured in AWS. I know that at least in Google Cloud this was the case with their managed PostgreSQL: they would give you a static IP address for the primary, and whichever instance was the primary could always be reached through that IP address (at least over VPC peering; no idea how it works with public internet addresses).
pgbouncer and HAProxy act as a layer between you (the client) and the database (the server). HAProxy does not understand the PostgreSQL wire protocol to my knowledge, so it will only offer some connection pooling benefits and maybe round-robin for read-only workloads. But unless you really understand your stack end to end, including PostgreSQL, do not use HAProxy.
pgbouncer, however, does understand PostgreSQL's wire protocol, but it does not automatically route to the primary instance on its own. For example, if you are running 3 PostgreSQL instances with 1 primary and 2 read replicas, you can configure pgbouncer to route queries to all 3, but it does not know enough to address specific queries to a specific instance based on workload.
The way you would solve this is by having a DNS address that points to the current primary instance and configuring your application or pgbouncer to connect to that address. In pgbouncer at least, this address is resolved at connection time, so if the address changes in DNS, pgbouncer should connect to the most up-to-date address.
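A minimal sketch of that pgbouncer setup might look like the fragment below. The hostname, database name, and paths are hypothetical; `dns_max_ttl` controls how long pgbouncer caches a resolved address before looking it up again, which bounds how long it can keep pointing at a demoted primary.

```ini
[databases]
; hypothetical DNS name that always resolves to the current primary
app = host=postgres-rw.dns.internal port=5432 dbname=app

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type   = md5
auth_file   = /etc/pgbouncer/userlist.txt
; re-resolve the backend hostname at most every 15 seconds
dns_max_ttl = 15
```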
AWS should offer something similar to this in some way, either via the "writer connection url" or some other load balancer they provide.
Sorry if I have created confusion with how pgbouncer works.
For reference, I run the hosted version of monetr, which uses go-pg. While I do not have a separate connection for read-only queries, the way I handle failover is via DNS to some degree.
I run 3x instances of PostgreSQL, 1x primary and 2x read replicas. I then run 3x pgbouncers in front of them. The primary PostgreSQL instance can be reached at a DNS address similar to postgres-rw.dns.internal, and pgbouncer is configured to connect to this address. I then have my application running on the same server as pgbouncer, so the connection from my application to my "connection pool" is not a big hop.
During a failover, the primary instance becomes unavailable and the connections from the individual pgbouncers are cut here: (client) <---> (bouncer) <-x-> (postgres). This leaves the connection pool that go-pg itself maintains basically intact.
Once the failover is complete, pgbouncer will connect to the new primary using the same DNS address as before; so new queries issued begin succeeding again.
This setup might be unique to my specific use case, and depending on what you are trying to do it might not scale well. Failovers are also not instant. If you do not care about churning connections in go-pg, you don't necessarily need pgbouncer. You can simply use the "Writer" DNS address for your setup.
Hopefully this is helpful/insightful
-
Thanks for the insight.
That's fine if I use pgbouncer. The Aurora writer and reader endpoints are static as well. I am wondering: if I don't use pgbouncer, would setting maxConnAge work? After a failover, the writer becomes a reader and a reader becomes the writer, so in the existing pool the old writer IPs now point to a reader and the old reader IPs point to the new writer. I want the connections to be refreshed, because otherwise the writer connections won't work, as they would be pointing at a reader.