Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Feature Request: Transparent Micro-Batching for Single INSERTs #1075

mirecl started this conversation in Ideas
Discussion options

Description:
Hi team! First of all, thanks for building PgDog — it's an amazing piece of software.
I'd like to discuss a potential feature for the L7 proxy core that could bring a massive 10x-50x write throughput improvement for specific workloads without changing the application code: Transparent Micro-Batching for INSERT queries.

1. The Problem
There are many modern applications, frameworks, and workflow engines that generate thousands of single-row, synchronous INSERT statements. Because they require immediate consistency/acknowledgment, they cannot be easily rewritten to use COPY or async queues.
This creates a massive transaction and TCP overhead on the PostgreSQL side (even with Transaction Pooling enabled), starving the database of IOPS and CPU. Modifying the source code of such applications is often impossible (e.g., third-party tools) or prohibitively expensive.

2. The Use Case (e.g., Temporal.io)
A great example is the Temporal Workflow Engine. It natively writes its history_node events row-by-row sequentially. When deploying Temporal at scale with Postgres, the database quickly becomes the bottleneck due to thousands of concurrent single INSERTs. We cannot easily batch them on the Temporal side, but they don't necessarily have strict cross-row transaction dependencies.

3. Proposed Solution
Since PgDog already parses the AST and controls the client-server TCP connections, it is perfectly positioned to do transparent micro-batching.

How it could work:

  • A client sends a single INSERT INTO table (a, b) VALUES (1, 2).
  • PgDog intercepts the query and puts the client's TCP socket into a suspended/waiting state.
  • PgDog buffers incoming INSERTs for this specific table from multiple clients.
  • When a threshold is met (e.g., batch_max_size = 100 or batch_timeout_ms = 10), PgDog rewrites the buffered queries into a single multi-tuple insert: INSERT INTO table (a, b) VALUES (1, 2), (3, 4), ...
  • PgDog sends this batched query to Postgres.
  • Upon receiving a success response, PgDog demultiplexes the CommandComplete (OK) response back to all waiting clients simultaneously.

Proposed Configuration syntax idea:

[[micro_batching]]
database = "prod"
tables = ["temporal.history_node", "logs.api_events"]
max_batch_size = 100
max_timeout_ms = 10
on_error = "fallback_to_single" # Strategy when batch fails

4. Known Challenges (Food for thought)
I realize this is a complex feature for a distributed proxy, especially regarding ACID guarantees:

  • Partial Failures: If 1 out of 100 rows violates a constraint (e.g., unique_violation), Postgres will abort the entire transaction. PgDog would need a strategy here (e.g., automatically retry them as single inserts to find the failing one, or return the error to all 100 clients).
  • RETURNING clause: If the INSERT has a RETURNING id clause, PgDog needs to correctly map the generated IDs back to the corresponding client connections.
  • Transactions: This should probably be disabled or strictly handled if the client executes the INSERT inside an explicit BEGIN ... COMMIT block.

Conclusion
Currently, the only way to scale this is to throw more hardware at Postgres or aggressively shard the tables. Having transparent L7 micro-batching inside PgDog would make it a killer tool for log-heavy or event-sourcing apps (like Temporal) on Postgres.
Would love to hear your thoughts on this! Is something similar already on the roadmap, or perhaps achievable via the new Rust Plugin system?

You must be logged in to vote

Replies: 1 comment

Comment options

Super cool feature, but hard to currently build with our architecture. Also, lots of edge cases, so the documentation for this would have quite a few warnings. The plugins also don't support async, so we can't suspend the request while we do something there gracefully.

You must be logged in to vote
0 replies
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Category
Ideas
Labels
None yet
2 participants

AltStyle によって変換されたページ (->オリジナル) /