single connection concurrent transaction #282

TobiasNickel started this conversation in Ideas

Hi, yesterday I read an article on dev.to about concurrent transactions in MySQL.

And I found that in order to make the transactions concurrent, the article used a very large connection pool (38 connections), because on a single connection there is always only one transaction AND only one query at a time (that is how the mysql driver works).

This postgres module is able to run many queries concurrently on a single connection by sending them immediately, without waiting for previous responses. Because PostgreSQL returns results in the same order the queries were sent, the results or errors can be mapped back to the original query/promise.
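
For illustration, a minimal sketch of what this looks like from the caller's side (assuming a single-connection Postgres.js instance, e.g. created with `max: 1`):

```js
// Start both queries together: they are written to the socket back-to-back,
// and each response is matched to its promise by arrival order.
const [one, two] = await Promise.all([
  sql`select 1 as a`, // sent first
  sql`select 2 as b`  // sent before the first response has arrived
]);
```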

What I liked in the article was the syntax: firing off many queries without awaiting them, and only awaiting the final transaction to finish.

In a very limited test, I was able to show that with Postgres (and its DB server) we could even send many transactions concurrently to the DB on a single connection. If this module can support such a syntax/pattern, it would be possible to gain huge performance boosts.

Here is my test script:

  1. First it creates a postgres instance with max connections set to 1.
  2. Then it does a single query to establish a connection.
  3. It runs 2 transactions unmanaged, but traditionally (awaiting each statement).
  4. It runs 2 transactions unmanaged AND concurrently.
  5. It prints the timings and verifies that both results are the same.
```js
const postgres = require('../lib/index');

// assumption: the connection string comes from the environment
const databaseURL = process.env.DATABASE_URL;

const sql = postgres(databaseURL, {
  max: 1
});

main().catch(err => console.log(err)).then(() => process.exit());

async function main() {
  // await one query to establish a connection
  await sql`select 1 as b`;

  // traditional: two transactions, awaiting every single statement
  const start = Date.now();
  const results = [];
  results.push(await sql`begin`);
  results.push(await sql`select 1 as a`);
  results.push(await sql`select 2 as a`);
  results.push(await sql`select 3 as a`);
  results.push(await sql`select 4 as a`);
  results.push(await sql`select 5 as a`);
  results.push(await sql`select 6 as a`);
  results.push(await sql`commit;`);
  results.push(await sql`begin;`);
  results.push(await sql`select 1 as b`);
  results.push(await sql`select 2 as b`);
  results.push(await sql`select 3 as b`);
  results.push(await sql`select 4 as b`);
  results.push(await sql`select 5 as b`);
  results.push(await sql`select 6 as b`);
  results.push(await sql`commit;`);
  console.log('sync done ', Date.now() - start);

  // concurrent: the same two transactions, all statements fired at once
  const startConcurrent = Date.now();
  const results2 = await Promise.all([
    sql`begin`,
    sql`select 1 as a`,
    sql`select 2 as a`,
    sql`select 3 as a`,
    sql`select 4 as a`,
    sql`select 5 as a`,
    sql`select 6 as a`,
    sql`commit;`,
    sql`begin;`,
    sql`select 1 as b`,
    sql`select 2 as b`,
    sql`select 3 as b`,
    sql`select 4 as b`,
    sql`select 5 as b`,
    sql`select 6 as b`,
    sql`commit;`
  ]);
  console.log('concurrent done ', Date.now() - startConcurrent);

  if (JSON.stringify(results) == JSON.stringify(results2)) {
    console.log('results are the same');
  } else {
    console.log('results are different');
  }
}
```

And here is the result (with a remote, slow database):
[Screenshot: timing output (2022年02月08日 133242)]

I am thinking about a syntax like this:

```js
await sql.begin(async transaction => {
  await transaction`select 1 as n`;
  transaction`update datatable values ...`;
  // While this update and the commit are running, the next task/request/api-call
  // can already send its query or start a new transaction.
});
```

This can work with Postgres, because we can send 10 update queries as part of a transaction: when the third update query fails, Postgres gives us a distinct error for every one of the following update queries AND turns the final commit command into a rollback.
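
A hedged sketch of that failure behavior, using the array form of sql.begin that is confirmed in the reply below (the division by zero is just a convenient way to force an error):

```js
try {
  await sql.begin(sql => [
    sql`select 1 as ok`,
    sql`select 1 / 0`,   // fails with division_by_zero and aborts the transaction
    sql`select 2 as ok`  // rejected: "current transaction is aborted ..."
  ]);
} catch (err) {
  console.log(err.message); // the first error surfaces; COMMIT became ROLLBACK
}
```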

I know that in real code the potential gain is unlikely to be 10x like in this little example, but 30-50% could be realistic.

What do you think? Do you think it is possible to support "concurrent transactions on a single connection" within this project? Any other thoughts?

Replies: 6 comments

---

Hi Tobias.

This is already implemented in Postgres.js and is called Pipelining, so yes - definitely possible 😉

Now, when using Postgres.js you need to use sql.begin() (see the docs) for transactions, because otherwise you have no assurance that your queries will run on the same connection, and consequently inside your transaction. The reason your code above works is only that you've set max: 1.

To achieve pipelining with transactions you simply return an array with the "unawaited" queries like this:

```js
const result = await sql.begin(sql => [
  sql`select 1 as a`,
  sql`select 2 as a`,
  sql`select 3 as a`,
  sql`select 4 as a`,
  sql`select 5 as a`,
  sql`select 6 as a`
])
```

In v1 there is no limit to the number of pipelined queries per connection, but through testing I've found that pipelining benefits max out at around 100 pipelined queries (varying, of course, with the connection). Since a broken connection will throw all those queries, I think it's a good tradeoff to set a boundary on the number of queries pipelined, so in v3 the default is 100, configurable using max_pipeline.
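
For reference, a sketch of setting that option (the option name is taken from the comment above; the rest of the config is illustrative):

```js
const sql = postgres(databaseURL, {
  max_pipeline: 100 // v3 default per the comment above; raise or lower to taste
});
```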

---

Hmm, pipelining does for queries within a transaction what independent queries can do without pipelining. But there might be some more optimization potential.

Please give me some time and keep this issue open. I want to run some tests and study the code.

---

No problem :) A possible optimization in v3 is allowing multiple full transactions to be pipelined, but to be honest a lot of real world scenarios can achieve that by using CTEs instead, which in every case I can think of is a better way of going about it anyway.
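
As a sketch of the CTE approach (table and column names are made up): two dependent statements collapsed into a single round trip and a single implicit transaction:

```js
// A data-modifying CTE: the update and the insert travel and commit together,
// with no client-side transaction spanning two round trips.
await sql`
  with moved as (
    update items set qty = qty - 1 where id = ${id} returning id
  )
  insert into audit_log (item_id) select id from moved
`;
```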

Also, I read your comment on the linked article, and I think it's telling of a footgun in Postgres.js that you tried to run transactions the way you did. I would love your insight on improving the documentation, or on whether there should be a built-in mechanism to prevent users of the library from attempting transactions like that, which could be very dangerous.

---

So... this is the second time I've started writing this:

I often see a pattern of processes that can be triggered from an API or a message system and that look like this
(usually it is decorated with more function calls and OOP patterns, but it boils down to this):

```js
// a typical request handler (hypothetical name)
async function handleRequest(args) {
  await sql.begin(async t => {
    const items = await t`select * from items where id in ${t(args)}`;
    const relatedItems = await t`select * from items where id in ${t(items.map(i => i.relatedId))}`;
    // some processing, sorting, mapping, filtering ...
    await t`update ...`;
    await t`update ...`;
  });
}
```

Here is a timeline for this process. We have only one connection, and the function is called twice.

```
b=begin
s=select
u=update
c=commit/rollback
p=business logic processing
[1-9]=waiting / network roundtrip

-> b 1 s 2 s 2 p u 3 u 4 c 5 b 1 s 2 s 2 p u 3 u 4 c
```

Everything we are talking about is how to remove as many of these numbers as possible (i.e., remove as much waiting time as possible).

  1. Waiting after begin (the 1 blocks). We have to await here, because the DB can return the error 0B000, invalid_transaction_initiation. If we had already started pipelining (not awaiting), the following statements could run without a transaction. Something like a connection error would be OK, because no further SQL would be sent and the transaction would roll back in the DB. But the error could also be DB-internal, like (I don't know) a transaction quota or max memory.

Because we do not want queries that are supposed to run in a transaction to end up running outside of one, we have to await the begin statement. And that makes "concurrent transactions on a single connection" impossible.

If the Postgres server allowed us to send a flag with each query ("execute only in a transaction"), then it would be possible.

  2. The number 2 waiting blocks are logically needed by the application. Maybe the developer could try to use a join or something.

  3. The waiting times marked 3 are the developer's decision. They could also wrap the updates in a Promise.all to pipeline them (see the sketch after this list).

  4. The library has to wait at number 4, because after some updates the app could continue with more logic within the transaction.

  5. The break between the transactions (number 5). I think we can get rid of this. In v2 I saw how to do it by calling the next function; in v3 it is more difficult.
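
Here is the sketch referenced in point 3 (hypothetical table and columns, using Postgres.js's sql.begin): starting both updates together pipelines them inside the transaction, collapsing the two 3-waits into one:

```js
await sql.begin(async t => {
  // ... selects and processing as above ...
  await Promise.all([
    t`update items set price = 1 where id = 1`, // both updates are sent
    t`update items set price = 2 where id = 2`  // before either response returns
  ]);
});
```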

pipelining as of today

Pipelining with 5 update queries gives the following timeline today:

```
-> b 1 uuuuu 4 c 5
```

We have to await the number 1 block. I think there is potential to avoid the number 4 and number 5 awaits; the number 5 await is as described above.

For the number 4 await, I made a change: rewrite...TobiasNickel:rewrite

This change immediately pipelines the commit command. This is OK, because the DB will automatically turn it into a rollback if needed. The application will receive the first error from the Promise.all.

conclusion

I have not been able to bring about a revolution with single-connection concurrent transactions.

But I hope this post is valuable for you.

---

The Postgres protocol supports pipelining, but it does not support multiplexing (interleaving independent transactions on one connection).

---

Interesting topic. I've also just recently suggested implementing a .pipeline function in node-postgres.

---

This discussion was converted from issue #266 on March 24, 2022 16:14.