How does the Postgres trigger mechanism scale?
We have a large PostgreSQL installation and we are trying to implement an event-based system using log tables and TRIGGERs.
Basically we would like to create a TRIGGER for each table we want to be notified for an UPDATE/INSERT/DELETE operation. Once this trigger fires it will execute a function that will simply append a new row (encoding the event) to a log table that we will then poll from an external service.
Before going all in with Postgres TRIGGERs, we would like to know how they scale: how many triggers can we create on a single Postgres installation? Do they affect query performance? Has anyone tried this before?
- You may find checking PgQ useful; it uses C triggers for registering the data modification events. – András Váczi, Jan 7, 2015 at 17:13
- Have a look at LISTEN/NOTIFY; you might not need triggers at all: postgresql.org/docs/current/static/sql-listen.html – user1822, Jan 7, 2015 at 23:01
2 Answers
Basically we would like to create a TRIGGER for each table we want to be notified for an UPDATE/INSERT/DELETE operation. Once this trigger fires it will execute a function that will simply append a new row (encoding the event) to a log table that we will then poll from an external service.
That's a pretty standard use for a trigger.
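As a concrete starting point, here's a minimal sketch of that pattern in PL/pgSQL. The names events_log, log_event, and my_table are illustrative assumptions, not anything from the question:

-- One shared log table that the external service polls.
CREATE TABLE events_log (
    id         bigserial   PRIMARY KEY,
    table_name text        NOT NULL,
    operation  text        NOT NULL,   -- 'INSERT', 'UPDATE' or 'DELETE'
    row_data   json,
    logged_at  timestamptz NOT NULL DEFAULT now()
);

CREATE OR REPLACE FUNCTION log_event() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        INSERT INTO events_log (table_name, operation, row_data)
        VALUES (TG_TABLE_NAME, TG_OP, row_to_json(OLD));
    ELSE
        INSERT INTO events_log (table_name, operation, row_data)
        VALUES (TG_TABLE_NAME, TG_OP, row_to_json(NEW));
    END IF;
    RETURN NULL;  -- the return value is ignored for AFTER row triggers
END;
$$ LANGUAGE plpgsql;

-- One such trigger per table you want to watch:
CREATE TRIGGER my_table_log
AFTER INSERT OR UPDATE OR DELETE ON my_table
FOR EACH ROW EXECUTE PROCEDURE log_event();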
Before going all in with Postgres TRIGGERs, we would like to know how they scale: how many triggers can we create on a single Postgres installation?
If you keep creating them, eventually you'll run out of disk space.
There's no specific limit for triggers.
PostgreSQL limits are documented on the about page.
Do they affect query performance?
It depends on the trigger type, trigger language, and what the trigger does.
A simple PL/pgSQL BEFORE ... FOR EACH STATEMENT trigger that doesn't do anything has near-zero overhead.

FOR EACH ROW triggers have higher overhead than FOR EACH STATEMENT triggers, scaling, obviously, with the affected row counts.
AFTER triggers are more expensive than BEFORE triggers because they must be queued up until the statement finishes its work, then executed. They aren't spilled to disk if the queue gets big (at least in 9.4 and below; this may change in future releases), so a huge AFTER trigger queue can exhaust available memory, causing the statement to abort.
A trigger that modifies the NEW row before insert/update is cheaper than a trigger that does DML.
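For contrast, a minimal sketch of that cheap kind of trigger: it only adjusts the row in flight and performs no extra writes. The updated_at column and the names my_table and touch_updated_at are assumptions for illustration.

CREATE OR REPLACE FUNCTION touch_updated_at() RETURNS trigger AS $$
BEGIN
    -- modify the incoming row directly; no additional DML
    NEW.updated_at := now();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER my_table_touch
BEFORE INSERT OR UPDATE ON my_table
FOR EACH ROW EXECUTE PROCEDURE touch_updated_at();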
The specific use case you want would perform better with an in-progress enhancement that might make it into PostgreSQL 9.5 (if we're lucky), where FOR EACH STATEMENT triggers can see virtual OLD and NEW tables. This isn't possible in current PostgreSQL versions, so you must use FOR EACH ROW triggers instead.
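For later readers: that enhancement eventually shipped in PostgreSQL 10 as transition tables. A sketch, assuming PostgreSQL 10 or newer and the illustrative events_log table from the earlier example:

CREATE OR REPLACE FUNCTION log_inserts() RETURNS trigger AS $$
BEGIN
    -- new_rows is the transition table holding every row the
    -- statement inserted; this logs them all in one set-based pass.
    INSERT INTO events_log (table_name, operation, row_data)
    SELECT TG_TABLE_NAME, TG_OP, row_to_json(n)
    FROM new_rows n;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER my_table_log_stmt
AFTER INSERT ON my_table
REFERENCING NEW TABLE AS new_rows
FOR EACH STATEMENT EXECUTE PROCEDURE log_inserts();

This fires once per statement regardless of how many rows were inserted, rather than once per row.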
Has anyone tried this before?
Of course. It's a pretty standard use for triggers, along with auditing, sanity checking, etc.
You'll want to look into LISTEN and NOTIFY for a good way to wake up your worker when changes to the task table happen.
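A sketch of how LISTEN/NOTIFY could plug into the log-table design above; the channel name events_log_channel is an assumption for illustration:

-- In the trigger function, right before RETURN NULL, wake up any
-- listening worker (this is PL/pgSQL, hence PERFORM):
--     PERFORM pg_notify('events_log_channel', TG_TABLE_NAME);

-- The external worker subscribes in its own session:
LISTEN events_log_channel;
-- ...then blocks until a notification arrives and reads the new rows
-- from the log table, instead of polling on a fixed timer.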
You're already doing the most important thing by avoiding talking to external systems directly from triggers. That tends to be problematic for performance and reliability. People often try to do things like send mail directly from a trigger, and that's bad news.
This is a slightly belated answer, but it might be useful for future readers.

Nowadays (in versions 10, 11, and 12) we don't need to store the same data twice (once in the WAL by PostgreSQL and once manually in a log table). We can use PostgreSQL's logical decoding mechanism (the same machinery as logical replication) to track all or some changes to our data, or to send those events to a queue like Kafka for later analysis.
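A minimal sketch of that approach, using the test_decoding output plugin that ships with PostgreSQL; it assumes wal_level = logical is already configured, and the slot name events_slot is illustrative:

-- Create a logical replication slot that decodes WAL into
-- readable change events:
SELECT pg_create_logical_replication_slot('events_slot', 'test_decoding');

-- Peek at pending changes without consuming them:
SELECT * FROM pg_logical_slot_peek_changes('events_slot', NULL, NULL);

-- Consume the changes (advances the slot so old WAL can be recycled):
SELECT * FROM pg_logical_slot_get_changes('events_slot', NULL, NULL);

-- Drop the slot when done, so it stops retaining WAL:
SELECT pg_drop_replication_slot('events_slot');

In production you'd more likely use a proper output plugin (or pg_recvlogical / a Kafka connector) rather than test_decoding, but the mechanics are the same.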