Within a group, I'd like to prevent INSERTs of consecutive duplicate values, where "consecutive" is defined by a simple ORDER BY clause.
Imagine a set of experiments which is regularly sampling values from a sensor. We only want to insert a value if it is new for that experiment.
Note that older values are allowed to be duplicates. So this is allowed:
id  experiment  value
1   A           10
2   A           20
3   A           10
but this is not:
id  experiment  value
1   A           10
2   A           10
I know how to find the previous value per experiment:
SELECT
  *,
  lag(sample_value) OVER experiment_and_id
FROM new_samples
WINDOW experiment_and_id AS (
  PARTITION BY experiment
  ORDER BY id
);
From the docs I know that CHECK constraints are not allowed to use other rows in their checking:
PostgreSQL does not support CHECK constraints that reference table data other than the new or updated row being checked. While a CHECK constraint that violates this rule may appear to work in simple tests, it cannot guarantee that the database will not reach a state in which the constraint condition is false (due to subsequent changes of the other row(s) involved). This would cause a database dump and reload to fail. The reload could fail even when the complete database state is consistent with the constraint, due to rows not being loaded in an order that will satisfy the constraint. If possible, use UNIQUE, EXCLUDE, or FOREIGN KEY constraints to express cross-row and cross-table restrictions.
If what you desire is a one-time check against other rows at row insertion, rather than a continuously-maintained consistency guarantee, a custom trigger can be used to implement that. (This approach avoids the dump/reload problem because pg_dump does not reinstall triggers until after reloading data, so that the check will not be enforced during a dump/reload.)
The EXCLUDE constraint looks promising, but is primarily for cases where the test is not equality. And I'm not sure if I can include window functions in there.
So I'm left with a custom trigger, but that seems like a bit of a hack for what looks like a fairly common use case.
Can anyone improve on using a trigger?
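For reference, here is a sketch of the trigger approach I'm trying to avoid. The function and trigger names are placeholders of my own, and `EXECUTE FUNCTION` assumes PostgreSQL 11+ (older versions spell it `EXECUTE PROCEDURE`):

```sql
-- Reject an INSERT whose value equals the most recent value
-- (by id) for the same experiment. Names are illustrative only.
CREATE FUNCTION reject_consecutive_duplicate() RETURNS trigger AS $$
BEGIN
  IF EXISTS (
    SELECT 1 FROM new_samples
    WHERE id = (SELECT max(id) FROM new_samples
                WHERE experiment = NEW.experiment)
      AND sample_value = NEW.sample_value
  ) THEN
    RAISE EXCEPTION 'consecutive duplicate value % for experiment %',
      NEW.sample_value, NEW.experiment;
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER no_consecutive_duplicates
  BEFORE INSERT ON new_samples
  FOR EACH ROW EXECUTE FUNCTION reject_consecutive_duplicate();
```

This works for a single session, but as noted in the comments it has a race condition under concurrent inserts, and raising an exception is harsher than the `ON CONFLICT DO NOTHING` behaviour I'd prefer.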
Ideally, I'd like to be able to just say:
INSERT ....
ON CONFLICT DO NOTHING
and have Postgres deal with the rest!
Minimum working example
BEGIN;

CREATE TABLE new_samples (
  id INT GENERATED ALWAYS AS IDENTITY,
  experiment VARCHAR,
  sample_value INT
);

INSERT INTO new_samples(experiment, sample_value)
VALUES
  ('A', 1),
  -- This is fine because they are for different groups.
  ('B', 1),
  -- This is fine because the value has changed.
  ('A', 2),
  -- This is fine because it's different to the previous value in
  -- experiment A.
  ('A', 1),
  -- This last row is not allowed because it's the same as the value
  -- before it, within this experiment.
  ('A', 1);
SELECT
  *,
  lag(sample_value) OVER experiment_and_id
FROM new_samples
WINDOW experiment_and_id AS (
  PARTITION BY experiment
  ORDER BY id
);
ROLLBACK;
- stackoverflow.com/questions/14221775/… – Walfrat, Sep 3, 2020 at 13:58
- Are you searching for a solution that works with concurrent insertions? The trigger you suggest has a race condition in this case. OTOH if you have a single session inserting, why bother doing this on the server? Cache the last value and deduplicate on the client. – Daniel Vérité, Sep 3, 2020 at 14:05
- The solution needs to be stateless as far as the client is concerned, but there's no requirement for concurrency at the moment. – LondonRob, Sep 3, 2020 at 18:38
1 Answer
The INSERT itself, without a trigger, can avoid inserting the same value as the last one for the same experiment in the order of IDs. For instance, if 1ドル is the experiment and 2ドル the value:
INSERT INTO new_samples(experiment, sample_value)
SELECT 1,ドル 2ドル
WHERE NOT EXISTS (
  SELECT 1 FROM new_samples
  WHERE id = (SELECT max(id) FROM new_samples WHERE experiment = 1ドル)
    AND sample_value = 2ドル
);
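Concretely, replaying the question's last two rows with literals substituted for the parameters (my own substitution, starting from a table where the latest row for experiment A already has value 1):

```sql
-- The guarded INSERT: this inserts nothing, because the most
-- recent row for experiment 'A' already holds sample_value 1.
INSERT INTO new_samples(experiment, sample_value)
SELECT 'A', 1
WHERE NOT EXISTS (
  SELECT 1 FROM new_samples
  WHERE id = (SELECT max(id) FROM new_samples WHERE experiment = 'A')
    AND sample_value = 1
);
```

This gives the "silently skip" behaviour the question wants from `ON CONFLICT DO NOTHING`, without needing a constraint to conflict with.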
If there were multiple sessions that inserted concurrently for the same experiment, there would be a potential race condition, because the SELECT within the INSERT won't see a just-inserted row from a concurrent transaction that hasn't committed yet. But there is the same race condition with a trigger or a check constraint that tries to look beyond the row being modified or inserted.
Race conditions can be handled by using the serializable isolation level (which auto-aborts conflicting transactions) or by pessimistic locking (locking out other writers before doing the operation, effectively forcing serialization).
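A minimal sketch of the pessimistic-locking variant, using a transaction-scoped advisory lock keyed per experiment. Deriving the lock key with `hashtext` is my own choice here, not something the answer prescribes:

```sql
BEGIN;
-- Serialize writers for experiment 'A' only; writers for other
-- experiments are unaffected. The lock is released automatically
-- at COMMIT or ROLLBACK.
SELECT pg_advisory_xact_lock(hashtext('A'));

INSERT INTO new_samples(experiment, sample_value)
SELECT 'A', 1
WHERE NOT EXISTS (
  SELECT 1 FROM new_samples
  WHERE id = (SELECT max(id) FROM new_samples WHERE experiment = 'A')
    AND sample_value = 1
);
COMMIT;
```

With the serializable alternative, each session would instead run the plain guarded INSERT inside a `BEGIN ISOLATION LEVEL SERIALIZABLE` transaction and retry on serialization failures (SQLSTATE 40001).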