Within a group, I'd like to prevent INSERTs of consecutive duplicate values, where "consecutive" is defined by a simple ORDER BY clause.
Imagine a set of experiments which is regularly sampling values from a sensor. We only want to insert a value if it is new for that experiment.
Note that older values are allowed to be duplicates. So this is allowed:
id  experiment  value
1   A           10
2   A           20
3   A           10
but this is not:
id  experiment  value
1   A           10
2   A           10
I know how to find the previous value per experiment:
SELECT
  *,
  lag(sample_value) OVER experiment_and_id
FROM new_samples
WINDOW experiment_and_id AS (
  PARTITION BY experiment
  ORDER BY id
);
From the docs I know that CHECK constraints are not allowed to use other rows in their checking:
PostgreSQL does not support CHECK constraints that reference table data other than the new or updated row being checked. While a CHECK constraint that violates this rule may appear to work in simple tests, it cannot guarantee that the database will not reach a state in which the constraint condition is false (due to subsequent changes of the other row(s) involved). This would cause a database dump and reload to fail. The reload could fail even when the complete database state is consistent with the constraint, due to rows not being loaded in an order that will satisfy the constraint. If possible, use UNIQUE, EXCLUDE, or FOREIGN KEY constraints to express cross-row and cross-table restrictions.
If what you desire is a one-time check against other rows at row insertion, rather than a continuously-maintained consistency guarantee, a custom trigger can be used to implement that. (This approach avoids the dump/reload problem because pg_dump does not reinstall triggers until after reloading data, so that the check will not be enforced during a dump/reload.)
The EXCLUDE constraint looks promising, but is primarily for cases where the test is not equality. And I'm not sure if I can include window functions in there.
So I'm left with a custom trigger, but that seems like a bit of a hack for what looks like a fairly common use case.
Can anyone improve on using a trigger?
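For reference, here is a sketch of the trigger approach I'm trying to avoid. The function and trigger names are placeholders of my own, and `EXECUTE FUNCTION` assumes PostgreSQL 11+ (older versions spell it `EXECUTE PROCEDURE`):

```sql
-- Reject an INSERT whose value equals the most recent value
-- (by id) for the same experiment. Names are illustrative only.
CREATE FUNCTION reject_consecutive_duplicate() RETURNS trigger AS $$
BEGIN
  IF EXISTS (
    SELECT 1 FROM new_samples
    WHERE id = (SELECT max(id) FROM new_samples
                WHERE experiment = NEW.experiment)
      AND sample_value = NEW.sample_value
  ) THEN
    RAISE EXCEPTION 'consecutive duplicate value % for experiment %',
      NEW.sample_value, NEW.experiment;
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER no_consecutive_duplicates
  BEFORE INSERT ON new_samples
  FOR EACH ROW EXECUTE FUNCTION reject_consecutive_duplicate();
```

This works for a single session, but as noted in the comments it has a race condition under concurrent inserts, and raising an exception is harsher than the `ON CONFLICT DO NOTHING` behaviour I'd prefer.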
Ideally, I'd like to be able to just say:
INSERT ....
ON CONFLICT DO NOTHING
and have Postgres deal with the rest!
Minimum working example
BEGIN;

CREATE TABLE new_samples (
  id INT GENERATED ALWAYS AS IDENTITY,
  experiment VARCHAR,
  sample_value INT
);

INSERT INTO new_samples(experiment, sample_value)
VALUES
  ('A', 1),
  -- This is fine because they are for different groups.
  ('B', 1),
  -- This is fine because the value has changed.
  ('A', 2),
  -- This is fine because it's different to the previous value in
  -- experiment A.
  ('A', 1),
  -- This last row is not allowed because it's the same as the value
  -- before it, within this experiment.
  ('A', 1);
SELECT
  *,
  lag(sample_value) OVER experiment_and_id
FROM new_samples
WINDOW experiment_and_id AS (
  PARTITION BY experiment
  ORDER BY id
);
ROLLBACK;
- stackoverflow.com/questions/14221775/… – Walfrat, Sep 3, 2020 at 13:58
- Are you searching for a solution that works with concurrent insertions? The trigger you suggest has a race condition in this case. OTOH if you have a single session inserting, why bother doing this on the server? Cache the last value and deduplicate on the client. – Daniel Vérité, Sep 3, 2020 at 14:05
- The solution needs to be stateless as far as the client is concerned, but there's no requirement for concurrency at the moment. – LondonRob, Sep 3, 2020 at 18:38
1 Answer
The INSERT itself, without a trigger, can avoid inserting the same value as the last one for the same experiment in the order of IDs. For instance, if 1ドル is the experiment and 2ドル the value:
INSERT INTO new_samples(experiment, sample_value)
SELECT 1,ドル 2ドル
WHERE NOT EXISTS (
  SELECT 1 FROM new_samples
  WHERE id = (SELECT max(id) FROM new_samples WHERE experiment = 1ドル)
    AND sample_value = 2ドル
);
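Concretely, replaying the question's last two rows with literals substituted for the parameters (my own substitution, starting from a table where the latest row for experiment A already has value 1):

```sql
-- The guarded INSERT: this inserts nothing, because the most
-- recent row for experiment 'A' already holds sample_value 1.
INSERT INTO new_samples(experiment, sample_value)
SELECT 'A', 1
WHERE NOT EXISTS (
  SELECT 1 FROM new_samples
  WHERE id = (SELECT max(id) FROM new_samples WHERE experiment = 'A')
    AND sample_value = 1
);
```

This gives the "silently skip" behaviour the question wants from `ON CONFLICT DO NOTHING`, without needing a constraint to conflict with.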
If there were multiple sessions that inserted concurrently for the same experiment, there would be a potential race condition, because the SELECT within the INSERT won't see a just-inserted row from a concurrent transaction that hasn't committed yet. But there is the same race condition with a trigger or a check constraint that tries to look beyond the row being modified or inserted.
Race conditions can be handled by using the serializable isolation level (which auto-aborts conflicting transactions) or by pessimistic locking (locking out other writers before doing the operation, effectively forcing serialization).
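A minimal sketch of the pessimistic-locking variant, using a transaction-scoped advisory lock keyed per experiment. Deriving the lock key with `hashtext` is my own choice here, not something the answer prescribes:

```sql
BEGIN;
-- Serialize writers for experiment 'A' only; writers for other
-- experiments are unaffected. The lock is released automatically
-- at COMMIT or ROLLBACK.
SELECT pg_advisory_xact_lock(hashtext('A'));

INSERT INTO new_samples(experiment, sample_value)
SELECT 'A', 1
WHERE NOT EXISTS (
  SELECT 1 FROM new_samples
  WHERE id = (SELECT max(id) FROM new_samples WHERE experiment = 'A')
    AND sample_value = 1
);
COMMIT;
```

With the serializable alternative, each session would instead run the plain guarded INSERT inside a `BEGIN ISOLATION LEVEL SERIALIZABLE` transaction and retry on serialization failures (SQLSTATE 40001).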