I am trying to refactor a table which contains historical data and therefore isn't ever deleted, but has an 'active' field.
Each day we run a query (a few joins, but quite slow) to get some data to insert into this table:
- if the row exists already, and is present in the query, we do nothing
- if the row exists but the result isn't in todays query, we set 'active' to false
- if the row doesn't exist but is in the query, we insert it (with active defaulting to true)
It is possible for an entity to be active, made inactive (same row), then a few days later become active again (would be a new row).
The table is using a surrogate key, as the natural key can be repeated as noted above, though there should only ever be a single entry that is active on the natural key.
I'm curious if there is an action or 'pattern' I can take that will enable such functionality - I realise I'm asking a lot :)
I am using a java library (jOOQ) to generate the sql, so I am only looking for sql as the solution, no PL/pgSQL and no stored procs. I doubt it is possible without 2 passes (1. perform the update where not in query result, 2. perform the insert from query result) but I'm intrigued if there are possibilities.
Edit: Example table history, where ** is an update to same row
pk_surrogate | natural_id | active| date_made_active
1 | 1 | true | 2020年01月01日 -- row for entity 1 added
1 ** | 1 | false | 2020年01月01日 -- row for entity 1 no longer present on 2020年01月02日, so marked inactive
2 | 1 | true | 2020年01月03日 -- new row for entity 1 added when it was present on 2020年01月03日
3 | 2 | true | 2020年01月03日 --new value for 2020年01月03日
1 Answer 1
Something like this maybe?
with daily_query as (
.... this is your current query ...
), upsert as (
-- covers first and third requirement
insert into the_table (....)
select ...
from daily_query
on conflict (pk_column) do nothing
)
-- this sets those rows to in active that exist in the target table
-- but are not in the initial query
update the_table t
set active = false
where not exists (select *
from daily_query q
where q.pk_column = t.pk_column);
-
Thanks for your quick response @a_horse_with_no_name - in this example, is the pk_column the surrogate key or natural (but can repeat) key? In my model the pk would be the surrogate key and that's why I didn't think this would work, as that will always be unique. I must admit I don't think the check "unique(natural_key) where active = true" exists, but perhaps by adding it the above would also work...Mitch Kent– Mitch Kent2020年07月29日 06:16:53 +00:00Commented Jul 29, 2020 at 6:16
-
It a primary or unique key. Obviously you need to have (at least) one, otherwise you can't detect "if the row exists". As you didn't show us your table definition I had to make something upuser1822– user18222020年07月29日 06:18:38 +00:00Commented Jul 29, 2020 at 6:18
-
Thanks - I described with words but I should know better by now :) Have included sample table in the question, the unique constraint is on column 'natural_key' where active = true. It's the conditionality but I see now I can probably do
on conflict (natural_key) where active = true do nothing
if there is a unique index on that key - I'll give it a try. Thanks!Mitch Kent– Mitch Kent2020年07月29日 07:25:47 +00:00Commented Jul 29, 2020 at 7:25
pk_surrogate
nornatural_id
seem to be primary (or unique) keys - is that sample data the source or the target?