0

I am trying to refactor a table which contains historical data and therefore isn't ever deleted, but has an 'active' field.

Each day we run a query (a few joins, but quite slow) to get some data to insert into this table:

  • if the row exists already, and is present in the query, we do nothing
  • if the row exists but the result isn't in todays query, we set 'active' to false
  • if the row doesn't exist but is in the query, we insert it (with active defaulting to true)

It is possible for an entity to be active, made inactive (same row), then a few days later become active again (would be a new row).

The table is using a surrogate key, as the natural key can be repeated as noted above, though there should only ever be a single entry that is active on the natural key.

I'm curious if there is an action or 'pattern' I can take that will enable such functionality - I realise I'm asking a lot :)

I am using a java library (jOOQ) to generate the sql, so I am only looking for sql as the solution, no PL/pgSQL and no stored procs. I doubt it is possible without 2 passes (1. perform the update where not in query result, 2. perform the insert from query result) but I'm intrigued if there are possibilities.

Edit: Example table history, where ** is an update to same row

pk_surrogate | natural_id | active| date_made_active
1 | 1 | true | 2020年01月01日 -- row for entity 1 added
1 ** | 1 | false | 2020年01月01日 -- row for entity 1 no longer present on 2020年01月02日, so marked inactive
2 | 1 | true | 2020年01月03日 -- new row for entity 1 added when it was present on 2020年01月03日
3 | 2 | true | 2020年01月03日 --new value for 2020年01月03日
asked Jul 28, 2020 at 20:07
2
  • Neither pk_surrogate nor natural_id seem to be primary (or unique) keys - is that sample data the source or the target? Commented Jul 29, 2020 at 7:26
  • pk_surrogate is primary key - it's unique per row - the row with ** is an update, didn't know how best to repesent an updated row Commented Jul 30, 2020 at 8:08

1 Answer 1

1

Something like this maybe?

with daily_query as (
 .... this is your current query ...
), upsert as (
 -- covers first and third requirement
 insert into the_table (....)
 select ...
 from daily_query
 on conflict (pk_column) do nothing
)
-- this sets those rows to in active that exist in the target table
-- but are not in the initial query
update the_table t
 set active = false
where not exists (select * 
 from daily_query q
 where q.pk_column = t.pk_column);
answered Jul 28, 2020 at 20:49
3
  • Thanks for your quick response @a_horse_with_no_name - in this example, is the pk_column the surrogate key or natural (but can repeat) key? In my model the pk would be the surrogate key and that's why I didn't think this would work, as that will always be unique. I must admit I don't think the check "unique(natural_key) where active = true" exists, but perhaps by adding it the above would also work... Commented Jul 29, 2020 at 6:16
  • It a primary or unique key. Obviously you need to have (at least) one, otherwise you can't detect "if the row exists". As you didn't show us your table definition I had to make something up Commented Jul 29, 2020 at 6:18
  • Thanks - I described with words but I should know better by now :) Have included sample table in the question, the unique constraint is on column 'natural_key' where active = true. It's the conditionality but I see now I can probably do on conflict (natural_key) where active = true do nothing if there is a unique index on that key - I'll give it a try. Thanks! Commented Jul 29, 2020 at 7:25

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.