I have a single table (table_1) with the schema below:
row_id  identifier  col1  col2  col3  col4  status
1       A           1     2     3     4
2       A           2     3     4     5
3       B           1     2     3     4
4       B           2     3     4     6
5       C           1     2     3     4
I want to join on identifier = A such that any rows with identifier != A that match on the values of col1, col2, col3 and col4 update the status column of the A row: one value listing the matched identifiers ('B,C') and another value for a mismatch (foo).
Note that the status column values need to update with the identifier(s) matched. The expected result is:
row_id  identifier  col1  col2  col3  col4  status
1       A           1     2     3     4     B,C
2       A           2     3     4     5     foo
3       B           1     2     3     4
4       B           2     3     4     6
5       C           1     2     3     4
I tried a single SELECT statement with a separate subquery for each column to match against 'A', but I had to select the other identifiers explicitly, and it resulted in errors.
Corrected row_id to show the primary key.
2 Answers
You can do it using a self-join and an aggregate function (string_agg) to aggregate the identifiers that differ from the current row's, as below.
The first and last statements initialize the status column to NULL and set it to foo where nothing was assigned, respectively:
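-- Note: there is no filter on identifier, so every row (A, B and C) gets a status.
-- Reset the column first.
UPDATE table_1 t1 SET status = NULL;

-- For each row, collect the identifiers of the other rows with the same col1..col4.
UPDATE table_1 t1
SET status = (
    SELECT string_agg(t2.identifier, ',')
    FROM table_1 t2
    WHERE t2.identifier != t1.identifier
      AND t1.col1 = t2.col1
      AND t1.col2 = t2.col2
      AND t1.col3 = t2.col3
      AND t1.col4 = t2.col4
);

-- Rows without any match are still NULL here; mark them as foo.
UPDATE table_1 t1 SET status = 'foo' WHERE status IS NULL;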
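To check this idea against the sample data, here is a minimal sketch that runs the same correlated-subquery approach in an in-memory SQLite database through Python's sqlite3 module. Two things are assumptions of the sketch rather than part of the answer: SQLite's group_concat stands in for string_agg, and the update is restricted to identifier = 'A' (with COALESCE supplying foo) so that the B and C rows keep an empty status as in the expected result.

```python
import sqlite3

# In-memory SQLite database loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_1 (row_id INTEGER PRIMARY KEY, identifier TEXT, "
    "col1 INT, col2 INT, col3 INT, col4 INT, status TEXT)"
)
conn.executemany(
    "INSERT INTO table_1 (row_id, identifier, col1, col2, col3, col4) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "A", 1, 2, 3, 4),
        (2, "A", 2, 3, 4, 5),
        (3, "B", 1, 2, 3, 4),
        (4, "B", 2, 3, 4, 6),
        (5, "C", 1, 2, 3, 4),
    ],
)

# Correlated subquery: for each updated row, aggregate the identifiers of the
# other rows that share col1..col4. The aggregate is NULL when nothing
# matches, so COALESCE turns that case into 'foo'.
# Assumption: only identifier = 'A' rows are updated.
conn.execute(
    """
    UPDATE table_1
    SET status = COALESCE(
        (SELECT group_concat(t2.identifier, ',')
         FROM table_1 AS t2
         WHERE t2.identifier <> table_1.identifier
           AND t2.col1 = table_1.col1
           AND t2.col2 = table_1.col2
           AND t2.col3 = table_1.col3
           AND t2.col4 = table_1.col4),
        'foo')
    WHERE identifier = 'A'
    """
)

# Map row_id -> status for inspection.
statuses = dict(conn.execute("SELECT row_id, status FROM table_1"))
print(statuses)
```

The order of the identifiers inside the aggregated string is not guaranteed by SQLite, so compare the values as a set if the order matters.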
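-- MySQL syntax (group_concat, update ... inner join ... set).
-- Caveat: "select *" together with "group by" only runs with ONLY_FULL_GROUP_BY
-- disabled, and the non-aggregated columns then come from an arbitrary row of
-- each group, so the "identifier='A'" filter relies on that row being an A row.
update test_table t1
inner join (
    select *,
           (case
                when status_value is null then 'foo'
                else status_value
            end) as to_update_status
    from (
        -- one row per (col1, col2, col3, col4) group, listing the non-A identifiers
        select *,
               group_concat(case
                                when identifier='A' then null
                                else identifier
                            end) as status_value
        from test_table
        group by col1, col2, col3, col4
    ) as t1
    where identifier='A'
) t2 on t1.row_id = t2.row_id
    and t1.identifier = t2.identifier
    and t1.col1 = t2.col1
    and t1.col2 = t2.col2
    and t1.col3 = t2.col3
    and t1.col4 = t2.col4
set t1.status = t2.to_update_status;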
In the above query, test_table is the name of the table.
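The grouping step is the core of this query, so it may help to look at it in isolation. The sketch below is an assumption-laden demo, not the answer's MySQL statement: it uses SQLite through Python's sqlite3 module, selects only the grouped columns (so it does not depend on how non-aggregated columns are picked), and applies the result with a correlated subquery because the update-join syntax above is MySQL-specific.

```python
import sqlite3

# In-memory SQLite database with the sample rows (table name as in the answer).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test_table (row_id INTEGER PRIMARY KEY, identifier TEXT, "
    "col1 INT, col2 INT, col3 INT, col4 INT, status TEXT)"
)
conn.executemany(
    "INSERT INTO test_table (row_id, identifier, col1, col2, col3, col4) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "A", 1, 2, 3, 4),
        (2, "A", 2, 3, 4, 5),
        (3, "B", 1, 2, 3, 4),
        (4, "B", 2, 3, 4, 6),
        (5, "C", 1, 2, 3, 4),
    ],
)

# Step 1: one row per (col1..col4) group. The CASE expression hides 'A', and
# group_concat skips NULLs, so a group containing only A rows yields NULL.
grouped_sql = """
    SELECT col1, col2, col3, col4,
           group_concat(CASE WHEN identifier = 'A' THEN NULL ELSE identifier END)
               AS status_value
    FROM test_table
    GROUP BY col1, col2, col3, col4
"""
groups = {row[:4]: row[4] for row in conn.execute(grouped_sql)}
print(groups)

# Step 2: copy each group's value onto its A rows, using 'foo' for NULL.
conn.execute(
    f"""
    UPDATE test_table
    SET status = COALESCE(
        (SELECT g.status_value
         FROM ({grouped_sql}) AS g
         WHERE g.col1 = test_table.col1
           AND g.col2 = test_table.col2
           AND g.col3 = test_table.col3
           AND g.col4 = test_table.col4),
        'foo')
    WHERE identifier = 'A'
    """
)

statuses = dict(conn.execute("SELECT row_id, status FROM test_table"))
print(statuses)
```

Step 1 shows why the mismatch case needs a fallback: the group for row 2 contains no identifier other than A, so its aggregated value is NULL.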
Consider changing the structure of your table. It should have a unique identifier. (From the name, the column row_id should be unique, but it is not in the data you provided, so I had to put 5 additional checks in the join condition.) If each row has a unique identifier, you could select just row_id and to_update_status in the subquery and join on row_id alone.
You should also have an index on the identifier column and a multi-column index on col1, col2, col3, col4.
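As a rough illustration of that advice, the sketch below declares row_id as the primary key, creates the two suggested indexes, and then updates by row_id alone. It is only a sketch under stated assumptions: it runs on SQLite through Python's sqlite3 module rather than MySQL, the index names idx_identifier and idx_cols are hypothetical, and the (row_id, to_update_status) pairs are computed with a LEFT JOIN instead of the nested subquery above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE test_table (
        row_id INTEGER PRIMARY KEY,  -- unique per row
        identifier TEXT,
        col1 INT, col2 INT, col3 INT, col4 INT,
        status TEXT
    );
    -- Hypothetical index names; the columns follow the advice above.
    CREATE INDEX idx_identifier ON test_table (identifier);
    CREATE INDEX idx_cols ON test_table (col1, col2, col3, col4);
    """
)
conn.executemany(
    "INSERT INTO test_table (row_id, identifier, col1, col2, col3, col4) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "A", 1, 2, 3, 4),
        (2, "A", 2, 3, 4, 5),
        (3, "B", 1, 2, 3, 4),
        (4, "B", 2, 3, 4, 6),
        (5, "C", 1, 2, 3, 4),
    ],
)

# Select just row_id and to_update_status for the A rows. The LEFT JOIN keeps
# A rows without a match; their aggregate is NULL, which becomes 'foo'.
pairs = conn.execute(
    """
    SELECT a.row_id,
           COALESCE(group_concat(o.identifier), 'foo') AS to_update_status
    FROM test_table AS a
    LEFT JOIN test_table AS o
           ON o.identifier <> 'A'
          AND o.col1 = a.col1
          AND o.col2 = a.col2
          AND o.col3 = a.col3
          AND o.col4 = a.col4
    WHERE a.identifier = 'A'
    GROUP BY a.row_id
    """
).fetchall()

# With a unique row_id, the update only needs to check row_id.
conn.executemany(
    "UPDATE test_table SET status = ? WHERE row_id = ?",
    [(status, row_id) for row_id, status in pairs],
)

statuses = dict(conn.execute("SELECT row_id, status FROM test_table"))
index_names = {
    name
    for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index'"
    )
}
print(statuses, index_names)
```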