Postgres how to update first row in the child table for all parent entries

Question 1

I have a parent table workflow and a child table workflow_task with a one-to-many mapping.

workflow

workID	workTitle
1800	vpInv1231
1801	vpInv1231

workflow_task

id	task_Type	workflow_id(fk)
1	null	1800
2	null	1800

Is there any way I can update the first row in the child table for each workflow with task_Type 'POC', and the rest (from the second row onwards) with 'Appr'?

Question 2

"update all the first row": how do you define which row is "first"?

Question 3

task_type is meant to be the identifier, but it was only just added; for now, only the ids with the minimum value identify the "first" rows.

Answer 1 · Akina · 2021-04-27 10:02:59Z

Possible question interpretation:

UPDATE workflow_task
SET task_Type = CASE WHEN workflow_task.id = first_row.id
 THEN 'POC'
 ELSE 'Appr'
 END
FROM ( SELECT MIN(id) id, workflow_id
 FROM workflow_task
 GROUP BY workflow_id ) first_row
WHERE workflow_task.workflow_id = first_row.workflow_id;

https://dbfiddle.uk/?rdbms=postgres_12&fiddle=08c60035f735ae557100dde8a7623008

The workflow table is not needed for this operation.
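
One way to sanity-check the outcome after running the UPDATE is to count the labels per workflow. This is only a sketch; the output column names poc_rows and appr_rows are made up for illustration, and it assumes task_type holds nothing but NULL, 'POC' or 'Appr'.

```sql
-- Expect exactly one 'POC' row per workflow_id; everything else should be 'Appr'.
SELECT workflow_id,
       COUNT(*) FILTER (WHERE task_type = 'POC')  AS poc_rows,
       COUNT(*) FILTER (WHERE task_type = 'Appr') AS appr_rows
FROM workflow_task
GROUP BY workflow_id
ORDER BY workflow_id;
```

Any workflow_id that shows poc_rows other than 1 would deserve a closer look.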

Thanks a lot Akina – Azeem, commented Apr 27, 2021 at 10:13

Answer 2 · Vérace · 2021-04-27 14:55:31Z

I upvoted @Akina's answer because it was a very simple and elegant way of solving the particular problem posed by the OP.

I looked at the question again and noticed that there are (at least) five other ways of obtaining the same result - using window functions. Window functions (short PostgreSQL tutorial here) are an extremely powerful tool for querying databases and will repay any effort spent learning them 10 times over.

EDIT:

There are in fact 6 ways of doing this using window functions; the sixth uses the LAG() function (see the fiddle here). As with ROW_NUMBER() etc., there are two ways of doing this: either "directly", or using wf.id DESC in the ORDER BY part of the OVER() clause of the window function (see the discussion below).

Analysis:

Before starting, it is worth mentioning that none of the methods (mine or @Akina's) will work reliably without either a PRIMARY KEY or a UNIQUE (and NOT NULL) constraint on the (workflow_id, id) pair of fields in the workflow_task table; otherwise there is no way of distinguishing between different records in the table.

So, my table definition is now (see the fiddle for all code below (and more) here):

CREATE TABLE workflow_task 
(
 id INT NOT NULL, 
 task_Type VARCHAR(255), 
 workflow_id VARCHAR(255) NOT NULL,
 CONSTRAINT wft_pk PRIMARY KEY (workflow_id, id)
);
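
If the table already exists without such a key (as the one in the question may), the constraint can be added after the fact. The following is a minimal sketch; the constraint name wft_uq is hypothetical, and the first query is only there to reveal rows that would make the ALTER TABLE fail.

```sql
-- Any rows returned here are duplicates that must be resolved first.
SELECT workflow_id, id, COUNT(*) AS copies
FROM workflow_task
GROUP BY workflow_id, id
HAVING COUNT(*) > 1;

-- Enforce NOT NULL and uniqueness of the (workflow_id, id) pair.
ALTER TABLE workflow_task
    ALTER COLUMN id SET NOT NULL,
    ALTER COLUMN workflow_id SET NOT NULL,
    ADD CONSTRAINT wft_uq UNIQUE (workflow_id, id);
```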

For testing purposes, I also added more records as follows:

INSERT INTO workflow_task VALUES
(1, null, 1800),
(2, null, 1800),
(3, null, 1800),
(4, null, 1800),
(5, null, 1800),
(6, null, 1800),
(7, null, 1900), -- <<=== Note duplication of 7 as id here and below
(8, null, 1900), -- <<=== to make things tricky!
(9, null, 1900),
(10, null, 1900),
(11, null, 1900),
(12, null, 1900),
(7, null, 2000),
(8, null, 2000),
(9, null, 2000),
(10, null, 2000),
(11, null, 2000),
(12, null, 2000);

The value(s) of the id fields don't matter - they only have to be UNIQUE (and NOT NULL) in conjunction with the workflow_id field - as ensured by the PRIMARY KEY in my table definition.

The initial window function method uses the FIRST_VALUE() function as follows:

UPDATE workflow_task
SET task_type = 
 CASE
 WHEN workflow_task.id = fr.f_id THEN 'XXX'
 ELSE 'YYY'
 END
FROM
(
 SELECT FIRST_VALUE(wf.id) OVER (PARTITION BY wf.workflow_id 
 ORDER BY id, workflow_id) AS f_id,
 wf.workflow_id
 FROM workflow_task wf
) AS fr
WHERE workflow_task.workflow_id = fr.workflow_id;

Result:

id task_type workflow_id
1 XXX 1800
2 YYY 1800
3 YYY 1800
4 YYY 1800
5 YYY 1800
6 YYY 1800
7 XXX 1900
8 YYY 1900
...
... snipped for brevity 
...

The result is correct and matches @Akina's MIN(id) approach - apart from the differing values inserted for testing/visibility purposes.

The second approach makes use of the ROW_NUMBER() function.

Now, you may ask, why bother with these approaches when you already have a perfectly working solution? The answer lies in the power of window functions. This (relatively simple) question has a relatively simple answer, but if, down the road, a requirement arises for more sophisticated criteria to be taken into account, the window function approach will become the tool of choice.

For example, with the ROW_NUMBER() approach, you can choose the second id, or the third and so on...

SQL:

UPDATE workflow_task
SET task_type = 
 CASE
 WHEN workflow_task.id = fr.id THEN 'AAA'
 ELSE 'BBB'
 END
FROM
(
 SELECT ROW_NUMBER() OVER (PARTITION BY wf.workflow_id
 ORDER BY wf.id, wf.workflow_id) AS f_rn,
 wf.id,
 wf.workflow_id
 FROM workflow_task wf
) AS fr
WHERE workflow_task.workflow_id = fr.workflow_id
AND fr.f_rn = 1;

Result:

id task_type workflow_id
1 AAA 1800
2 BBB 1800
3 BBB 1800
...
... snipped for brevity
...

Again, this gives the correct answer.
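
To illustrate the remark that ROW_NUMBER() lets you pick the second id, the third and so on, here is the same statement with only the row number changed. It is a sketch against the test table above; note that a workflow with fewer than two tasks has no f_rn = 2 row, so its rows are simply left untouched.

```sql
UPDATE workflow_task
SET task_type =
    CASE
        WHEN workflow_task.id = fr.id THEN 'AAA'
        ELSE 'BBB'
    END
FROM
(
    SELECT ROW_NUMBER() OVER (PARTITION BY wf.workflow_id
                              ORDER BY wf.id) AS f_rn,
           wf.id,
           wf.workflow_id
    FROM workflow_task wf
) AS fr
WHERE workflow_task.workflow_id = fr.workflow_id
AND   fr.f_rn = 2;  -- second row per workflow instead of the first
```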

This also works:

UPDATE workflow_task
SET task_type = 
 CASE
 WHEN workflow_task.id = fr.id THEN 'RRR'
 ELSE 'SSS'
 END
FROM
(
 SELECT ROW_NUMBER() OVER (PARTITION BY wf.workflow_id
 ORDER BY wf.id DESC, wf.workflow_id) AS f_rn,
 wf.id,
 wf.workflow_id
 FROM workflow_task wf
) AS fr
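WHERE workflow_task.workflow_id = fr.workflow_id
-- AND fr.f_rn = 1
;

Note the DESC in ORDER BY wf.id DESC, and that I've commented out the fr.f_rn = 1 predicate. Be aware that, without that predicate, the join returns several fr rows for each row of workflow_task, and PostgreSQL does not guarantee which of them is used for the update, so this result should not be relied upon.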
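
The statement above joins on workflow_id only, so every target row matches all fr rows of its workflow. A variant that avoids this is sketched here: it joins on the full (workflow_id, id) key, so each target row meets exactly one fr row and the row number itself decides the label. The 'POC' and 'Appr' labels are taken from the question; the rest follows the test table above.

```sql
UPDATE workflow_task
SET task_type =
    CASE
        WHEN fr.f_rn = 1 THEN 'POC'
        ELSE 'Appr'
    END
FROM
(
    SELECT ROW_NUMBER() OVER (PARTITION BY wf.workflow_id
                              ORDER BY wf.id) AS f_rn,
           wf.id,
           wf.workflow_id
    FROM workflow_task wf
) AS fr
-- Join on the whole key: one fr row per target row.
WHERE workflow_task.workflow_id = fr.workflow_id
AND   workflow_task.id = fr.id;
```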

The NTH_VALUE() function shows how these functions can be used in interesting ways. Say you wanted to mark the 3rd record of each workflow instead; this will do it:

UPDATE workflow_task
SET task_type = 
 CASE
 WHEN workflow_task.id = fr.f_id THEN 'GGG'
 ELSE 'HHH'
 END
FROM
(
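 SELECT NTH_VALUE(wf.id, 3) OVER (PARTITION BY wf.workflow_id 
 ORDER BY wf.id
 -- Widen the frame to the whole partition; with the default frame,
 -- the first two rows of each partition would get NULL here.
 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS f_id,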
 wf.workflow_id
 FROM workflow_task wf
) AS fr
WHERE workflow_task.workflow_id = fr.workflow_id;

Result:

id task_type workflow_id
1 HHH 1800
2 HHH 1800
3 GGG 1800 -- <<=== Note - 3rd record modified! 
4 HHH 1800
...
... snipped - the 3rd record of each later workflow is modified likewise
...

Of course, NTH_VALUE(wf.id, 1) reduces to FIRST_VALUE()!
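
One detail worth inspecting when using NTH_VALUE() is the window frame. With an ORDER BY and no explicit frame, the frame ends at the current row, so rows before the 3rd one cannot "see" it yet. The following SELECT is a sketch for comparing both behaviours side by side on the test table; the column aliases default_frame and whole_partition are made up for illustration.

```sql
SELECT wf.workflow_id,
       wf.id,
       -- Default frame: NULL for the first two rows of each partition.
       NTH_VALUE(wf.id, 3) OVER (PARTITION BY wf.workflow_id
                                 ORDER BY wf.id) AS default_frame,
       -- Explicit frame over the whole partition: the same value on every row.
       NTH_VALUE(wf.id, 3) OVER (PARTITION BY wf.workflow_id
                                 ORDER BY wf.id
                                 ROWS BETWEEN UNBOUNDED PRECEDING
                                          AND UNBOUNDED FOLLOWING) AS whole_partition
FROM workflow_task wf
ORDER BY wf.workflow_id, wf.id;
```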

I've also included the RANK() and DENSE_RANK() functions in the fiddle - the point is not to get the answer - but so that you can explore these functions which can be used in all sorts of imaginative ways to achieve non-trivial results relatively easily!
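
For exploring how ROW_NUMBER(), RANK() and DENSE_RANK() differ, ties are needed, and the (workflow_id, id) key rules those out in the test table. The sketch below therefore uses a throwaway VALUES list; the names t, grp and val are purely illustrative.

```sql
SELECT grp,
       val,
       ROW_NUMBER() OVER w AS rn,    -- 1, 2, 3: ties are numbered arbitrarily
       RANK()       OVER w AS rnk,   -- 1, 1, 3: ties share a rank, then a gap
       DENSE_RANK() OVER w AS drnk   -- 1, 1, 2: ties share a rank, no gap
FROM (VALUES ('a', 10), ('a', 10), ('a', 20), ('b', 5)) AS t(grp, val)
WINDOW w AS (PARTITION BY grp ORDER BY val)
ORDER BY grp, val;
```

The comments describe group 'a'; the single row of group 'b' gets 1 in all three columns.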

EDIT:

As mentioned at the beginning of the answer, my final window function is LAG(), which relies on NULLs: the first row of each partition has no previous row, so LAG() returns NULL for it. I'll show how this works by starting with the inner SELECT:

SELECT LAG(wf.workflow_id) OVER (PARTITION BY wf.workflow_id
 ORDER BY wf.id, wf.workflow_id) AS f_rn,
wf.id,
wf.workflow_id
FROM workflow_task wf

Result:

f_rn id workflow_id
NULL 1 1800
1800 2 1800
1800 3 1800

and the final SQL:

UPDATE workflow_task
SET task_type = 
 CASE
 WHEN workflow_task.id = fr.id THEN 'LLL'
 ELSE 'MMM'
 END
FROM
(
 SELECT LAG(wf.workflow_id) OVER (PARTITION BY wf.workflow_id
 ORDER BY wf.id, wf.workflow_id) AS f_rn,
 wf.id,
 wf.workflow_id
 FROM workflow_task wf
) AS fr
WHERE workflow_task.workflow_id = fr.workflow_id
AND fr.f_rn IS NULL;

Result (correct - snipped for brevity):

id task_type workflow_id
1 LLL 1800
2 MMM 1800
3 MMM 1800

If you're new to window functions, I found it helpful to first consider them in combination with aggregate functions (i.e. AVG(), MIN(), MAX(), SUM(), and COUNT()), as explained well here.
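
As a small illustration of aggregates used as window functions, the sketch below runs MIN() and COUNT() over each workflow without collapsing the rows, which ties back to the MIN(id) idea in @Akina's answer. The aliases first_id and tasks_in_workflow are made up for this example.

```sql
SELECT wf.id,
       wf.workflow_id,
       -- Same value as MIN(id) ... GROUP BY workflow_id, repeated on every row.
       MIN(wf.id) OVER (PARTITION BY wf.workflow_id) AS first_id,
       COUNT(*)   OVER (PARTITION BY wf.workflow_id) AS tasks_in_workflow
FROM workflow_task wf
ORDER BY wf.workflow_id, wf.id;
```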

A final word of caution: check out the performance analyses at the end of the fiddle. For every up, there's a down (or "Ye cannae beat the laws o' physics, Jim", with apologies to Gene Roddenberry).

The MIN(id) query appears to be consistently faster than the window function ones. Having said that, it's impossible to tell definitively what will happen with more data - however, I would imagine that, as a general rule, @Akina's query would be faster. I would, however, advise you to test with your own system and data!
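
To run such a comparison on your own data, one possible approach is EXPLAIN (ANALYZE, BUFFERS). Because EXPLAIN ANALYZE really executes the statement, the sketch below wraps it in a transaction that is rolled back, so the table is left unchanged. It uses @Akina's query as the example; substitute any of the other statements to compare.

```sql
BEGIN;

EXPLAIN (ANALYZE, BUFFERS)
UPDATE workflow_task
SET task_type = CASE WHEN workflow_task.id = first_row.id
                     THEN 'POC'
                     ELSE 'Appr'
                END
FROM ( SELECT MIN(id) AS id, workflow_id
       FROM workflow_task
       GROUP BY workflow_id ) AS first_row
WHERE workflow_task.workflow_id = first_row.workflow_id;

-- Undo the update that EXPLAIN ANALYZE actually performed.
ROLLBACK;
```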

p.s. welcome to the forum!

score 0 · Answer 3 · 2021-04-27 16:43:24Z

Akina's answer is flawless for your objective.
Vérace considered all the other options.
For completeness, EXISTS is another option:

UPDATE workflow_task wt
SET task_type = 'POC'
WHERE NOT EXISTS (
 SELECT FROM workflow_task wt1
 WHERE wt1.workflow_id = wt.workflow_id
 AND wt1.id < wt.id
 );
 
UPDATE workflow_task wt
SET task_type = 'Appr'
WHERE task_type IS NULL;

You could do it in a single UPDATE. But doing it in two steps has internal benefits. UPDATE writes a new row version. Updating all rows in a single command effectively doubles the size of a (pristine) table. If you do it in multiple steps, in separate transactions (and without concurrent load on the DB that would stand in the way), subsequent updates can reuse dead tuples, thereby keeping table bloat at bay.
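
For reference, a single-statement form of the EXISTS approach could look like the sketch below. The two size queries around it are optional and only there to observe the growth described above; size_before and size_after are illustrative aliases.

```sql
SELECT pg_size_pretty(pg_relation_size('workflow_task')) AS size_before;

-- A row is 'Appr' if an earlier row exists in the same workflow, else 'POC'.
UPDATE workflow_task wt
SET task_type = CASE
                    WHEN EXISTS (
                        SELECT FROM workflow_task wt1
                        WHERE wt1.workflow_id = wt.workflow_id
                        AND   wt1.id < wt.id
                    ) THEN 'Appr'
                    ELSE 'POC'
                END;

SELECT pg_size_pretty(pg_relation_size('workflow_task')) AS size_after;
```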

That said, consider this ...

Alternative design

Store one pointer to the "poc" row in the parent table, instead of marking all rows in the child table.

CREATE TABLE workflow (
 workflow_id int PRIMARY KEY
, work_title text NOT NULL
, poc int -- can be null
);
CREATE TABLE workflow_task (
 workflow_task_id int PRIMARY KEY
, workflow_id int REFERENCES workflow
-- ... more columns
, UNIQUE (workflow_id, workflow_task_id)
);
ALTER TABLE workflow
  ADD CONSTRAINT workflow_poc_fk FOREIGN KEY (workflow_id, poc)
  REFERENCES workflow_task (workflow_id, workflow_task_id);

db<>fiddle here
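
A possible way of working with this design is sketched below, assuming the two tables exactly as defined above with no additional NOT NULL columns. Because of the circular reference, the workflow row is inserted with poc still NULL, then the tasks, and only then is the pointer set. The sample values come from the question.

```sql
INSERT INTO workflow (workflow_id, work_title)
VALUES (1800, 'vpInv1231');

INSERT INTO workflow_task (workflow_task_id, workflow_id)
VALUES (1, 1800), (2, 1800);

-- Point the parent at its "poc" task; the FK ensures the task belongs to this workflow.
UPDATE workflow
SET poc = 1
WHERE workflow_id = 1800;

-- Derive the label on the fly instead of storing it on every child row.
SELECT t.workflow_task_id,
       t.workflow_id,
       CASE WHEN t.workflow_task_id = w.poc THEN 'POC' ELSE 'Appr' END AS task_type
FROM workflow w
JOIN workflow_task t USING (workflow_id)
ORDER BY t.workflow_id, t.workflow_task_id;
```

With this layout, changing the "poc" of a workflow is a single-row update on the parent table.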

In more detail:

Constraint to enforce "at least one" or "exactly one" in a database
