How to group rows while filtering out values per each group?

Question 1

Imagine you have the following setup:

CREATE TABLE namechanges (
 id text,
 new_name text,
 change_date timestamp
);
INSERT INTO namechanges VALUES
 ('x', 'alice', '2020-03-01'),
 ('y', 'Bob T.', '2020-03-03'),
 ('x', 'Alice', '2020-03-05'),
 ('z', 'charlie', '2020-03-07'),
 ('x', 'Alice C.', '2020-03-09'),
 ('z', 'Charlie Z.', '2020-03-11');

What would a query look like that retrieves just the current name for each id and returns the following?

| id | max |
| --- | ---------- |
| z | Charlie Z. |
| y | Bob T. |
| x | Alice C. |

Here's the example above on DB Fiddle if you want to play with it: https://www.db-fiddle.com/f/ugNcXhRyb44KTpqjFXmKDW/0

Answer 1 (accepted)

A much better way to do this is to use the ROW_NUMBER() analytic function, as follows (see fiddle here). It avoids the "workaround" of concatenating strings, and these functions are great - see below.

SELECT rn, id, new_name, change_date
FROM
(
 -- Number each id's rows from newest (rn = 1) to oldest
 SELECT
 ROW_NUMBER() OVER (PARTITION BY id ORDER BY change_date DESC) AS rn,
 id, new_name, change_date
 FROM namechanges
) AS tab
WHERE rn = 1 -- keep only the most recent row per id
ORDER BY id;

Result:

| rn | id | new_name | change_date |
| --- | --- | ---------- | ------------------- |
| 1 | x | Alice C. | 2020-03-09 00:00:00 |
| 1 | y | Bob T. | 2020-03-03 00:00:00 |
| 1 | z | Charlie Z. | 2020-03-11 00:00:00 |

Analytic (aka Window) functions are very powerful and will reward, a gazillion fold, time spent learning them! I've left the steps in the logic that I followed in the fiddle, so that (plus the tutorial link above) should give you a good start.
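
To see what the subquery produces before the rn = 1 filter is applied, it can help to run the inner SELECT on its own. A minimal sketch against the namechanges table from the question (the final ORDER BY is added here only to make the output easy to read):

```sql
-- Inner step only: number each id's rows from newest to oldest.
SELECT
  ROW_NUMBER() OVER (PARTITION BY id ORDER BY change_date DESC) AS rn,
  id, new_name, change_date
FROM namechanges
ORDER BY id, rn;
```

With the sample data, x gets rn values 1 to 3 (Alice C., Alice, alice), y gets a single row, and z gets two. The outer query then keeps only the rows where rn = 1.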

Answer 2

In Postgres this can be solved using distinct on ():

select distinct on (id) *
from namechanges
order by id, change_date desc;

This is typically faster than using aggregation or Window functions.
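
If the table grows, an index that matches the ORDER BY may help the planner here. This is only a sketch: the index name is made up for illustration, and whether PostgreSQL actually uses the index depends on the data and the planner's estimates (on the six-row sample table it most likely will not).

```sql
-- Hypothetical index name; the column order mirrors "order by id, change_date desc".
create index namechanges_id_change_date_idx
    on namechanges (id, change_date desc);

-- Same distinct on query, listing only the columns the question asks for.
select distinct on (id) id, new_name
from namechanges
order by id, change_date desc;
```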


Comment 1

True. Your solution: Planning Time 0.155 ms, Execution Time 0.118 ms. [Mine](dbfiddle.uk/…): Planning Time 0.222 ms, Execution Time 0.281 ms. Yours is more than 50% better! Nice one (+1), but my solution is standards-compliant! :-)
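
Timings in this form are what EXPLAIN with the ANALYZE option reports: it actually runs the statement and prints planning and execution time. A sketch of how the two approaches could be compared on the question's table, assuming that is how the figures were obtained; the numbers vary from run to run and say little on six rows:

```sql
-- DISTINCT ON version
EXPLAIN (ANALYZE)
SELECT DISTINCT ON (id) *
FROM namechanges
ORDER BY id, change_date DESC;

-- ROW_NUMBER() version
EXPLAIN (ANALYZE)
SELECT id, new_name, change_date
FROM (
  SELECT id, new_name, change_date,
         ROW_NUMBER() OVER (PARTITION BY id ORDER BY change_date DESC) AS rn
  FROM namechanges
) AS tab
WHERE rn = 1;
```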

Comment 2

@Vérace: yes, absolutely. Your window-function solution is also more versatile, as it can easily be extended to "the two most recent changes" or similar variations.
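
The "two most recent changes" variation mentioned here could look like the following sketch, which reuses the ROW_NUMBER() approach from the accepted answer and only changes the filter:

```sql
SELECT id, new_name, change_date
FROM (
  SELECT id, new_name, change_date,
         ROW_NUMBER() OVER (PARTITION BY id ORDER BY change_date DESC) AS rn
  FROM namechanges
) AS tab
WHERE rn <= 2          -- keep the two newest rows per id instead of one
ORDER BY id, change_date DESC;
```

For the sample data this returns two rows each for x and z, and one row for y, since y has only a single name change.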

Answer 3

I've found a workaround that seems to work for the case above on PostgreSQL:

SELECT id, max((change_date || '-' || new_name))
FROM namechanges
GROUP BY id;

That will return the most recent name, although prefixed with the date. I'm not sure whether this is the correct way of doing it or how well it performs.
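
If the goal is to stay with GROUP BY but avoid the date prefix, one alternative (not part of the workaround above) is an ordered aggregate: collect each id's names newest-first and take the first element. A sketch for PostgreSQL, with current_name as an illustrative column alias:

```sql
SELECT id,
       -- array_agg with ORDER BY builds the array newest-first; [1] is the latest name
       (array_agg(new_name ORDER BY change_date DESC))[1] AS current_name
FROM namechanges
GROUP BY id;
```

The concatenation workaround relies on the text form of the timestamp sorting chronologically, whereas the ordered aggregate sorts on change_date itself.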

The ROW_NUMBER() answer above is the accepted answer, posted by Vérace on 2020-03-16 11:15:33Z.

