Given two result sets, each consisting of key|value pairs (coming from CTEs), I want to join and group them by key, aggregate their values, and return two different things:

a) those keys where the aggregated list of values in the first result set exactly matches the aggregated list of values in the second result set

b) those keys where the aggregated list of values in the first result set matches that of the second result set independent of order

I know of string_agg(), but it seems I can only use it in the SELECT list, and it's inefficient anyway. Is there something more efficient?

Set 1

|key | value |
|-----|-------|
| 1 | 1 |
| 1 | 2 |
| 3 | 4 |
| 2 | 5 |
| 2 | 7 |
| 1 | 3 |

Set 2

|key | value |
|-----|-------|
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 2 | 7 |
| 2 | 5 |
| 4 | 6 |

Desired result:

a) key 1

(1,2,3 = 1,2,3)

b) key 1 and key 2

(5,7 = 7,5)

Erwin Brandstetter
asked May 10, 2018 at 0:23

2 Answers

For b) you can use an intersection to find the tuples that are in both result sets, and then aggregate on top of that:

with rs1 (key, value) as ( values (1,1),(1,2),(1,3),(2,5),(2,7),(3,4))
 , rs2 (key, value) as ( values (1,1),(1,2),(1,3),(2,5),(2,7),(4,6))
select key, array_agg(value)
from (
 select key, value from rs1
 intersect
 select key, value from rs2
) t
group by key;
1 {3,2,1}
2 {7,5}
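
Note that the intersection returns the tuples both result sets share per key. It matches the requested result here only because keys 3 and 4 have no tuples in common. If only keys whose complete sets of values are equal should qualify, one possible sketch is a full outer join with marker columns. This is not part of the query above, it assumes no duplicates and no null values, and the names in1 and in2 are made up for illustration:

```sql
WITH rs1 (key, value) AS (VALUES (1,1),(1,2),(1,3),(2,5),(2,7),(3,4))
   , rs2 (key, value) AS (VALUES (1,1),(1,2),(1,3),(2,5),(2,7),(4,6))
SELECT key
FROM  (SELECT key, value, true AS in1 FROM rs1) a    -- in1 marks rows present in rs1
FULL   JOIN
      (SELECT key, value, true AS in2 FROM rs2) b    -- in2 marks rows present in rs2
       USING (key, value)
GROUP  BY key
-- keep a key only if every one of its (key, value) rows exists on both sides
HAVING bool_and(in1 IS NOT NULL AND in2 IS NOT NULL)
ORDER  BY key;
```

With the sample data this should return keys 1 and 2. Keys 3 and 4 drop out because their only row is missing on one side.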

If you are dealing with bags instead of sets, you can use intersect all to preserve duplicates:

with rs1 (key, value) as ( values (1,1),(1,2),(1,1),(4,6))
 , rs2 (key, value) as ( values (1,1),(1,2),(1,1),(5,7))
select key, array_agg(value)
from (
 select key, value from rs1
 intersect all
 select key, value from rs2
) t
group by key;
1 {2,1,1}

a) does not really make sense, since there is no order to take into consideration. We can create one by adding an ordering number n for each key:

with rs1 (key, n, value) as ( values (1,1,1),(1,2,2),(1,3,3),(2,1,5),(2,2,7),(3,1,4))
 , rs2 (key, n, value) as ( values (1,1,1),(1,3,2),(1,2,3),(2,2,5),(2,1,7),(4,1,6))
select key, array_agg(value)
from (
 select key, n, value from rs1
 intersect
 select key, n, value from rs2
) t
group by key;
1 {1}

Another possible interpretation is that n is a total ordering (i.e. not within each key). The same solution can be used to deal with that.
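
One way this could look, as a sketch only: derive a per-key position from the total ordering with row_number() and intersect on that position. The data below is the question's sample in its listed order, with n added as an assumed row number. The column name pos is made up for illustration:

```sql
WITH rs1 (n, key, value) AS (VALUES (1,1,1),(2,1,2),(3,3,4),(4,2,5),(5,2,7),(6,1,3))
   , rs2 (n, key, value) AS (VALUES (1,1,1),(2,1,2),(3,1,3),(4,2,7),(5,2,5),(6,4,6))
SELECT key, array_agg(value ORDER BY pos) AS matching_values
FROM (
   -- pos = position of the row within its key, derived from the total order n
   SELECT key, row_number() OVER (PARTITION BY key ORDER BY n) AS pos, value FROM rs1
   INTERSECT
   SELECT key, row_number() OVER (PARTITION BY key ORDER BY n) AS pos, value FROM rs2
) t
GROUP BY key;
```

With this data only key 1 should come back, with {1,2,3}. Key 2 has the same values in both sets, but at different positions.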

answered May 10, 2018 at 0:43

You need to define the order of rows somehow. (A given set has no natural order.) I added an ordering column ord. You might achieve that in your CTEs with row_number(), or you actually have additional columns establishing order.

This also assumes no duplicates on (key, value) and no null values. Otherwise, you have to define how to deal with those.
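
A minimal sketch of generating ord with row_number(), assuming the source rows carry some column that defines the order. The CTE src and the column created_at are hypothetical:

```sql
-- Hypothetical source rows; created_at is an assumed column that defines the order.
WITH src (key, value, created_at) AS (
   VALUES (1, 1, timestamp '2018-05-01 10:00')
        , (1, 2, timestamp '2018-05-01 10:05')
        , (3, 4, timestamp '2018-05-01 10:10')
   )
SELECT row_number() OVER (ORDER BY created_at) AS ord  -- running number in created_at order
     , key, value
FROM   src;
```

Without an ORDER BY inside the OVER clause, the numbering would be arbitrary, so it only helps if some column really establishes the order.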

WITH t1 (ord, key, value) AS (
 VALUES (1,1,1),(2,1,2),(3,3,4),(4,2,5),(5,2,7),(6,1,3)
 )
, t2 (ord, key, value) AS (
 VALUES (1,1,1),(2,1,2),(3,1,3),(4,2,7),(5,2,5),(6,4,6)
 )
SELECT key, a1.ord_arr1, a2.ord_arr2
 , a1.ord_arr1 = a2.ord_arr2 AS match_arr -- match in given order
 , a1.sort_arr1 = a2.sort_arr2 AS match_set -- match after ordering
FROM (
 SELECT key
 , array_agg(value ORDER BY ord) AS ord_arr1
 , array_agg(value ORDER BY value) AS sort_arr1
 FROM t1 -- ordered input!
 GROUP BY 1
 ) a1
JOIN (
 SELECT key
 , array_agg(value ORDER BY ord) AS ord_arr2
 , array_agg(value ORDER BY value) AS sort_arr2
 FROM t2 -- ordered input!
 GROUP BY 1
 ) a2 USING (key)
ORDER BY match_arr DESC;
key ord_arr1 ord_arr2 match_arr match_set
1 {1,2,3} {1,2,3} t t
2 {5,7} {7,5} f t
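
To return just the keys asked for in a), the same aggregates can go into a filter. A minimal sketch reusing the sample data; the names arr1 and arr2 are picked for illustration:

```sql
WITH t1 (ord, key, value) AS (
   VALUES (1,1,1),(2,1,2),(3,3,4),(4,2,5),(5,2,7),(6,1,3)
   )
, t2 (ord, key, value) AS (
   VALUES (1,1,1),(2,1,2),(3,1,3),(4,2,7),(5,2,5),(6,4,6)
   )
SELECT key
FROM  (SELECT key, array_agg(value ORDER BY ord) AS arr1 FROM t1 GROUP BY key) a1
JOIN  (SELECT key, array_agg(value ORDER BY ord) AS arr2 FROM t2 GROUP BY key) a2 USING (key)
WHERE  a1.arr1 = a2.arr2;  -- a) same values in the same order
```

With the sample data this should return key 1 only. For b), replace ORDER BY ord with ORDER BY value in both aggregates, which should return keys 1 and 2.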

answered Jul 19 at 23:48
