How can I properly merge the rows of several tables with default values?

Question 1

Problem

I have three tables that are structured like so:

t1

 id | count_1
----+---------
  1 |       9
  2 |       4
  3 |       3

t2

 id | count_2
----+---------
  1 |       2
  3 |       3

t3

 id | count_3
----+---------
  1 |       1
  4 |       8

id is unique in each table. Note that not all ids occur in each table. Here is the SQL to create those tables if you'd like to test.
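The setup script itself is not reproduced in this text, so the following is only a minimal sketch of what it might look like. The integer column types and the PRIMARY KEY constraints are assumptions; the key simply encodes the statement that id is unique in each table.

```sql
-- Assumed schema: integer columns, id unique per table.
CREATE TABLE t1 (id integer PRIMARY KEY, count_1 integer);
CREATE TABLE t2 (id integer PRIMARY KEY, count_2 integer);
CREATE TABLE t3 (id integer PRIMARY KEY, count_3 integer);

-- Sample rows exactly as listed in the tables above.
INSERT INTO t1 (id, count_1) VALUES (1, 9), (2, 4), (3, 3);
INSERT INTO t2 (id, count_2) VALUES (1, 2), (3, 3);
INSERT INTO t3 (id, count_3) VALUES (1, 1), (4, 8);
```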

I'm trying to merge all those tables with a column for each count, defaulting to zero if there is no count for that particular id. Like this:

 id | count_1 | count_2 | count_3
----+---------+---------+---------
  1 |       9 |       2 |       1
  2 |       4 |       0 |       0
  3 |       3 |       3 |       0
  4 |       0 |       0 |       8

Attempt

I thought this was a natural use case for a full outer join, like this:

SELECT
 COALESCE(t1.id, t2.id, t3.id) as id,
 COALESCE(t1.count_1, 0) as count_1,
 COALESCE(t2.count_2, 0) as count_2,
 COALESCE(t3.count_3, 0) as count_3
FROM
 t1
FULL OUTER JOIN t2
 ON t1.id = t2.id
FULL OUTER JOIN t3
 ON t1.id = t3.id
ORDER BY id ASC;

But this returns a result with non-unique ids, where each row is just a row from one of the original tables with zeroes filling in the remaining columns:

 id | count_1 | count_2 | count_3
----+---------+---------+---------
  1 |       9 |       0 |       0   # <- should
  1 |       0 |       2 |       0   # <- be
  1 |       0 |       0 |       1   # <- one row
  2 |       4 |       0 |       0
  3 |       3 |       0 |       0   # <- should also be
  3 |       0 |       3 |       0   # <- one row
  4 |       0 |       0 |       8

Evidently I don't understand outer joins as well as I thought I did. Can anyone show me the correct way to do this?

Question 2

The specific data and query would give 4 rows in the result, not 7.

Question 3

FYI - I tried your code in a SQLFiddle. The only change I made was adding the commas after your fields in your select. I actually got exactly the results you wanted: one line per ID, with the appropriate counts in the appropriate columns. However, a variant with counts for id = 5 in tables 2 and 3 only does show the problem. @ypercubeTM's solution (COALESCE the t1 and t2 ids) does resolve it.
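To make the failing case concrete, here is a self-contained sketch that should run on PostgreSQL. It inlines the question's sample rows as CTEs and adds id = 5 to t2 and t3 only; the count values 6 and 7 for that id are made up for illustration.

```sql
WITH t1 (id, count_1) AS (VALUES (1, 9), (2, 4), (3, 3)),
     t2 (id, count_2) AS (VALUES (1, 2), (3, 3), (5, 6)),  -- (5, 6) is hypothetical
     t3 (id, count_3) AS (VALUES (1, 1), (4, 8), (5, 7))   -- (5, 7) is hypothetical
SELECT
    COALESCE(t1.id, t2.id, t3.id) AS id,
    COALESCE(t1.count_1, 0) AS count_1,
    COALESCE(t2.count_2, 0) AS count_2,
    COALESCE(t3.count_3, 0) AS count_3
FROM t1
FULL OUTER JOIN t2
    ON t1.id = t2.id
-- After the first join, the row for id 5 has t1.id = NULL, so the
-- condition below can never match it and t3's row for id 5 is emitted
-- as a separate row.
FULL OUTER JOIN t3
    ON t1.id = t3.id
ORDER BY id;
```

Ids 1 to 4 still come out as one row each, because every id shared between t2 and t3 also exists in t1. Id 5 comes out as two rows: one carrying count_2 = 6 and one carrying count_3 = 7.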

Question 4

That's interesting. The outer join query in my attempt does seem to work with this simpler example. I'll need to compare this tiny example more carefully with the actual production query where I saw the undesired behaviour.

Question 7

In the code you've provided, you have

SELECT
 COALESCE(t1.id, t2.id, t3.id) as id,
 COALESCE(t1.count_1, 0) as count_1,
 COALESCE(t2.count_2, 0) as count_2,
 COALESCE(t3.count_3, 0) as count_3
FROM
 t1
FULL OUTER JOIN t2
 ON t1.id = t2.id
FULL OUTER JOIN t3
 ON t1.id = t3.id
ORDER BY id ASC;

This is mostly fine. However, the join will multiply rows (a Cartesian product per id) if any table has two rows with the same id. For instance, this query is fine, returning one row:

SELECT *
FROM ( VALUES (1,2) ) AS t(id,count_1)
INNER JOIN ( VALUES (1,3) ) AS g(id,count_2)
 USING (id);

But the addition of (1,7) to g causes two rows to be returned:

SELECT *
FROM ( VALUES (1,2) ) AS t(id,count_1)
INNER JOIN ( VALUES (1,3),(1,7) ) AS g(id,count_2)
 USING (id);

If this is what you're seeing, then you have two rows with the same id in one of the three tables you're joining.

You now have to ask:

1. In which table do I have duplicate ids? You can find this out with a simple GROUP BY (id) HAVING count(*) > 1.
2. Which row's count_x do I want in my final output?
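As a sketch of the duplicate check, here it is applied to the two-row g example from above so that it runs on its own. Against real data, the inline VALUES list would be replaced by t1, t2, or t3 in turn; the alias occurrences is just an illustrative name.

```sql
-- Lists every id that appears more than once.
-- An empty result means the ids in that table are unique.
SELECT id, count(*) AS occurrences
FROM ( VALUES (1,3),(1,7) ) AS g(id,count_2)
GROUP BY id
HAVING count(*) > 1;
```

For this input the query returns a single row, id 1 with 2 occurrences. How to collapse such duplicates before joining (summing the counts, taking the maximum, or keeping one particular row) depends on the answer to the second question.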

Question 8

From the question: "id is unique in each table."

ypercubeTM · Accepted Answer · 2017-05-16 20:50:55Z

You could use FULL JOIN but the code gets a bit messy - at least for my taste. With 3 tables it's not so bad, you'd only need to change:

FULL OUTER JOIN t3
 ON t1.id = t3.id

to:

FULL OUTER JOIN t3
 ON COALESCE(t1.id, t2.id) = t3.id

but with more tables, it gets rather ugly. The other option is to gather all distinct id values and then LEFT JOIN all the tables:

SELECT
 d.id,
 COALESCE(t1.count_1, 0) AS count_1,
 COALESCE(t2.count_2, 0) AS count_2,
 COALESCE(t3.count_3, 0) AS count_3
FROM
 ( SELECT id FROM t1
 UNION
 SELECT id FROM t2
 UNION
 SELECT id FROM t3
 ) AS d
 LEFT JOIN t1 ON t1.id = d.id
 LEFT JOIN t2 ON t2.id = d.id
 LEFT JOIN t3 ON t3.id = d.id
ORDER BY id ;
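For comparison, here is the FULL JOIN option written out in full: the question's query with only the second join condition changed as described at the start of this answer. It is a sketch that should run on PostgreSQL; the sample rows are inlined as CTEs, and the id = 5 rows (values 6 and 7) are made up to exercise the case where an id is missing from t1 but present in both t2 and t3.

```sql
WITH t1 (id, count_1) AS (VALUES (1, 9), (2, 4), (3, 3)),
     t2 (id, count_2) AS (VALUES (1, 2), (3, 3), (5, 6)),  -- (5, 6) is hypothetical
     t3 (id, count_3) AS (VALUES (1, 1), (4, 8), (5, 7))   -- (5, 7) is hypothetical
SELECT
    COALESCE(t1.id, t2.id, t3.id) AS id,
    COALESCE(t1.count_1, 0) AS count_1,
    COALESCE(t2.count_2, 0) AS count_2,
    COALESCE(t3.count_3, 0) AS count_3
FROM t1
FULL OUTER JOIN t2
    ON t1.id = t2.id
-- Match t3 against whichever of the earlier tables supplied the id.
FULL OUTER JOIN t3
    ON COALESCE(t1.id, t2.id) = t3.id
ORDER BY id;
```

This yields one row per id, including a single row for id 5 with count_2 = 6 and count_3 = 7. A hypothetical fourth table t4 would need ON COALESCE(t1.id, t2.id, t3.id) = t4.id, which is where the growing verbosity comes from.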

This works perfectly. Exactly the desired result.

kdbanman, commented May 16, 2017 at 22:19
