Select Distinct on subset of columns, name different set of columns to return

Question

I am using PostgreSQL and have two tables:

Table_A
colA | colB | colC | colD | colE | colF | colG | colH | colI | colJ
Table_B
colA | colB | colC | columnD | colE | columnF | colG | colH | colI | colJ

I am trying to insert a number of rows from Table_B into Table_A. My problem is that Table_A has a primary key on colA, colB, colC, colD, and colE. Table_B has no such constraint and contains several rows with the same values in those columns, so a simple insert fails with a duplicate-key error:

INSERT INTO Table_A (colA, colB, colC, colD, colE, colF, colG, colH, colI, colJ) 
SELECT colA, colB, colC, columnD, colE, columnF, colG, colH, colI, colJ FROM Table_B;
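To check which key combinations are duplicated in Table_B, and would therefore violate Table_A's primary key, a grouping query can help. The sketch below is a self-contained, cut-down version: the temporary table `table_b_demo`, its two key columns and its sample rows are hypothetical stand-ins. Against the real data you would run the same query on Table_B, grouping by colA, colB, colC, columnD, colE.

```sql
-- Hypothetical cut-down stand-in for Table_B: (colA, colB) plays the role
-- of the five key columns and columnF the role of the non-key columns.
DROP TABLE IF EXISTS table_b_demo;
CREATE TEMP TABLE table_b_demo (
    colA    varchar(12),
    colB    varchar(12),
    columnF integer
);
INSERT INTO table_b_demo VALUES
    ('a', 'x', 1),
    ('a', 'x', 2),  -- same key as the previous row
    ('b', 'y', 3);

-- Key combinations that occur more than once. Each one would violate a
-- primary key on (colA, colB) if every row were copied into the target.
SELECT colA, colB, count(*) AS n_rows
FROM table_b_demo
GROUP BY colA, colB
HAVING count(*) > 1;
```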

I am trying to work around this by using DISTINCT in my selection from Table_B. However, I cannot find the correct syntax to apply DISTINCT to the five primary-key columns used in Table_A while still selecting all ten columns to be inserted. I have tried:

INSERT INTO Table_A (colA, colB, colC, colD, colE) 
SELECT DISTINCT colA, colB, colC, columnD, colE FROM Table_B;

This correctly pulls unique entries but does not populate columns F through J. I have also tried:

INSERT INTO Table_A (colA, colB, colC, colD, colE, colF, colG, colH, colI, colJ) 
SELECT DISTINCT(colA, colB, colC, columnD, colE) colA, colB, colC, columnD, colE, columnF, colG, colH, colI, colJ FROM Table_B;

But this fails: the parentheses turn the five key columns into a single row value, which becomes the first column of the result. That wrapped value is inserted into colA and is too long for its column length:

ERROR: value too long for type character varying(12)
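A minimal sketch of why this happens, using hypothetical literal values instead of the real tables: in PostgreSQL a parenthesized list such as `(colA, colB)` is a row constructor, so `DISTINCT(...) colA` selects one composite value aliased `colA`, and `DISTINCT` still applies to the whole select list. The text form of that row value is what ends up being assigned to the varchar(12) column.

```sql
-- The parenthesized list is a ROW constructor aliased as colA,
-- not a DISTINCT ON clause; DISTINCT compares whole output rows.
SELECT DISTINCT (colA, colB) AS colA, colB
FROM (VALUES ('alpha_key_01', 'beta_key_001'),
             ('alpha_key_01', 'beta_key_001'),
             ('gamma_key_01', 'delta_key_01')) AS t(colA, colB);

-- Text form of such a row value: both fields plus parentheses and a
-- comma, so it is longer than 12 characters and cannot fit a varchar(12).
SELECT (ROW('alpha_key_01'::text, 'beta_key_001'::text))::text AS row_text,
       length((ROW('alpha_key_01'::text, 'beta_key_001'::text))::text) AS len;
```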

My goal is to write a query that gets all the unique combinations of colA, colB, colC, columnD, colE from Table_B and inserts those full rows, including columnF, colG, colH, colI, colJ, into Table_A.

Comment

What should the criterion be for selecting one row (from the many that have the same unique key)?

Comment

I don't know that I care; I just need at least one valid record. But is there a way to choose?

Answer

You're almost there. With the parentheses, SELECT DISTINCT returns that group of columns as a single value, as you've found out. If you use DISTINCT ON instead, though, you'll get the rows you're looking for.

INSERT INTO Table_A 
 (colA, colB, colC, colD, colE, colF, colG, colH, colI, colJ) 
SELECT DISTINCT ON (colA, colB, colC, columnD, colE) 
 colA, colB, colC, columnD, colE, columnF, colG, colH, colI, colJ 
FROM
 Table_B;
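To see the effect end to end, here is a self-contained sketch under simplifying assumptions: the temporary tables `table_a_demo` and `table_b_demo`, the two-column key standing in for the five-column one, and the sample rows are all hypothetical.

```sql
-- Hypothetical cut-down demo: (colA, colB) stands in for the five-column
-- primary key, and colF/columnF for the remaining columns.
DROP TABLE IF EXISTS table_a_demo;
DROP TABLE IF EXISTS table_b_demo;
CREATE TEMP TABLE table_a_demo (
    colA varchar(12),
    colB varchar(12),
    colF integer,
    PRIMARY KEY (colA, colB)
);
CREATE TEMP TABLE table_b_demo (
    colA    varchar(12),
    colB    varchar(12),
    columnF integer
);
INSERT INTO table_b_demo VALUES ('a', 'x', 1), ('a', 'x', 2), ('b', 'y', 3);

-- DISTINCT ON returns one row per distinct (colA, colB), so the primary
-- key is not violated. Without ORDER BY, which ('a', 'x') row is kept
-- is unspecified.
INSERT INTO table_a_demo (colA, colB, colF)
SELECT DISTINCT ON (colA, colB) colA, colB, columnF
FROM table_b_demo;

SELECT * FROM table_a_demo ORDER BY colA, colB;
```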

As pointed out by ypercube and in the Postgres docs, you can improve this query by adding ORDER BY. Without it, which of two conflicting rows is kept is unpredictable.

INSERT INTO Table_A 
 (colA, colB, colC, colD, colE, colF, colG, colH, colI, colJ) 
SELECT DISTINCT ON (colA, colB, colC, columnD, colE) 
 colA, colB, colC, columnD, colE, columnF, colG, colH, colI, colJ 
FROM
 Table_B
ORDER BY
 colA, colB, colC, columnD, colE -- required: must come first, matching the DISTINCT ON list
-- , colX, colY -- optional extra columns that decide which row is kept
 ;
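As one example of filling in the `colX, colY` placeholder, the sketch below keeps, for each key, the row with the largest `columnF`. The target table `table_a_demo`, the two-column key standing in for the five-column one, and the inline sample rows are hypothetical; any column (or several) could serve as the tie-breaker instead.

```sql
-- Hypothetical cut-down target table with a two-column primary key.
DROP TABLE IF EXISTS table_a_demo;
CREATE TEMP TABLE table_a_demo (
    colA varchar(12),
    colB varchar(12),
    colF integer,
    PRIMARY KEY (colA, colB)
);

-- Inline sample rows stand in for Table_B.
INSERT INTO table_a_demo (colA, colB, colF)
SELECT DISTINCT ON (colA, colB) colA, colB, columnF
FROM (VALUES ('a', 'x', 1), ('a', 'x', 2), ('b', 'y', 3))
     AS table_b_demo (colA, colB, columnF)
ORDER BY colA, colB,   -- must begin with the DISTINCT ON expressions
         columnF DESC; -- tie-breaker: the largest columnF comes first and is kept

SELECT * FROM table_a_demo ORDER BY colA, colB;
```

With this ordering the ('a', 'x') key keeps columnF = 2 rather than an arbitrary row.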

Comment

If Tom cares about order, then yes, ORDER BY would be important, but if they just want the first entry for whatever reason, I think this would work.

Comment

Oh wow, didn't realize I was that close! This totally works (and the ORDER BY is neat, but unnecessary for what I need).

Comment

@ypercube Strange, I just tested this in Postgres and ORDER BY wasn't required.

Comment

@ypercube That's definitely not true. I ran the query in Postgres just now to confirm, and it ran successfully without ORDER BY.

Comment

Yes, my bad, sorry. The restriction is that if there is an ORDER BY, its first columns must match the DISTINCT ON columns.
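A small sketch of that restriction, using a hypothetical inline table `t(k, v)`: the first query is accepted because its ORDER BY starts with the DISTINCT ON expression; the commented-out one would be rejected.

```sql
-- Accepted: the ORDER BY list begins with the DISTINCT ON expression k,
-- and the extra column v picks the row kept for each k.
SELECT DISTINCT ON (k) k, v
FROM (VALUES (1, 'b'), (1, 'a'), (2, 'c')) AS t(k, v)
ORDER BY k, v;
-- returns (1, 'a') and (2, 'c')

-- Rejected: ORDER BY does not begin with k, so PostgreSQL reports
-- "SELECT DISTINCT ON expressions must match initial ORDER BY expressions".
-- SELECT DISTINCT ON (k) k, v
-- FROM (VALUES (1, 'b'), (1, 'a'), (2, 'c')) AS t(k, v)
-- ORDER BY v;
```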

Stevie · Accepted Answer · 2015-06-26 05:50:47Z
