3

I need a script to create a table . The problem is my select for creating the table contains equal values in some rows. I need to use distinct for these columns. The other columns' values can come from any matching row.

My current result table has data like this:

| CITY | STREET | STREET_NUM | VAL_X | VAL_Y |
------------------------------------------------------------------
| CityA | Street abc | 5 | 11.5 | 0.5 |
| CityA | Street abc | 5 | 15.4 | 1.8 |
| CityA | Street abc | 5 | 12.4 | 2.8 |
| CityB | Street xyz | 18 | 5.4 | 1.9 |
| CityB | Street xyz | 18 | 8.4 | 1.1 |
| CityC | Street klm | 55 | 9.6 | 0.8 |

But I need data like this:

| CITY | STREET | STREET_NUM | VAL_X | VAL_Y |
------------------------------------------------------------------
| CityA | Street abc | 5 | 11.5 | 0.5 |
| CityB | Street xyz | 18 | 5.4 | 1.9 |
| CityC | Street klm | 55 | 9.6 | 0.8 |

For columns city, street and street_num I need to apply distinct. val_x and val_y should be used anyone, for example first of that group with same city, street and street_num.

Can you give me advice how to edit this script?

asked Mar 12, 2017 at 10:24
5
  • 1
    Will there be "duplicates" than need elimination between the different tables (a row from A and another from B)? Commented Mar 12, 2017 at 11:20
  • I got your question wrong, I didn't notice that you were creating table, this is why my answer is irrelevant and I have deleted it. If you add your select query in the place of my_result table(refer to the fiddle) then it might work as you expected. And yes, my sample data was different. The values in my example were rounded up. Here is the fiddle. Commented Mar 12, 2017 at 12:53
  • 1
    1. This is ambiguous: city, street and street_num I need to apply distinct. Is that supposed to be DISTINCT ON (city, street, street_num) or DISTINCT ON (city) and DISTINCT ON (street) and DISTINCT ON (street_num) - in this case you need to define priorities or you get arbitrary results. Very different cases. 2. Do val_x and val_y have to come from the same (even if arbitrary) row? 3. We need table definitions showing data types and constraints and your version of Postgres to give the best answer. Commented Mar 13, 2017 at 1:20
  • @ErwinBrandstetter the answer to (2) is yes (was posted in a comment.) Commented Mar 13, 2017 at 10:38
  • 1
    @ypercubeTM: Ah, I see. Goes to demonstrate the conventional wisdom: all defining information should go into the question ... Commented Mar 13, 2017 at 13:52

2 Answers 2

3

"VAL_X" and "VAL_Y" chosen through some aggregate function

You should consider using GROUP BY for the columns whose values you consider that should be "distinct" (as a group), and, for the rest of columns, choose an appropriate aggregate function (for instance, MIN):

CREATE TABLE my_result AS 
SELECT
 city, street, streetnum, min(val_x) AS val_x, min(val_y) AS val_y
FROM
 tableA
WHERE
 true /* your condition goes here */ 
GROUP BY
 city, street, streetnum

If you need to put together values from several tables, UNION ALL of them before you GROUP BY:

CREATE TABLE my_result AS 
SELECT
 city, street, streetnum, min(val_x) AS val_x, min(val_y) AS val_y
FROM
 (
 SELECT city, street, streetnum, val_x, val_y FROM tableA
 UNION ALL
 SELECT city, street, streetnum, val_x, val_y FROM tableB
 UNION ALL
 SELECT city, street, streetnum, val_x, val_y FROM tableC
 ) AS s0
WHERE
 true /* your condition goes here */ 
GROUP BY
 city, street, streetnum ;

Using always "VAL_X" and "VAL_Y" from same row, using a WINDOW

If you need to make sure your values are always from the same row, the best way is to use a WINDOW in your query: PARTITION BY "CITY", "STREET", "STREET_NUM" and ORDER BY "VAL_X", "VAL_Y", and choose the first row of every partition.

You can do this with two steps:

1) Add the row_num() to every partition:

SELECT 
 *, 
 (row_number() OVER (PARTITION BY "CITY", "STREET", "STREET_NUM" ORDER BY "VAL_X", "VAL_Y")) AS rn
FROM 
 table_a
 | CITY | STREET | STREET_NUM | VAL_X | VAL_Y | rn |
 |-------|------------|------------|-------|-------|----|
 | CityA | Street abc | 5 | 11.5 | 0.5 | 1 |
 | CityA | Street abc | 5 | 12.4 | 2.8 | 2 |
 | CityA | Street abc | 5 | 15.4 | 1.8 | 3 |
 | CityB | Street xyz | 18 | 5.4 | 1.9 | 1 |
 | CityB | Street xyz | 18 | 8.4 | 1.1 | 2 |
 | CityC | Street klm | 55 | 9.6 | 0.8 | 1 |

2) At this point, choose only the rows WHERE rn=1 (and ORDER them, if necessary):

SELECT
 "CITY", "STREET", "STREET_NUM", "VAL_X", "VAL_Y"
FROM
 (
 SELECT 
 *, 
 (row_number() OVER (PARTITION BY "CITY", "STREET", "STREET_NUM" ORDER BY "VAL_X", "VAL_Y")) AS rn
 FROM 
 table_a
 ) AS table_a_grouped 
WHERE
 rn = 1
ORDER BY 
 "CITY", "STREET", "STREET_NUM" 

The result is:

| CITY | STREET | STREET_NUM | VAL_X | VAL_Y |
|-------|------------|------------|-------|-------|
| CityA | Street abc | 5 | 11.5 | 0.5 |
| CityB | Street xyz | 18 | 5.4 | 1.9 |
| CityC | Street klm | 55 | 9.6 | 0.8 |

You can see the example at SQLFiddle

answered Mar 12, 2017 at 14:39
3
  • but this should got me val_x from 1. record and val_y from last recors. I need this couple from anyone row, but must be from same record Commented Mar 12, 2017 at 14:43
  • You can use a FIRST aggregate function, as explained wiki.postgresql.org/wiki/First/last_(aggregate). You could also use WINDOWs. It would make much more sense if there were a concept of ORDER. Commented Mar 12, 2017 at 14:46
  • Added WINDOW example. Commented Mar 12, 2017 at 20:31
3

Assumptions:

  • Resulting rows shall be distinct on (city, street, street_num) i.e. the combination of these 3 columns shall be unique. Individual columns can have dupes.

  • val_x and val_y (and possibly more columns) shall come from the same source row (they make sense together, like coordinates / a geocode).

  • All columns are defined NOT NULL.
    NULL values are considered equal in by DISTINCT or DISTINCT ON (per SQL standard). So the query works with NULL values all the same. But if you add ORDER BY to define which row to pick from each set of dupes, pay attention if expressions can be NULL. See linked answer below for details.

UNION ALL multiple sources, then apply DISTINCT ON (city, street, street_num) on the derived table:

CREATE TABLE my_result AS 
SELECT DISTINCT ON (city, street, street_num)
 city, street, street_num, val_x, val_y -- more columns?
FROM (
 SELECT city, street, street_num, val_x, val_y -- more columns?
 FROM tableA
 WHERE ...
 UNION ALL -- here probably cheaper than UNION
 SELECT city, street, street_num, val_x, val_y -- more columns?
 FROM tableB
 WHERE ...
 ) sub;

Results as desired.

dbfiddle here

It's probably cheaper to use a simple UNION ALL instead of UNION - which would not be wrong but impose a separate sort or hash operation on each SELECT of the UNION query. Since we do that (and more) in the outer query anyway, UNION ALL is probably cheaper. (Test to verify.)

I did not add ORDER BY, since you explicitly stated:

The other columns' values can come from any matching row.

You get an arbitrary pick from each set of dupes. The result can change with every invocation.

Works in basically any Postgres version. There are query techniques with LATERAL joins, CTE or window functions that require a more modern version.

Detailed discussion of the technique, indexes and possible alternatives:

answered Mar 13, 2017 at 1:35
1
  • 1
    Yeah, that's exactly what I was thinking. Another possible solution - which I think could be efficient - but should be used only if there are duplicates only in each table and not across tables (as it would yield different results otherwise): SELECT DISTINCT ON (city, street, street_num) * FROM tableA UNION ALL SELECT DISTINCT ON (city, street, street_num) * FROM tableB ... ; Commented Mar 13, 2017 at 10:42

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.