List distinct column values where those rows share other column values

Question 1

I'm trying to write a query to detect possibly-invalid data in a PostgreSQL table. We have a table of city names like this:

# `city_names`
id | name | language | dialect | city_id
------------------------------------------------
01 | London | A | A1 | 1
02 | London | A | A2 | 1
03 | London | B | B1 | 2
04 | London | B | B2 | 3

In our domain:

It's fine that rows 01 and 02 both map "London" to city 1; the dialects don't happen to differ
It's fine that row 03 maps "London" to city 2; in that language, the name may refer to a different city
It's suspicious that row 04 maps "London" to city 3, because we already have a mapping to city 2 in the same language

I want to write a query that selects only rows 03 and 04 so that a human can decide whether one of them points to the wrong city.

I can solve this problem procedurally, but I'm having trouble doing it in SQL. For example, if I GROUP BY language and name, I lose the city_id values from the individual rows.

Basically my goal is: "If there's more than one city_id for the same name and language, list those city_ids."

How can I do this?

Question 2

A procedural solution is: make a hash. For each row, ensure the hash has a key with its name and language, like "London-A", and a value for the set of all city_ids matching that. Insert the row's city id. At the end, list the hash keys where the set has more than one item. (Of course, this is horribly slow.)

Question 3

This query will do. The trick is to use COUNT(DISTINCT city_id):

SQL Fiddle

PostgreSQL 8.3.20 Schema Setup:

CREATE TABLE Table1
 ("id" int, "name" varchar(6), "language" varchar(1), "dialect" varchar(2), "city_id" int)
;
INSERT INTO Table1
 ("id", "name", "language", "dialect", "city_id")
VALUES
 (01, 'London', 'A', 'A1', 1),
 (02, 'London', 'A', 'A2', 1),
 (03, 'London', 'B', 'B1', 2),
 (04, 'London', 'B', 'B2', 3)
;

Query 1:

select t.*, d.dups
FROM table1 t INNER JOIN
(
 select name, language, count(distinct city_id) as dups
 from table1
 group by name, language
 having count(distinct city_id) > 1
) d
ON t.name = d.name and t.language = d.language

Results :

| ID | NAME | LANGUAGE | DIALECT | CITY_ID | DUPS |
|----|--------|----------|---------|---------|------|
| 3 | London | B | B1 | 2 | 2 |
| 4 | London | B | B2 | 3 | 2 |

Question 4

Awesome! I ended up changing this slightly to create a temporary table suspicious_city_names for the inner query, then join to it. I think this makes the query a bit easier to follow. Thanks for your help!

cha cha 1,0555 silver badges7 bronze badges · Accepted Answer · 2014-09-29 22:17:14Z

This query will do. The trick is to use COUNT(DISTINCT city_id):

SQL Fiddle

PostgreSQL 8.3.20 Schema Setup:

CREATE TABLE Table1
 ("id" int, "name" varchar(6), "language" varchar(1), "dialect" varchar(2), "city_id" int)
;
INSERT INTO Table1
 ("id", "name", "language", "dialect", "city_id")
VALUES
 (01, 'London', 'A', 'A1', 1),
 (02, 'London', 'A', 'A2', 1),
 (03, 'London', 'B', 'B1', 2),
 (04, 'London', 'B', 'B2', 3)
;

Query 1:

select t.*, d.dups
FROM table1 t INNER JOIN
(
 select name, language, count(distinct city_id) as dups
 from table1
 group by name, language
 having count(distinct city_id) > 1
) d
ON t.name = d.name and t.language = d.language

Results :

| ID | NAME | LANGUAGE | DIALECT | CITY_ID | DUPS |
|----|--------|----------|---------|---------|------|
| 3 | London | B | B1 | 2 | 2 |
| 4 | London | B | B2 | 3 | 2 |

Awesome! I ended up changing this slightly to create a temporary table suspicious_city_names for the inner query, then join to it. I think this makes the query a bit easier to follow. Thanks for your help!

Stack Exchange Network

List distinct column values where those rows share other column values

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

List distinct column values where those rows share other column values

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions