Select from multiple rows without duplicate values, with all random data

Question 1

I have a table that gets results from three different sources. Each column represents a source, and each row a result of an outcome. There are over 50k rows for a total of 150k results.

I need to run a report on these results that removes the duplicates and leaves only the unique values, each in its respective column. Most of the results will be duplicates; I would estimate that only around 500 are unique.

The other 'remove duplicates from multiple columns' posts haven't worked for me; I have not been able to get any combination of DISTINCT, GROUP BY, and UNION to work.

Example of data below. Thanks.

Raw Data: [image: Data'r]

Expected Results: [image: Results]

Squiggles: [image: Squiggles]

S3S · Answer 1 · 2018-11-02 19:54:08Z

I broke this down using pivot and not exists. I really would handle this in the presentation layer though.

--load test data
declare @table table (c1 int, c2 int, c3 int)
insert into @table
values
 (1,1,1)
,(1,1,1)
,(2,3,2)
,(4,2,4)
,(5,4,6)
,(7,5,8)
,(9,7,11)
,(11,9,13)
,(14,16,15)

--get our unique values in a cte to pivot later
--(the aliases are spelled Col / Val everywhere so the query also runs under a case-sensitive collation)
;with cte as (
    select
        --here we add a RN so that we can use pivot without losing values
        r = row_number() over (partition by Col order by (select 1))
        ,i.*
    from
    (
        --for each column, we get the unique values where they don't exist in the other two columns
        --we union them together, but give them a 1 / 2 / 3 column identifier
        select
            1 as Col, c1.c1 as Val
        from
            (select distinct t1.c1 from @table t1
             where not exists (select 1 from @table t2 where t2.c2 = t1.c1)
             and not exists (select 1 from @table t3 where t3.c3 = t1.c1)) c1
        union
        select
            2 as Col, c2.c2
        from
            (select distinct t1.c2 from @table t1
             where not exists (select 1 from @table t2 where t2.c1 = t1.c2)
             and not exists (select 1 from @table t3 where t3.c3 = t1.c2)) c2
        union
        select
            3 as Col, c3.c3
        from
            (select distinct t1.c3 from @table t1
             where not exists (select 1 from @table t2 where t2.c1 = t1.c3)
             and not exists (select 1 from @table t3 where t3.c2 = t1.c3)) c3
    ) i
)
--simple pivot
select
    [1], [2], [3]
from cte
pivot
(max(Val) for Col in ([1],[2],[3])) p

RETURNS

+------+------+----+
| 1    | 2    | 3  |
+------+------+----+
| 14   | 3    | 6  |
| NULL | 16   | 8  |
| NULL | NULL | 13 |
| NULL | NULL | 15 |
+------+------+----+
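
If the final arrangement is left to the presentation layer, as suggested above, the query only has to return one (Col, Val) pair per unique value. Here is a minimal sketch of that on the same test data. It swaps the NOT EXISTS checks for EXCEPT, which is a substitution for illustration rather than part of the original query. One difference to be aware of: EXCEPT treats NULLs as equal to each other, while the = comparison inside NOT EXISTS does not.

```sql
--same test data as above
declare @table table (c1 int, c2 int, c3 int);
insert into @table
values (1,1,1),(1,1,1),(2,3,2),(4,2,4),(5,4,6),(7,5,8),(9,7,11),(11,9,13),(14,16,15);

--EXCEPT returns the distinct values of the left query that are missing from the right one,
--and chained EXCEPTs are evaluated left to right
select 1 as Col, x.Val
from (select c1 as Val from @table
      except select c2 from @table
      except select c3 from @table) x
union all
select 2, x.Val
from (select c2 as Val from @table
      except select c1 from @table
      except select c3 from @table) x
union all
select 3, x.Val
from (select c3 as Val from @table
      except select c1 from @table
      except select c2 from @table) x
order by Col, Val;
```

On the test data this should give (1,14), (2,3), (2,16), (3,6), (3,8), (3,13), (3,15): the same values as the pivoted result, one per row, which a report tool can then lay out in columns.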

I use SQL a few times a year at best, so my common logic doesn't always click. I updated the original question with a picture. The dbo I'm using is NUMs, and the actual columns are 1, 2, 3. Is that causing an issue?

Your column names aren't c1, c2, c3 in your database; change them to whatever the column names are. This is as much info as I can give you for this answer. Good luck.

Even after accounting for that and clearing all other errors, I am still receiving "Msg 207, Level 16, State 1, Line 40 Invalid column name 'Val'." for the second-to-last line.
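
On that last error: Msg 207 on 'Val' means the name Val is not visible where the pivot uses it. One thing worth checking, assuming the database has a case-sensitive collation (the thread does not say), is that the alias is spelled with exactly the same casing where it is declared and where it is referenced. Columns literally named 1, 2, 3 also have to be written as [1], [2], [3], because a bare 1 is read as the number. Below is a rough sketch under those assumptions. The table variable @NUMs is a hypothetical stand-in for the real NUMs table, and CROSS APPLY (VALUES ...) with GROUP BY replaces the three NOT EXISTS branches.

```sql
--hypothetical stand-in for the real table, assumed to have int columns named 1, 2, 3
declare @NUMs table ([1] int, [2] int, [3] int);
insert into @NUMs
values (1,1,1),(1,1,1),(2,3,2),(4,2,4),(5,4,6),(7,5,8),(9,7,11),(11,9,13),(14,16,15);

with unpivoted as (
    --one row per (source column, value); numeric column names are bracket-quoted
    select v.Col, v.Val
    from @NUMs n
    cross apply (values (1, n.[1]), (2, n.[2]), (3, n.[3])) v (Col, Val)
),
uniq as (
    --keep only the values that appear in exactly one source column
    select min(Col) as Col, Val
    from unpivoted
    group by Val
    having count(distinct Col) = 1
),
numbered as (
    --row number per column so the pivot keeps every value
    select Col, Val, row_number() over (partition by Col order by Val) as r
    from uniq
)
select [1], [2], [3]
from numbered
pivot (max(Val) for Col in ([1],[2],[3])) p
order by r;
```

With the same sample values this should return the four rows shown in the answer. Against the real table, replace @NUMs with the actual table name and drop the declare/insert lines.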

Jerry Hung · Answer 2 · 2018-11-02 05:52:51Z

What is the expected outcome? Maybe use a sample table to demonstrate?

column1 | column2 | column3
---------------------------
value1  | value3  | value2
value2  | value1  | value3
value3  | value2  | value1

Many assumptions

Can values be duplicated across columns? I saw some.
What to do with duplicates? Empty the column? Empty the row?
I assume you basically want UNIQUE values for each column for the final result?

Try:

SELECT DISTINCT column1 FROM tableA 
UNION 
SELECT DISTINCT column2 FROM tableA 
UNION
SELECT DISTINCT column3 FROM tableA

DISTINCT is redundant, as UNION already removes duplicated rows
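
To illustrate the point about DISTINCT, here is a small sketch on a made-up three-row table (@t and its values are assumptions for the example only). Note that this form returns a single combined column, so it does not keep the values in their respective columns as the question asks.

```sql
--made-up sample data, for illustration only
declare @t table (column1 int, column2 int, column3 int);
insert into @t values (1,1,1),(2,3,2),(4,2,4);

--UNION removes duplicates across all three inputs, so no DISTINCT is needed
select column1 as val from @t
union
select column2 from @t
union
select column3 from @t;
--four rows: 1, 2, 3, 4

--UNION ALL keeps every input row, duplicates included
select column1 as val from @t
union all
select column2 from @t
union all
select column3 from @t;
--nine rows
```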
