I have a table that gets results from three different sources. Each column represents a source, and each row the result of an outcome. There are over 50k rows, for a total of over 150k results.
I need to run a report on these results that removes the duplicates and leaves only the unique values, each in its respective column. The majority of the results will be duplicates; I would estimate that around 500 are unique.
The other 'remove duplicate from multiple columns' posts haven't worked for me; I have not been able to get any combination of DISTINCT, GROUP BY, and UNION to work.
Example of data below. Thanks.
Raw Data: Data'r
Expected Results: Results
Squiggles: Squiggles
2 Answers
I broke this down using pivot and not exists. I really would handle this in the presentation layer though.
--load test data
declare @table table (c1 int, c2 int, c3 int)
insert into @table
values
 (1,1,1)
,(1,1,1)
,(2,3,2)
,(4,2,4)
,(5,4,6)
,(7,5,8)
,(9,7,11)
,(11,9,13)
,(14,16,15)

--get our unique values in a cte to pivot later
;with cte as(
    select
        --here we add a RN so that we can use pivot without losing values
        r = row_number() over (partition by Col order by (select 1))
        ,i.*
    from
    (
        --for each column, we get the unique values where they don't exist in the other two columns
        --we union them together, but give them 1 /2 / 3 column identifier
        --the aliases Col and Val must match the pivot clause exactly on a case-sensitive collation
        select
            1 as Col, c1.c1 as Val
        from
            (select distinct t1.c1 from @table t1
             where not exists (select 1 from @table t2 where t2.c2 = t1.c1)
             and not exists (select 1 from @table t3 where t3.c3 = t1.c1)) c1
        union
        select
            2 as col, c2.c2
        from
            (select distinct t1.c2 from @table t1
             where not exists (select 1 from @table t2 where t2.c1 = t1.c2)
             and not exists (select 1 from @table t3 where t3.c3 = t1.c2)) c2
        union
        select
            3 as col, c3.c3
        from
            (select distinct t1.c3 from @table t1
             where not exists (select 1 from @table t2 where t2.c1 = t1.c3)
             and not exists (select 1 from @table t3 where t3.c2 = t1.c3)) c3
    ) i
)
--simple pivot
select
    [1], [2], [3]
from cte
pivot
    (max(Val) for Col in ([1],[2],[3])) p
RETURNS
+------+------+----+
| 1 | 2 | 3 |
+------+------+----+
| 14 | 3 | 6 |
| NULL | 16 | 8 |
| NULL | NULL | 13 |
| NULL | NULL | 15 |
+------+------+----+
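If the real table uses different column names, only the inner subqueries need to change. As a rough sketch, assume a table whose columns are literally named 1, 2 and 3, so that they have to be bracket-quoted in T-SQL. The table variable @nums and its data are made up for illustration, and only the branch for the first column is shown; the other two follow the same pattern.

```sql
-- Assumption: @nums stands in for a real table (for example dbo.NUMs)
-- whose columns are literally named 1, 2 and 3. Names and data are illustrative.
declare @nums table ([1] int, [2] int, [3] int);

insert into @nums
values (1,1,1), (2,3,2), (4,2,4), (5,4,6), (14,16,15);

-- Values of column [1] that appear in neither column [2] nor column [3].
-- t1.[1] is a column reference; the bare 1 in "select 1" is just a literal.
select distinct t1.[1] as Val
from @nums t1
where not exists (select 1 from @nums t2 where t2.[2] = t1.[1])
  and not exists (select 1 from @nums t3 where t3.[3] = t1.[1]);
-- for this sample data the result is 5 and 14
```

Without the brackets, a name such as 1 is parsed as the integer literal rather than as a column, which changes the meaning of the comparison.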
- I use SQL a few times a year at best, so my common logic doesn't always click. I updated the original question with a picture. The dbo I'm using is NUMs, and the actual columns are 1,2,3. Is that causing an issue? – CGCIC, Nov 8, 2018 at 19:17
- Your column names aren't c1, c2, c3 in your database... change them to whatever the column names are. This is as much info as I can give you for this answer. Good luck. – S3S, Nov 8, 2018 at 19:40
- Even when accounting for that, and with all other errors cleared, I am still receiving "Msg 207, Level 16, State 1, Line 40 Invalid column name 'Val'." for the second to last line. – CGCIC, Nov 9, 2018 at 14:30
What is the expected outcome? Maybe use a sample table to demonstrate?
column1 | column2 | column3
---------------------------
value1  | value3  | value2
value2  | value1  | value3
value3  | value2  | value1
Many assumptions
- Can values be duplicated across columns? I saw some.
- What to do with duplicates? Empty the column? Empty the row?
- I assume you basically want UNIQUE values for each column for the final result?
Try:
SELECT DISTINCT column1 FROM tableA
UNION
SELECT DISTINCT column2 FROM tableA
UNION
SELECT DISTINCT column3 FROM tableA
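Note that this query returns a single combined column of distinct values. If the goal is instead the values that appear in only one of the three columns, one possible sketch is to tag each value with its source column and keep the values seen in exactly one source. This is an assumption about the intent: tableA and its column names come from the query above, while the sample rows and the names src and val are made up for illustration.

```sql
-- Assumption: tableA(column1, column2, column3) is stubbed here with a few
-- illustrative rows so that the query can be run on its own.
WITH tableA AS (
    SELECT *
    FROM (VALUES (1, 1, 1), (2, 3, 2), (5, 4, 6)) AS v(column1, column2, column3)
)
SELECT MIN(src) AS src, val
FROM (
    -- Stack the three columns into one, remembering where each value came from.
    SELECT 1 AS src, column1 AS val FROM tableA
    UNION ALL
    SELECT 2, column2 FROM tableA
    UNION ALL
    SELECT 3, column3 FROM tableA
) AS u
GROUP BY val
-- Keep only values that occur in exactly one of the three source columns.
HAVING COUNT(DISTINCT src) = 1;
-- for this sample data: (1, 5), (2, 3), (2, 4), (3, 6), in no guaranteed order
```

The result is in long form (one row per value with its source column), so it would still need a pivot or the presentation layer to place each value back under its own column.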
- DISTINCT is redundant, as UNION already removes duplicated rows. – Pere Joan Martorell, May 8, 2019 at 9:46