Insert Values Into Table, Putting Duplicates In Another Table

Question 1

I posted this question yesterday. Some of the responses I got were helpful; however, it seems my issue is a bit more complex than I originally thought.

After some digging, I found that my INSERT statement was failing because of rows like these:

| part_number | description | information |
|-------------|-------------|-------------|
| 331335A11 | Desc1 | Info1 |
| 331335A11 | Desc2 | Info1 |

Essentially, a number of entries have the same value in the part_number field (which is supposed to be a UNIQUE column) but different values in their other columns. The query was trying to insert all of them into the table, and that is where my problem comes from.

Because I am unsure just how many records in my table have this problem, what I am trying to do is run the INSERT into my parts table, but every time I hit a repeated part_number value, insert that row into a table called parts_duplicates instead of the parts table. The parts_duplicates table won't have the unique restriction on the part_number column, but it will have all the same columns as the parts table. From there I can analyze my incorrect data points and (hopefully) fix them.

My only problem is that I have no idea where to even start tackling this. In my earlier question, one of the responses suggested using MERGE, and I am currently testing that, but I am wondering if there is a better way to go about this.

Question 2

Why don't you just do the insert into the parts_duplicates table first, then you can have a background process or a manual process that goes through there, uses your logic to determine which of the duplicates is the "good" one, and then inserts that one row into the table with the constraint? You could also consider an INSTEAD OF trigger to accomplish the same, maybe only when duplicates are detected. (Also, personally, I wouldn't use MERGE unless you have a really good reason.)
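A minimal sketch of that staging workflow, assuming table variables as hypothetical stand-ins for dbo.parts and dbo.parts_duplicates and the column names from the question: every incoming row is loaded into the unconstrained table first, part numbers that occur exactly once and are not already in parts are promoted, and the promoted rows are then deleted so only rows needing a decision remain. The @promoted helper table is an assumption of this sketch; it keeps a pre-existing conflict (331335A10 here) from being deleted by mistake.

```sql
-- Hypothetical stand-ins for dbo.parts (unique part_number) and dbo.parts_duplicates
DECLARE @parts TABLE (part_number varchar(30) PRIMARY KEY, description varchar(30), information varchar(30));
DECLARE @parts_duplicates TABLE (part_number varchar(30), description varchar(30), information varchar(30));
DECLARE @promoted TABLE (part_number varchar(30) PRIMARY KEY);

-- A row that was loaded in an earlier run
INSERT INTO @parts (part_number, description, information)
VALUES ('331335A10', 'Desc1', 'Info1');

-- 1. Stage every incoming row; this table has no unique constraint
INSERT INTO @parts_duplicates (part_number, description, information)
VALUES ('331335A00', 'Desc1', 'Info1'),
       ('331335A10', 'Desc9', 'Info1'),
       ('331335A11', 'Desc1', 'Info1'),
       ('331335A11', 'Desc2', 'Info1');

-- 2. Pick part numbers that occur once in staging and are not yet in @parts
INSERT INTO @promoted (part_number)
SELECT d.part_number
FROM @parts_duplicates AS d
WHERE NOT EXISTS (SELECT 1 FROM @parts AS p WHERE p.part_number = d.part_number)
GROUP BY d.part_number
HAVING COUNT(*) = 1;

-- 3. Copy exactly those rows into @parts
INSERT INTO @parts (part_number, description, information)
SELECT d.part_number, d.description, d.information
FROM @parts_duplicates AS d
JOIN @promoted AS pr ON pr.part_number = d.part_number;

-- 4. Remove them from staging, leaving only rows that need a human decision
DELETE d
FROM @parts_duplicates AS d
JOIN @promoted AS pr ON pr.part_number = d.part_number;

SELECT * FROM @parts ORDER BY part_number;
SELECT * FROM @parts_duplicates ORDER BY part_number;
```

With this sample data, @parts ends up holding 331335A00 and 331335A10, and the conflicting 331335A10 row plus both 331335A11 rows stay in @parts_duplicates for review.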

Question 3

@AaronBertrand Because in some instances it is not immediately apparent which of the duplicate entries is the "good" one. The first two I found yesterday were essentially two different parts whose only similarity was the part number. Maybe one was an old part; I am not sure, but I don't have any easy way to decide which entry is good. :\

Question 4

So how do you expect SQL Server to determine which entry is good? Anyway, you can use an INSTEAD OF INSERT trigger to insert the "first" (arbitrary) row into the main table and the remainder into the dupes table, or just insert all of them into the dupes table and not correct anything until you have determined which entry is the "good" one.
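The "first arbitrary row" option can be sketched as plain statements rather than a trigger, assuming the incoming rows sit in a hypothetical @incoming table variable and that none of their part numbers exist in parts yet. The row numbers are materialized once in @numbered (another hypothetical helper) so that each row lands in exactly one table; re-evaluating an arbitrary ROW_NUMBER in two separate statements could number the rows differently each time.

```sql
-- Hypothetical stand-ins; assumes @parts holds none of the incoming part numbers yet
DECLARE @incoming TABLE (part_number varchar(30), description varchar(30), information varchar(30));
DECLARE @parts TABLE (part_number varchar(30) PRIMARY KEY, description varchar(30), information varchar(30));
DECLARE @parts_duplicates TABLE (part_number varchar(30), description varchar(30), information varchar(30));
DECLARE @numbered TABLE (part_number varchar(30), description varchar(30), information varchar(30), rn bigint);

INSERT INTO @incoming (part_number, description, information)
VALUES ('331335A00', 'Desc1', 'Info1'),
       ('331335A11', 'Desc1', 'Info1'),
       ('331335A11', 'Desc2', 'Info1');

-- Number rows per part_number once; ORDER BY (SELECT NULL) means "any order"
INSERT INTO @numbered (part_number, description, information, rn)
SELECT part_number, description, information,
       ROW_NUMBER() OVER (PARTITION BY part_number ORDER BY (SELECT NULL))
FROM @incoming;

-- One arbitrary row per part_number goes to the constrained table
INSERT INTO @parts (part_number, description, information)
SELECT part_number, description, information FROM @numbered WHERE rn = 1;

-- The remaining rows are shelved for review
INSERT INTO @parts_duplicates (part_number, description, information)
SELECT part_number, description, information FROM @numbered WHERE rn > 1;
```

Which 331335A11 row ends up in @parts is deliberately unspecified here; replacing (SELECT NULL) with a real column (for example description) would make the choice repeatable.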

Question 5

@AaronBertrand It's more that I want to find all the duplicate entries and put them into the parts_duplicates table, so I can go through all of them, find out which are good, fix the data, and then import them later. I want to be able to insert everything that doesn't have a duplicate into the parts table without issue. Would I essentially need something like WHERE COUNT(part_number) > 1 for this?
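On the WHERE COUNT(part_number) > 1 idea: SQL Server does not accept an aggregate directly in a WHERE clause; that filter belongs in HAVING after a GROUP BY. A small sketch, assuming a hypothetical @parts_temp table variable holding the incoming rows, that lists each repeated part number with its number of copies (collected into a hypothetical @repeated table so the result can be reused):

```sql
-- Hypothetical staging table holding the rows you are about to load
DECLARE @parts_temp TABLE (part_number varchar(30), description varchar(30), information varchar(30));
DECLARE @repeated TABLE (part_number varchar(30), copies int);

INSERT INTO @parts_temp (part_number, description, information)
VALUES ('331335A00', 'Desc1', 'Info1'),
       ('331335A11', 'Desc1', 'Info1'),
       ('331335A11', 'Desc2', 'Info1');

-- WHERE COUNT(part_number) > 1 would raise an error; filter the groups with HAVING
INSERT INTO @repeated (part_number, copies)
SELECT part_number, COUNT(*)
FROM @parts_temp
GROUP BY part_number
HAVING COUNT(*) > 1;

SELECT part_number, copies FROM @repeated;
```

Run against the real staging data, this also answers how many part numbers are affected before anything is loaded.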

Question 6

I think this is all being covered in the original question, but I'm not sure what the etiquette is here regarding merging or updating questions.

Question 7

Here is a possible solution that seems to work and doesn't require triggers - you'd have to test it against your real data.

```sql
--Demo setup
Declare @Parts table (part_number varchar(30), description varchar(30), information varchar(30))
Declare @PartsTemp table (part_number varchar(30), description varchar(30), information varchar(30))
Declare @PartsDuplicates table (part_number varchar(30), description varchar(30), information varchar(30))
insert into @Parts(part_number,description,information) values
    ('331335A10', 'Desc1', 'Info1') --Row already exists on the @Parts table
insert into @PartsTemp(part_number,description,information) values
    ('331335A00', 'Desc1', 'Info1'), --No row on the @Parts table and no duplicate
    ('331335A10', 'Desc1', 'Info1'), --Row already exists on the @Parts table
    ('331335A11', 'Desc1', 'Info1'), --No row on the @Parts table
    ('331335A11', 'Desc2', 'Info1')  --Duplicate row on the @PartsTemp table

--The solution
--Common table expression to add a row number to each @PartsTemp row
;WITH PartsTempAndRowNumber
AS (
    SELECT *
        ,ROW_NUMBER() OVER (
            PARTITION BY part_number ORDER BY description
        ) AS rn
    FROM @PartsTemp
)
--Insert into @PartsDuplicates where either:
--rn <> 1, meaning a duplicate within the @PartsTemp table
--OR
--the part number already exists on the @Parts table
INSERT INTO @PartsDuplicates (
    part_number
    ,description
    ,information
)
SELECT part_number
    ,description
    ,information
FROM PartsTempAndRowNumber ptarn
WHERE rn <> 1
UNION ALL
SELECT ptarn.part_number
    ,ptarn.description
    ,ptarn.information
FROM PartsTempAndRowNumber ptarn
JOIN @Parts pt
    ON pt.part_number = ptarn.part_number
    AND ptarn.rn = 1

--Insert rows into @Parts from @PartsTemp whose part_number can't be found
--on the @PartsDuplicates table
INSERT INTO @Parts (
    part_number
    ,description
    ,information
)
SELECT part_number
    ,description
    ,information
FROM @PartsTemp pt
WHERE NOT EXISTS (
    SELECT *
    FROM @PartsDuplicates
    WHERE part_number = pt.part_number
)

--Verify @Parts rows
SELECT *
FROM @Parts
ORDER BY part_number

--Verify @PartsDuplicates rows
SELECT *
FROM @PartsDuplicates
ORDER BY part_number
```

After execution, @Parts contains:

| part_number | description | information |
|-------------|-------------|-------------|
| 331335A00 | Desc1 | Info1 |
| 331335A10 | Desc1 | Info1 |

After execution, @PartsDuplicates contains:

| part_number | description | information |
|-------------|-------------|-------------|
| 331335A10 | Desc1 | Info1 |
| 331335A11 | Desc2 | Info1 |
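One thing worth checking in the output above: the first 331335A11 row (Desc1) is in neither table. It has rn = 1, so it is not copied to @PartsDuplicates, and its part_number is in @PartsDuplicates, so the second INSERT skips it. If every row of a repeated part number should be kept for review, a possible variant on the same demo data swaps ROW_NUMBER for COUNT(*) OVER, so all rows sharing a part_number are treated alike. This is a suggested adjustment, not part of the original answer.

```sql
DECLARE @Parts TABLE (part_number varchar(30), description varchar(30), information varchar(30));
DECLARE @PartsTemp TABLE (part_number varchar(30), description varchar(30), information varchar(30));
DECLARE @PartsDuplicates TABLE (part_number varchar(30), description varchar(30), information varchar(30));

INSERT INTO @Parts (part_number, description, information)
VALUES ('331335A10', 'Desc1', 'Info1');
INSERT INTO @PartsTemp (part_number, description, information)
VALUES ('331335A00', 'Desc1', 'Info1'),
       ('331335A10', 'Desc1', 'Info1'),
       ('331335A11', 'Desc1', 'Info1'),
       ('331335A11', 'Desc2', 'Info1');

-- Count rows per part_number instead of numbering them,
-- so every row of a repeated part_number goes to @PartsDuplicates
WITH PartsTempCounted AS (
    SELECT part_number, description, information,
           COUNT(*) OVER (PARTITION BY part_number) AS cnt
    FROM @PartsTemp
)
INSERT INTO @PartsDuplicates (part_number, description, information)
SELECT c.part_number, c.description, c.information
FROM PartsTempCounted AS c
WHERE c.cnt > 1
   OR EXISTS (SELECT 1 FROM @Parts AS p WHERE p.part_number = c.part_number);

-- Same filter as the answer's second INSERT: load whatever was not shelved
INSERT INTO @Parts (part_number, description, information)
SELECT pt.part_number, pt.description, pt.information
FROM @PartsTemp AS pt
WHERE NOT EXISTS (SELECT 1 FROM @PartsDuplicates AS d WHERE d.part_number = pt.part_number);

SELECT * FROM @Parts ORDER BY part_number;
SELECT * FROM @PartsDuplicates ORDER BY part_number;
```

With this variant @Parts still ends up with 331335A00 and 331335A10, while @PartsDuplicates holds 331335A10 and both 331335A11 rows.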

Question 8

The reason I suggested a trigger is because I always assume that you can't control all of the ways that data can get into the table (assuming otherwise can be dangerous). Your inserts might be ad hoc, distributed in apps, auto-generated by ORMs, etc.

Given these tables:

```sql
CREATE TABLE dbo.parts(PartID int PRIMARY KEY, descr sysname /*, other cols */);
CREATE TABLE dbo.parts_duplicates(PartID int, descr sysname /*, other cols */);
CREATE CLUSTERED INDEX x ON dbo.parts_duplicates(PartID);
GO
```

We can build this trigger:

```sql
CREATE TRIGGER dbo.ShelvePartsDupes ON dbo.parts INSTEAD OF INSERT
AS
BEGIN
    SET NOCOUNT ON;
    -- first, stuff rows that already exist in parts
    -- or that are duplicates from this batch only into dupes
    INSERT dbo.parts_duplicates(PartID, descr /*, other cols */)
    SELECT PartID, descr /*, other cols */
    FROM
    (
        SELECT PartID, c = COUNT(*) OVER (PARTITION BY PartID), descr
        /*, other cols */
        FROM inserted
    ) AS x
    WHERE c > 1
    OR EXISTS (SELECT 1 FROM dbo.parts WHERE PartID = x.PartID);
    -- rows that are both singular and don't already exist:
    INSERT dbo.parts(PartID, descr /*, other cols */)
    SELECT PartID, descr /*, other cols */
    FROM
    (
        -- aggregating here is ok because it'll only ever be one row
        SELECT PartID, descr = MAX(descr) /*, other cols = MAX(other cols) */
        FROM inserted AS i
        WHERE NOT EXISTS
        (
            SELECT 1 FROM dbo.parts WHERE PartID = i.PartID
        )
        GROUP BY PartID
        HAVING COUNT(*) = 1
    ) AS x;
END
GO -- end the batch so later statements are not compiled into the trigger body
```

Here are three sample inserts: the first creates an initial row; the second simulates (a) a new single row whose PartID already exists, (b) a new single row that doesn't already exist, and (c) a new pair of rows that don't already exist; the third simulates a new pair of rows whose PartID already exists in the target.

```sql
INSERT dbo.parts(PartID, descr)
    VALUES(1, N'floob');
GO
INSERT dbo.parts(PartID, descr)
    VALUES(1, N'bar'), (2, N'New'), (3, N'New dupe 1'), (3, N'New dupe 2');
GO
INSERT dbo.parts(PartID, descr)
    VALUES(2, N'New dupe 3'), (2, N'New dupe 4');
```

Let's check what we have:

```sql
SELECT * FROM dbo.parts;
SELECT * FROM dbo.parts_duplicates;
```

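Results: dbo.parts contains only PartID 1 (floob) and PartID 2 (New). The other five rows, PartID 1 (bar), both PartID 3 rows (New dupe 1, New dupe 2) and both later PartID 2 rows (New dupe 3, New dupe 4), are in dbo.parts_duplicates.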

If you want to build in some kind of logic that would have picked an arbitrary duplicate from the PartID = 3 rows, you can, but your comments seemed to indicate you want to manually determine which row to keep.

Question 9

You can use queries like the ones below to separate duplicate and unique rows in the parts_temp table, and then build on that logic to insert rows into the main parts table or the parts_duplicates table.

```sql
-- To get non-duplicate entries in parts_temp based on part number
SELECT * FROM parts_temp
WHERE part_number IN (SELECT part_number FROM parts_temp GROUP BY part_number HAVING COUNT(1) = 1)
-- To get duplicate entries in parts_temp based on part number
SELECT * FROM parts_temp
WHERE part_number IN (SELECT part_number FROM parts_temp GROUP BY part_number HAVING COUNT(1) > 1)
```
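Building on those two queries, the actual loads might look like the sketch below. It assumes parts_temp, parts and parts_duplicates share the columns from the question (hypothetical table variables stand in for them here) and that parts does not yet contain any of the incoming part numbers; rows that clash with existing parts would need an extra NOT EXISTS filter.

```sql
-- Hypothetical stand-ins for parts_temp, parts and parts_duplicates
DECLARE @parts_temp TABLE (part_number varchar(30), description varchar(30), information varchar(30));
DECLARE @parts TABLE (part_number varchar(30) PRIMARY KEY, description varchar(30), information varchar(30));
DECLARE @parts_duplicates TABLE (part_number varchar(30), description varchar(30), information varchar(30));

INSERT INTO @parts_temp (part_number, description, information)
VALUES ('331335A00', 'Desc1', 'Info1'),
       ('331335A11', 'Desc1', 'Info1'),
       ('331335A11', 'Desc2', 'Info1');

-- Part numbers that occur exactly once go to the constrained table
INSERT INTO @parts (part_number, description, information)
SELECT part_number, description, information
FROM @parts_temp
WHERE part_number IN (SELECT part_number FROM @parts_temp GROUP BY part_number HAVING COUNT(*) = 1);

-- Every row of a repeated part number goes to the review table
INSERT INTO @parts_duplicates (part_number, description, information)
SELECT part_number, description, information
FROM @parts_temp
WHERE part_number IN (SELECT part_number FROM @parts_temp GROUP BY part_number HAVING COUNT(*) > 1);
```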

The solution in Question 7 above is the accepted answer (score 2 · 2018-08-17 18:10:19Z).
