Delete duplicate entries with lower IDs

Question 1

I have a query that deletes duplicate entries: for each fork_id, the row with the highest ID is kept and the older ones are removed.

Query:

DELETE [tableName]
FROM [tableName]
INNER JOIN (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY [fork_id] ORDER BY ID DESC) AS RowNumber
    FROM [tableName]
) Numbered ON [tableName].ID = Numbered.ID
WHERE RowNumber > 1

For example, it changes

|------|---------|--------|
| ID | fork_id | Car |
|------|---------|--------|
| 1 | 2 | AUDI | <--- removed
| 2 | 1 | AUDI |
| 3 | 2 | BMW |
|------|---------|--------|

to

|------|---------|--------|
| ID | fork_id | Car |
|------|---------|--------|
| 2 | 1 | AUDI |
| 3 | 2 | BMW |
|------|---------|--------|

The problem with this query is that it exceeds the execution time limit when the table has many rows (more than 50k).

The ID column is the primary key.

I'm on SQL Server, and the server limits execution time.

A connection can be cut off by the server for a number of reasons:

Idle connection longer than 5 minutes.

Long running query.

Long running open transaction.

Excessive resource usage.

sources

Question 2

What are your indexes on?

Question 3

I don't have any indexes except the primary key, which is indexed by default. Should I create an index on the fork_id column?

Question 4

Well, if (ID, fork_id) is unique as a pair, a clustered index could help. I'm curious how long it takes on 100K records.

Question 5

The fork_id values are not necessarily ascending or in any particular order. So it would probably be a nonclustered index?
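For reference, a nonclustered index on fork_id might look like the sketch below. The index name IX_tableName_fork_id is made up for illustration, and this assumes the primary key on ID is the clustered index (the SQL Server default), in which case the nonclustered index also carries ID as its row locator.

```sql
-- Sketch only: IX_tableName_fork_id is a hypothetical name.
-- Assumes ID is the clustered primary key, so each index entry
-- already includes ID alongside fork_id.
CREATE NONCLUSTERED INDEX IX_tableName_fork_id
    ON [tableName] ([fork_id]);
```

Since fork_id values repeat (that is the whole problem), the index is not declared UNIQUE.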

Question 6

This would probably be better on DBA.
You can use a CTE.
An index on fork_id should help.

with cte as 
( SELECT ROW_NUMBER() OVER (PARTITION BY [fork_id] ORDER BY ID DESC) AS RowNumber 
 FROM [tableName]
)
delete from cte 
WHERE RowNumber > 1

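To optimize, first time the select on its own (with the same cte definition):

select * from cte 
WHERE RowNumber > 1

If that is fast, the problem is the volume of rows being deleted, and you could delete in batches.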
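A batched delete could be sketched roughly as follows. This is not the query from the answer: it swaps the ROW_NUMBER() CTE for an EXISTS check (delete any row that has a sibling with the same fork_id and a higher ID), which should leave the same rows because ID is unique. The batch size of 10000 and the variable names are assumptions to tune against the server's limits.

```sql
-- Sketch only: @BatchSize and @Deleted are illustrative names,
-- and 10000 is an arbitrary starting point, not a recommendation.
DECLARE @BatchSize int = 10000;
DECLARE @Deleted int = 1;

WHILE @Deleted > 0
BEGIN
    -- Remove up to @BatchSize rows that have a newer row
    -- (higher ID) with the same fork_id.
    DELETE TOP (@BatchSize) t
    FROM [tableName] AS t
    WHERE EXISTS (
        SELECT 1
        FROM [tableName] AS newer
        WHERE newer.[fork_id] = t.[fork_id]
          AND newer.ID > t.ID
    );

    -- Stop once a pass deletes nothing.
    SET @Deleted = @@ROWCOUNT;
END
```

Each DELETE runs as its own short statement, so no single statement has to process all the duplicates at once. If the loop is interrupted, running it again simply continues with whatever duplicates remain.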

Question 7

Sorry for the delay. I get an error from the "*" in the delete part: Failed to execute query. Error: Incorrect syntax near '*'

Question 8

Try without the *.

Question 9

Works now :-) but it takes 103 s with an index on fork_id. Query succeeded: Affected rows: 487665.

Question 10

@Jean-philippeEmond It takes time to delete rows.

Question 11

Could I use something like select * into AnotherTable from cte WHERE RowNumber > 1 to move the data into another table instead of deleting? Would that be quicker? (It's not working as written, but...)
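One likely reason the snippet in that comment fails is that the cte has to be defined in the same statement as the SELECT ... INTO. A minimal sketch, assuming the columns are ID, fork_id and Car as in the example table, and that AnotherTable does not exist yet:

```sql
-- Sketch only: column list taken from the example table in the question.
-- SELECT ... INTO creates AnotherTable, so it must not already exist.
WITH cte AS (
    SELECT ID, [fork_id], Car,
           ROW_NUMBER() OVER (PARTITION BY [fork_id] ORDER BY ID DESC) AS RowNumber
    FROM [tableName]
)
SELECT ID, [fork_id], Car
INTO AnotherTable
FROM cte
WHERE RowNumber > 1;
```

Note that this only copies the duplicate rows; they still have to be deleted from the original table afterwards. Listing the columns explicitly keeps the helper RowNumber column out of the new table. Changing the filter to RowNumber = 1 would instead copy the rows to keep.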

Accepted Answer · paparazzo · 2017-06-14 16:07:43Z


