1
\$\begingroup\$

I have a function that deletes duplicate entries. The highest ID is kept and the older ones are removed.

function:

DELETE [tableName]
FROM [tableName]
INNER JOIN (SELECT * , ROW_NUMBER() OVER 
 (PARTITION BY [fork_id] ORDER BY ID DESC) 
 AS RowNumber FROM [tableName])
Numbered ON [tableName].ID = Numbered.ID
WHERE RowNumber > 1

For example, it changes

|------|---------|--------|
| ID | fork_id | Car |
|------|---------|--------|
| 1 | 2 | AUDI | <--- removed
| 2 | 1 | AUDI |
| 3 | 2 | BMW |
|------|---------|--------|

to

|------|---------|--------|
| ID | fork_id | Car |
|------|---------|--------|
| 2 | 1 | AUDI |
| 3 | 2 | BMW |
|------|---------|--------|

The problem with that query is the execution time exceed time when with have many rows (more than 50k) in the table.

I have a primary key for the ID column

I'm In the sql server, I have a limitation about execution time.

A connection can be cut off by the server for a number of reasons:

  1. Idle connection longer than 5 minutes.
  2. Long running query.
  3. Long running open transaction.
  4. Excessive resource usage.

sources

200_success
146k22 gold badges190 silver badges479 bronze badges
asked Jun 14, 2017 at 14:47
\$\endgroup\$
4
  • \$\begingroup\$ what are your indexes on? \$\endgroup\$ Commented Jun 14, 2017 at 15:04
  • \$\begingroup\$ I don't have any index except the primary key. Primary keys are always indexed by default. should I do a Index with the fork_idcolumn? \$\endgroup\$ Commented Jun 14, 2017 at 15:12
  • \$\begingroup\$ well, if ID, fork_id is unique (paired) a clustered index could help. I'm curious as to how long it takes on 100K records. \$\endgroup\$ Commented Jun 14, 2017 at 15:31
  • \$\begingroup\$ The fork_id is not necessary assending or with a particular order. well, it probably a nonclustered? \$\endgroup\$ Commented Jun 14, 2017 at 15:38

1 Answer 1

3
\$\begingroup\$

This would probably be better on DBA
You can use a CTE
An index on fork_id should help

with cte as 
( SELECT ROW_NUMBER() OVER (PARTITION BY [fork_id] ORDER BY ID DESC) AS RowNumber 
 FROM [tableName]
)
delete from cte 
WHERE RowNumber > 1 

Optimize

select * from cte 
WHERE RowNumber > 1

If that is fast it is volume thing and you could delete in batches

answered Jun 14, 2017 at 16:07
\$\endgroup\$
7
  • \$\begingroup\$ sorry for the delay.. I have a bug with the "*" in the delete part. Failed to execute query. Error: Incorrect syntax near '*' \$\endgroup\$ Commented Jun 15, 2017 at 13:47
  • \$\begingroup\$ try without the * \$\endgroup\$ Commented Jun 15, 2017 at 13:58
  • \$\begingroup\$ works now :-) but the time are 103s for Query succeeded: Affected rows: 487665. with an index on fork_id \$\endgroup\$ Commented Jun 15, 2017 at 14:02
  • \$\begingroup\$ @Jean-philippeEmond It takes time to delete rows. \$\endgroup\$ Commented Jun 15, 2017 at 14:06
  • \$\begingroup\$ could I use like select * into AnotherTable from cte WHERE RowNumber > 1 to move all data into another table instead of delete? it could be more quick? (not working but..) \$\endgroup\$ Commented Jun 15, 2017 at 18:20

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.