I have a query that deletes duplicate entries. For each fork_id, the row with the highest ID is kept and the older ones are removed.
The query:
DELETE [tableName]
FROM [tableName]
INNER JOIN (SELECT * , ROW_NUMBER() OVER
(PARTITION BY [fork_id] ORDER BY ID DESC)
AS RowNumber FROM [tableName])
Numbered ON [tableName].ID = Numbered.ID
WHERE RowNumber > 1
For example, it changes
|------|---------|--------|
| ID | fork_id | Car |
|------|---------|--------|
| 1 | 2 | AUDI | <--- removed
| 2 | 1 | AUDI |
| 3 | 2 | BMW |
|------|---------|--------|
to
|------|---------|--------|
| ID | fork_id | Car |
|------|---------|--------|
| 2 | 1 | AUDI |
| 3 | 2 | BMW |
|------|---------|--------|
The problem with this query is that it exceeds the execution time limit when the table has many rows (more than 50k).
I have a primary key on the ID column.
I'm on SQL Server, and I have a limit on execution time.
A connection can be cut off by the server for a number of reasons:
- Idle connection longer than 5 minutes.
- Long running query.
- Long running open transaction.
- Excessive resource usage.
1 Answer
This would probably be a better fit on DBA.
You can use a CTE.
An index on fork_id should help.
with cte as
( SELECT ROW_NUMBER() OVER (PARTITION BY [fork_id] ORDER BY ID DESC) AS RowNumber
FROM [tableName]
)
delete from cte
WHERE RowNumber > 1
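The index on fork_id mentioned above could be created roughly like this. It is only a sketch: the index name IX_tableName_fork_id is made up, [tableName] is the placeholder from the question, and adding ID DESC as a second key column is my assumption, on the idea that it may let the optimizer read rows already ordered for the ROW_NUMBER() window instead of sorting them. Check the execution plan before relying on it.

```sql
-- Hypothetical index name; [tableName] is the placeholder used in the question.
-- Key order (fork_id, ID DESC) mirrors PARTITION BY [fork_id] ORDER BY ID DESC.
CREATE NONCLUSTERED INDEX IX_tableName_fork_id
    ON [tableName] ([fork_id], [ID] DESC);
```

Keep in mind that every extra index also has to be maintained during the delete, so it is worth timing the delete with and without it.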
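To optimize, first check how long the equivalent select takes (with the same cte definition in front of it):
select * from cte
WHERE RowNumber > 1
If that is fast, it is a volume issue and you could delete in batches.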
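A batched delete could look something like the sketch below. The batch size of 10000 and the variable names @BatchSize and @Deleted are assumptions for illustration, not values from the question. Each pass recomputes the row numbers and removes at most one batch of rows with RowNumber > 1, so the row with the highest ID per fork_id is never touched, and the loop stops when a pass deletes nothing.

```sql
-- Assumed batch size; tune it so one pass stays under the server's time limit.
DECLARE @BatchSize int = 10000;
DECLARE @Deleted int = 1;

WHILE @Deleted > 0
BEGIN
    -- Same CTE as above; the leading semicolon guards the WITH keyword.
    ;WITH cte AS
    (
        SELECT ROW_NUMBER() OVER (PARTITION BY [fork_id] ORDER BY ID DESC) AS RowNumber
        FROM [tableName]
    )
    DELETE TOP (@BatchSize) FROM cte
    WHERE RowNumber > 1;

    -- Rows removed by this pass; 0 means no duplicates are left.
    SET @Deleted = @@ROWCOUNT;
END
```

Outside an explicit transaction each DELETE commits on its own, which keeps individual transactions short. If the server's limit applies to the whole batch rather than to a single statement, it may be necessary to run one pass per call from the client and repeat until no rows are affected.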
- Sorry for the delay. I get an error with the "*" in the delete part:
  Failed to execute query. Error: Incorrect syntax near '*'
  – Jean-philippe Emond, Jun 15, 2017 at 13:47
- Try without the *. – paparazzo, Jun 15, 2017 at 13:58
- It works now :-) but with an index on fork_id it takes 103 s for:
  Query succeeded: Affected rows: 487665.
  – Jean-philippe Emond, Jun 15, 2017 at 14:02
- @Jean-philippeEmond It takes time to delete rows. – paparazzo, Jun 15, 2017 at 14:06
- Could I use something like
  select * into AnotherTable from cte WHERE RowNumber > 1
  to move all the data into another table instead of deleting it? Could that be quicker? (It is not working, but..) – Jean-philippe Emond, Jun 15, 2017 at 18:20
primary key. Primary keys are always indexed by default. Should I create an index on the fork_id column?