Extract last record inserted for each key identifier

Question

I have a table with this schema:

ID, int primary key
PathKey, string not null
InsertDate, datetime not null
Value, int not null

The table holds a medium-large number of records: 645,627 in total, across 1,632 distinct PathKey values.
PathKey is not unique, because I also store the old values.
A record can be identified either by its ID or by the combination of PathKey and InsertDate.

I am writing a query to extract the most recent record for each PathKey. This is the query I am using, but it is ugly and incredibly slow:

SELECT *
FROM ArchiveData
WHERE ID IN (
 SELECT (
 SELECT TOP 1 ID
 FROM ArchiveData
 WHERE PathKey = AD.PathKey
 ORDER BY [InsertDate] DESC
 ) AS ArchiveDataID
 FROM ArchiveData AS AD
 GROUP BY PathKey
 )
ORDER BY PathKey

Any suggestions for improving it, or at least its performance?
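To compare alternatives, it helps to run the question's logic on a tiny dataset. The sketch below uses Python's standard-library sqlite3 module as a stand-in for SQL Server. Assumptions: the sample rows are hypothetical, InsertDate is stored as ISO-8601 text (which sorts chronologically), and TOP 1 becomes LIMIT 1 because SQLite has no TOP. It is a correctness check only, not a performance benchmark.

```python
import sqlite3

# Hypothetical sample rows: (ID, PathKey, InsertDate, Value); IDs grow with time.
ROWS = [
    (1, "a", "2014-10-01 10:00:00", 10),
    (2, "b", "2014-10-01 10:05:00", 20),
    (3, "a", "2014-10-02 09:00:00", 11),
    (4, "c", "2014-10-02 09:30:00", 30),
    (5, "b", "2014-10-03 08:00:00", 21),
    (6, "a", "2014-10-03 12:00:00", 12),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ArchiveData (ID INTEGER PRIMARY KEY, PathKey TEXT NOT NULL,"
    " InsertDate TEXT NOT NULL, Value INTEGER NOT NULL)"
)
conn.executemany("INSERT INTO ArchiveData VALUES (?, ?, ?, ?)", ROWS)

# The question's query, with TOP 1 rewritten as LIMIT 1 for SQLite.
ORIGINAL_QUERY = """
SELECT *
FROM ArchiveData
WHERE ID IN (
    SELECT (
        SELECT ID
        FROM ArchiveData
        WHERE PathKey = AD.PathKey
        ORDER BY InsertDate DESC
        LIMIT 1
    ) AS ArchiveDataID
    FROM ArchiveData AS AD
    GROUP BY PathKey
)
ORDER BY PathKey
"""

latest = conn.execute(ORIGINAL_QUERY).fetchall()
for row in latest:
    print(row)  # one row per PathKey: the one with the latest InsertDate
```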

Comment

Are your ID values strictly increasing (i.e., does the record with the most recent InsertDate also have the largest ID)?

Comment

Yes, ID is an auto-increment identity column.
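Before relying on that answer, the assumption can be checked against the data. A minimal sketch, again using SQLite through Python's sqlite3 as a stand-in and hypothetical sample rows: the query lists every PathKey whose highest-ID row is not also its most recent row, so an empty result means MAX(ID) can stand in for the latest InsertDate. The query uses only correlated MAX subqueries, so it should carry over to SQL Server largely unchanged.

```python
import sqlite3

# Hypothetical sample rows: (ID, PathKey, InsertDate, Value).
ROWS = [
    (1, "a", "2014-10-01 10:00:00", 10),
    (2, "b", "2014-10-01 10:05:00", 20),
    (3, "a", "2014-10-02 09:00:00", 11),
    (4, "c", "2014-10-02 09:30:00", 30),
    (5, "b", "2014-10-03 08:00:00", 21),
    (6, "a", "2014-10-03 12:00:00", 12),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ArchiveData (ID INTEGER PRIMARY KEY, PathKey TEXT NOT NULL,"
    " InsertDate TEXT NOT NULL, Value INTEGER NOT NULL)"
)
conn.executemany("INSERT INTO ArchiveData VALUES (?, ?, ?, ?)", ROWS)

# Lists each PathKey whose highest-ID row is older than its newest InsertDate.
# An empty result means MAX(ID) per PathKey also picks the most recent record.
CHECK_QUERY = """
SELECT t.PathKey
FROM ArchiveData AS t
WHERE t.ID = (SELECT MAX(ID) FROM ArchiveData WHERE PathKey = t.PathKey)
  AND t.InsertDate < (SELECT MAX(InsertDate) FROM ArchiveData WHERE PathKey = t.PathKey)
"""

before = conn.execute(CHECK_QUERY).fetchall()
print(before)  # [] for the sample rows

# Simulate a back-dated insert: the newest ID carries an older InsertDate.
conn.execute("INSERT INTO ArchiveData VALUES (7, 'c', '2014-10-01 00:00:00', 99)")
violations = conn.execute(CHECK_QUERY).fetchall()
print(violations)  # [('c',)]
```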

Answer (accepted)

Using a CTE (supported by SQL Server) separates the logic of the query more clearly than the nested sub-selects in the WHERE clause. Relying on the auto-increment ID also simplifies the query: the largest ID for each PathKey identifies its most recent record.

Consider the following:

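-- Relies on ID being an auto-increment identity: the largest ID per PathKey is the newest row
with MostRecent as (
 select max(ID) as ID
 from ArchiveData
 group by PathKey
)
-- Join back on the primary key; ArchiveData.* avoids returning the ID column twice
select ArchiveData.*
from ArchiveData inner join MostRecent on MostRecent.ID = ArchiveData.ID
order by PathKey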

This should greatly reduce the number of joins, and the join that remains is on ID, the primary key, which is a better key to join on.
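The same CTE can be exercised on a small in-memory table. This sketch assumes SQLite (through Python's sqlite3) as a stand-in for SQL Server and uses hypothetical rows; it selects ArchiveData.* rather than * so that the ID column from MostRecent is not returned a second time.

```python
import sqlite3

# Hypothetical sample rows: (ID, PathKey, InsertDate, Value); IDs grow with time.
ROWS = [
    (1, "a", "2014-10-01 10:00:00", 10),
    (2, "b", "2014-10-01 10:05:00", 20),
    (3, "a", "2014-10-02 09:00:00", 11),
    (4, "c", "2014-10-02 09:30:00", 30),
    (5, "b", "2014-10-03 08:00:00", 21),
    (6, "a", "2014-10-03 12:00:00", 12),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ArchiveData (ID INTEGER PRIMARY KEY, PathKey TEXT NOT NULL,"
    " InsertDate TEXT NOT NULL, Value INTEGER NOT NULL)"
)
conn.executemany("INSERT INTO ArchiveData VALUES (?, ?, ?, ?)", ROWS)

# MAX(ID) per PathKey, then a join back on the primary key.
MOST_RECENT_QUERY = """
WITH MostRecent AS (
    SELECT MAX(ID) AS ID
    FROM ArchiveData
    GROUP BY PathKey
)
SELECT ArchiveData.*
FROM ArchiveData
INNER JOIN MostRecent ON MostRecent.ID = ArchiveData.ID
ORDER BY PathKey
"""

latest = conn.execute(MOST_RECENT_QUERY).fetchall()
for row in latest:
    print(row)
```

Because the grouping only needs PathKey and ID, the database never has to sort each key's rows by InsertDate, which is where the question's correlated TOP 1 subquery spends its effort.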

Comment

SQL Server Execution Times: CPU time = 1328 ms, elapsed time = 880 ms. Enough for what I need. Thank you!

Answer

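I suggest using the ROW_NUMBER function, which numbers the returned rows according to the ordering you provide. With PARTITION BY PathKey, the numbering restarts at 1 for each PathKey, so the row numbered 1 is the one with the highest ID.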

Here is a complete query that uses ROW_NUMBER. I also replaced SELECT * with explicit column names.

SELECT 
 ID,
 PathKey, 
 InsertDate, 
 Value
FROM 
(
 SELECT 
 ID,
 PathKey, 
 InsertDate, 
 Value,
 ROW_NUMBER() OVER (PARTITION BY PathKey ORDER BY ID DESC) AS Row
 FROM ArchiveData
) A
WHERE A.Row = 1
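To see what ROW_NUMBER produces before the outer filter, the sketch below prints the per-PathKey numbering and then applies the answer's query. Assumptions: SQLite through Python's sqlite3 stands in for SQL Server (window functions require SQLite 3.25 or later), the rows are hypothetical, the alias is RowNum rather than Row to stay clear of SQLite's ROW keyword, and an ORDER BY PathKey is added so the output order is deterministic.

```python
import sqlite3

# Hypothetical sample rows: (ID, PathKey, InsertDate, Value).
ROWS = [
    (1, "a", "2014-10-01 10:00:00", 10),
    (2, "b", "2014-10-01 10:05:00", 20),
    (3, "a", "2014-10-02 09:00:00", 11),
    (4, "c", "2014-10-02 09:30:00", 30),
    (5, "b", "2014-10-03 08:00:00", 21),
    (6, "a", "2014-10-03 12:00:00", 12),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ArchiveData (ID INTEGER PRIMARY KEY, PathKey TEXT NOT NULL,"
    " InsertDate TEXT NOT NULL, Value INTEGER NOT NULL)"
)
conn.executemany("INSERT INTO ArchiveData VALUES (?, ?, ?, ?)", ROWS)

# Per-PathKey numbering: 1 is the highest ID within each PathKey.
ranked = conn.execute(
    """
    SELECT ID, PathKey,
           ROW_NUMBER() OVER (PARTITION BY PathKey ORDER BY ID DESC) AS RowNum
    FROM ArchiveData
    ORDER BY PathKey, ID DESC
    """
).fetchall()
print(ranked)

# The answer's query: keep only the rows numbered 1.
ROW_NUMBER_QUERY = """
SELECT ID, PathKey, InsertDate, Value
FROM (
    SELECT ID, PathKey, InsertDate, Value,
           ROW_NUMBER() OVER (PARTITION BY PathKey ORDER BY ID DESC) AS RowNum
    FROM ArchiveData
) AS A
WHERE A.RowNum = 1
ORDER BY PathKey
"""

latest = conn.execute(ROW_NUMBER_QUERY).fetchall()
print(latest)
```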

You could also express the inner query as a CTE, or store its results in a temp table. I would recommend a temp table, for the reasons outlined in this DBA.SE question.
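For reference, the CTE variant of the ROW_NUMBER query might look like the following. The CTE name Ranked and the RowNum alias are illustrative, and the sketch runs on SQLite through Python's sqlite3 (3.25 or later for window functions) with hypothetical rows rather than on SQL Server. In SQL Server a CTE is expanded into the query that references it rather than stored, which is the main practical difference from a temp table.

```python
import sqlite3

# Hypothetical sample rows: (ID, PathKey, InsertDate, Value).
ROWS = [
    (1, "a", "2014-10-01 10:00:00", 10),
    (2, "b", "2014-10-01 10:05:00", 20),
    (3, "a", "2014-10-02 09:00:00", 11),
    (4, "c", "2014-10-02 09:30:00", 30),
    (5, "b", "2014-10-03 08:00:00", 21),
    (6, "a", "2014-10-03 12:00:00", 12),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ArchiveData (ID INTEGER PRIMARY KEY, PathKey TEXT NOT NULL,"
    " InsertDate TEXT NOT NULL, Value INTEGER NOT NULL)"
)
conn.executemany("INSERT INTO ArchiveData VALUES (?, ?, ?, ?)", ROWS)

# Same ranking as the derived-table version, moved into a named CTE.
CTE_QUERY = """
WITH Ranked AS (
    SELECT ID, PathKey, InsertDate, Value,
           ROW_NUMBER() OVER (PARTITION BY PathKey ORDER BY ID DESC) AS RowNum
    FROM ArchiveData
)
SELECT ID, PathKey, InsertDate, Value
FROM Ranked
WHERE RowNum = 1
ORDER BY PathKey
"""

latest = conn.execute(CTE_QUERY).fetchall()
for row in latest:
    print(row)
```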

Here is an example using a temp table:

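-- Materialize every row together with its rank inside its PathKey
CREATE TABLE #ArchivedData
(
 ID INT PRIMARY KEY,
 PathKey VARCHAR(50) NOT NULL, -- length assumed; should match ArchiveData.PathKey
 InsertDate DATETIME NOT NULL,
 Value INT NOT NULL,
 Row INT NOT NULL
);
INSERT INTO #ArchivedData (ID, PathKey, InsertDate, Value, Row)
SELECT 
 ID,
 PathKey, 
 InsertDate, 
 Value,
 ROW_NUMBER() OVER (PARTITION BY PathKey ORDER BY ID DESC) AS Row
FROM ArchiveData;
-- Row = 1 is the highest ID, i.e. the most recent record, for each PathKey
SELECT 
 ID,
 PathKey, 
 InsertDate, 
 Value
FROM #ArchivedData
WHERE Row = 1;
DROP TABLE #ArchivedData;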

Comment

Hi, thanks for your suggestion. I just tried it, but it's slower than rolfl's answer.

Accepted answer by rolfl, posted 2014-10-03 13:56:56Z.
