3
\$\begingroup\$

I have a table with this schema

ID, int primary key
PathKey, string not null
InsertDate, datetime not null
Value, int not null
  • In the table there are a medium-large number of records. 1632 different pathkey and 645627 total records
  • The PathKey is not unique (because I store also the old values).
  • You can identify a record with the ID or a PathKey with the InsertDate.

I am developing a query to extract the last record for each path key, and this is the query I am using but is ugly and incredible slow.

SELECT *
FROM ArchiveData
WHERE ID IN (
 SELECT (
 SELECT TOP 1 ID
 FROM ArchiveData
 WHERE PathKey = AD.PathKey
 ORDER BY [InsertDate] DESC
 ) AS ArchiveDataID
 FROM ArchiveData AS AD
 GROUP BY PathKey
 )
ORDER BY PathKey

Any suggestions to improve, at least, the performance?

asked Oct 3, 2014 at 13:43
\$\endgroup\$
2
  • \$\begingroup\$ Are your ID values 'strictly increasing' (i.e. does the most recent date record also have the largest ID)? \$\endgroup\$ Commented Oct 3, 2014 at 13:51
  • \$\begingroup\$ Yes, the identity is set as auto incremental \$\endgroup\$ Commented Oct 3, 2014 at 13:52

2 Answers 2

2
\$\begingroup\$

Using a CTE (Supported by SQL Server) would help separate the logic of the query better than the sub-select in the from clause. Using the auto-increment ID is also something that can simplify the query.

Consider the following:

with MostRecent as (
 select max(ID) as ID
 from ArchiveData
 group by PathKey
)
select *
from ArchiveData inner join MostRecent on MostRecent.ID = ArchiveData.ID
order by PathKey

The above should reduce the number of joins a lot, and use a better key for the joins that are done.

answered Oct 3, 2014 at 13:56
\$\endgroup\$
1
  • \$\begingroup\$ SQL Server Execution Times: CPU time = 1328 ms, elapsed time = 880 ms. Enough for what I need. Thank you! \$\endgroup\$ Commented Oct 3, 2014 at 14:16
1
\$\begingroup\$

Allow me to suggest using the ROW_NUMBER function, which allows you to number returned rows using the ordering you provide.

Here is a complete query that uses the ROW_NUMBER function. I also removed the SELECT * and replaced them with the column names.

SELECT 
 ID,
 PathKey, 
 InsertDate, 
 Value
FROM 
(
 SELECT 
 ID,
 PathKey, 
 InsertDate, 
 Value,
 ROW_NUMBER() OVER (PARTITION BY PathKey ORDER BY ID DESC) AS Row
 FROM ArchiveData
) A
WHERE A.Row = 1

You could also use a CTE or temp table to store the results of the inner query. I would recommend using a temp table for the reasons outlined in this DBA.SE question

Here is an example using a temp table:

CREATE TABLE #ArchivedData
(
 ID INT PRIMARY KEY,
 PathKey VARCHAR(50) NOT NULL,
 InsertDate DATETIME NOT NULL,
 Value INT NOT NULL,
 Row INT NOT NULL
)
INSERT INTO #ArchivedData
SELECT 
 ID,
 PathKey, 
 InsertDate, 
 Value,
 ROW_NUMBER() OVER (PARTITION BY PathKey ORDER BY ID DESC) AS Row
FROM ArchiveData
SELECT 
 ID,
 PathKey, 
 InsertDate, 
 Value
FROM #ArchivedData
WHERE Row = 1
DROP TABLE #ArchivedData
answered Oct 6, 2014 at 12:59
\$\endgroup\$
1
  • \$\begingroup\$ Hi, thanks for your suggestion. I just tried it but it's slower than rolfl answer. \$\endgroup\$ Commented Oct 6, 2014 at 13:08

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.