I have a table with this schema:
ID, int primary key
PathKey, string not null
InsertDate, datetime not null
Value, int not null
- The table holds a medium-large number of records: 1632 distinct PathKey values and 645627 rows in total.
- The PathKey is not unique, because I also store the old values.
- A record can be identified either by its ID or by its PathKey together with its InsertDate.
I am developing a query to extract the last record for each PathKey. This is the query I am using, but it is ugly and incredibly slow.
SELECT *
FROM ArchiveData
WHERE ID IN (
    SELECT (
        SELECT TOP 1 ID
        FROM ArchiveData
        WHERE PathKey = AD.PathKey
        ORDER BY [InsertDate] DESC
    ) AS ArchiveDataID
    FROM ArchiveData AS AD
    GROUP BY PathKey
)
ORDER BY PathKey
Any suggestions to improve, at least, the performance?
- Are your ID values 'strictly increasing' (i.e. does the most recent date record also have the largest ID)? (rolfl, Oct 3, 2014 at 13:51)
- Yes, the identity is set as auto incremental. (simoneL, Oct 3, 2014 at 13:52)
2 Answers
Using a CTE (supported by SQL Server) would separate the logic of the query better than the nested sub-selects in the WHERE clause. Because the ID is auto-incrementing, the most recent record for each PathKey is also the one with the largest ID, which simplifies the query.
Consider the following:
with MostRecent as (
    select max(ID) as ID
    from ArchiveData
    group by PathKey
)
-- ArchiveData.* avoids returning the ID column twice (once per side of the join)
select ArchiveData.*
from ArchiveData inner join MostRecent on MostRecent.ID = ArchiveData.ID
order by PathKey
The above should replace the per-PathKey correlated sub-select with a single aggregate and one join, and that join is done on the primary key.
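As a quick sanity check, the CTE query can be exercised against a tiny in-memory table. The sketch below is only an illustration: it uses Python's sqlite3 module as a stand-in for SQL Server, and the sample rows, the `ROWS` constant, and the `latest_by_max_id` helper are all made up for this example. It shows the shape of the result, not SQL Server performance.

```python
import sqlite3

# Hypothetical sample data: (ID, PathKey, InsertDate, Value).
# IDs increase with InsertDate, as confirmed in the comments on the question.
ROWS = [
    (1, "a/x", "2014-10-01 08:00:00", 10),
    (2, "b/y", "2014-10-01 09:00:00", 20),
    (3, "a/x", "2014-10-02 08:00:00", 11),
    (4, "b/y", "2014-10-02 09:00:00", 21),
    (5, "a/x", "2014-10-03 08:00:00", 12),
]

# Same logic as the CTE above: pick max(ID) per PathKey, then join back.
LATEST_PER_PATHKEY = """
with MostRecent as (
    select max(ID) as ID
    from ArchiveData
    group by PathKey
)
select ArchiveData.ID, PathKey, InsertDate, Value
from ArchiveData inner join MostRecent on MostRecent.ID = ArchiveData.ID
order by PathKey
"""


def latest_by_max_id(rows):
    """Load rows into an in-memory SQLite table and return the latest row per PathKey."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table ArchiveData ("
            " ID integer primary key,"
            " PathKey text not null,"
            " InsertDate text not null,"
            " Value integer not null)"
        )
        conn.executemany("insert into ArchiveData values (?, ?, ?, ?)", rows)
        return conn.execute(LATEST_PER_PATHKEY).fetchall()
    finally:
        conn.close()


for row in latest_by_max_id(ROWS):
    print(row)
```

Each PathKey appears exactly once in the output, carrying the row with its largest ID.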
- SQL Server Execution Times: CPU time = 1328 ms, elapsed time = 880 ms. Enough for what I need. Thank you! (simoneL, Oct 3, 2014 at 14:16)
Allow me to suggest using the ROW_NUMBER function, which allows you to number returned rows using the ordering you provide.
Here is a complete query that uses the ROW_NUMBER function. I also removed the SELECT * and replaced it with explicit column names.
SELECT
    ID,
    PathKey,
    InsertDate,
    Value
FROM
(
    SELECT
        ID,
        PathKey,
        InsertDate,
        Value,
        ROW_NUMBER() OVER (PARTITION BY PathKey ORDER BY ID DESC) AS Row
    FROM ArchiveData
) A
WHERE A.Row = 1
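The ROW_NUMBER query above orders by ID, which relies on the ID increasing together with InsertDate (confirmed in the comments on the question). The sketch below illustrates what would change if that assumption did not hold. It is a hypothetical example, not part of the answer: it runs against SQLite (version 3.25 or later, for window function support) through Python's sqlite3 module instead of SQL Server, the sample rows are invented, the `latest_by_row_number` helper is made up, and the alias is renamed to `RowNum` to stay clear of SQLite keywords.

```python
import sqlite3

# Hypothetical data: row 3 was inserted late with an older InsertDate,
# so for "a/x" the highest ID is NOT the most recent InsertDate.
ROWS = [
    (1, "a/x", "2014-10-02 08:00:00", 11),
    (2, "b/y", "2014-10-01 09:00:00", 20),
    (3, "a/x", "2014-10-01 08:00:00", 10),
]

# Same structure as the ROW_NUMBER query above; only the ORDER BY varies.
QUERY_TEMPLATE = """
SELECT ID, PathKey, InsertDate, Value
FROM (
    SELECT ID, PathKey, InsertDate, Value,
           ROW_NUMBER() OVER (PARTITION BY PathKey ORDER BY {ordering}) AS RowNum
    FROM ArchiveData
) A
WHERE A.RowNum = 1
ORDER BY PathKey
"""


def latest_by_row_number(rows, ordering):
    """Return the first-ranked row per PathKey under the given window ordering.

    `ordering` is interpolated into the SQL text, so it must be a trusted
    constant, never user input.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE ArchiveData ("
            " ID INTEGER PRIMARY KEY,"
            " PathKey TEXT NOT NULL,"
            " InsertDate TEXT NOT NULL,"
            " Value INTEGER NOT NULL)"
        )
        conn.executemany("INSERT INTO ArchiveData VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY_TEMPLATE.format(ordering=ordering)).fetchall()
    finally:
        conn.close()


by_id = latest_by_row_number(ROWS, "ID DESC")
by_date = latest_by_row_number(ROWS, "InsertDate DESC, ID DESC")
print(by_id)
print(by_date)
```

With this data the two orderings disagree for "a/x": ordering by ID picks row 3, while ordering by InsertDate (with ID as a tie-breaker) picks row 1. When the ID really is strictly increasing, as in the question, both orderings return the same rows.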
You could also use a CTE or a temp table to store the results of the inner query. I would recommend a temp table, for the reasons outlined in this DBA.SE question.
Here is an example using a temp table:
CREATE TABLE #ArchivedData
(
    ID INT PRIMARY KEY,
    PathKey VARCHAR(50) NOT NULL,
    InsertDate DATETIME NOT NULL,
    Value INT NOT NULL,
    Row INT NOT NULL
);

-- Number the rows within each PathKey, newest (highest ID) first
INSERT INTO #ArchivedData
SELECT
    ID,
    PathKey,
    InsertDate,
    Value,
    ROW_NUMBER() OVER (PARTITION BY PathKey ORDER BY ID DESC) AS Row
FROM ArchiveData;

-- Keep only the newest row per PathKey
SELECT
    ID,
    PathKey,
    InsertDate,
    Value
FROM #ArchivedData
WHERE Row = 1;

DROP TABLE #ArchivedData;
- Hi, thanks for your suggestion. I just tried it, but it's slower than rolfl's answer. (simoneL, Oct 6, 2014 at 13:08)