I am trying to optimize following query (this is most simplified version I could came up with):
SELECT tr.Id, StatusDate
FROM (
SELECT tr.Id, tr.StatusDate
FROM mon.ArchivedTaskResults_201504 as tr WITH (NOLOCK)
INNER JOIN mon.ViewDevicesWithGroups dev WITH (NOLOCK) ON tr.DeviceId = dev.Id
WHERE tr.ClientId = 4 AND dev.Deleted = 0
) AS tr
ORDER BY StatusDate DESC
OFFSET 1000000 rows
FETCH NEXT 25 ROWS ONLY
simple query execution plan
The problem is, query performance is directly proportional to OFFSET - for offset=0 query executes in 0.0s, but for offset=1000000 execution time is about 23s (and with even greater offset, it can take up to few minutes).
I'm almost sure that my problem could be solved with appropriate clustered index on ArchivedTaskResults table, but after trying for few hours I still haven't found good index yet.
ArchivedTaskResults tables are really big, having about 50000000 rows (50 M)
Additional information:
I would be extremally happy if someone could solve problem I described above, but to be honest, my REAL query is even more bizarre (disclaimer: I am not the one that designed this database):
SELECT tr.Id, StatusDate
FROM (
(
SELECT tr.Id, StatusDate
FROM mon.TaskResults as tr WITH (NOLOCK)
INNER JOIN mon.ViewDevicesWithGroups dev WITH (NOLOCK) ON tr.DeviceId = dev.Id
WHERE tr.ClientId = 4 AND dev.Deleted = 0
) UNION ALL (
SELECT tr.Id, tr.StatusDate
FROM mon.ArchivedTaskResults_201504 as tr WITH (NOLOCK)
INNER JOIN mon.ViewDevicesWithGroups dev WITH (NOLOCK) ON tr.DeviceId = dev.Id
WHERE tr.ClientId = 4 AND dev.Deleted = 0
) UNION ALL (
SELECT tr.Id, tr.StatusDate
FROM mon.ArchivedTaskResults_201505 as tr WITH (NOLOCK)
INNER JOIN mon.ViewDevicesWithGroups dev WITH (NOLOCK) ON tr.DeviceId = dev.Id
WHERE tr.ClientId = 4 AND dev.Deleted = 0
)
) AS tr
ORDER BY StatusDate DESC
OFFSET 1000000 ROWS
FETCH NEXT 25 ROWS
complex query execution plan
This is much more complex, because even queries with offset=0 have long execution time (I guess its because sql server have no idea that ArchivedTaskResults_201505 comes 'after' ArchivedTaskResults_201504, and tries to sort them in memory, but it's only my blind shot).
It would be beyond my wildest dreams if someone managed to help me with THAT query, but if it's impossible because of strange database design, I guess I could solve it with software (I could query tables one at a time from my application, instead of doing everything in SQL. Or using stored procedure for that) - but only if my first query could be improved.
Thanks in advance.
-
Wildly speculating, but you have a Key Lookup in your Query Plan, suggesting that the Index on one of your tables is not covering all the columns you need to pull. That is where your problem probably lies.Mark Sinkinson– Mark Sinkinson2015年05月11日 14:44:23 +00:00Commented May 11, 2015 at 14:44
-
Have you tried to create a materialized view with the union, create an index on it and query the view?Serpiton– Serpiton2015年05月11日 16:55:33 +00:00Commented May 11, 2015 at 16:55
-
Serpiton - no, but will probably be good idea, thank you. I haven't tried it yet, because I focused on first query (without the union).MSM– MSM2015年05月11日 18:55:26 +00:00Commented May 11, 2015 at 18:55
-
Mark - i will have a look at that, but I believe I tried most of 'obvious' missing indexes (and SSMS is silent about missing index), and I'm rather out of ideas. Thanks anyway.MSM– MSM2015年05月11日 18:56:31 +00:00Commented May 11, 2015 at 18:56
1 Answer 1
I don't think you'll be able to get good performance while using OFFSET
. The database must search through 1,000,025 rows of output from the inner query; even if you have a good clustered index on TaskResults
the system doesn't know for certain that it can skip ahead to date X.
But you do! Assuming this is for some kind of GUI, make a note of the earliest StatusDate
from the previous query, then use it to fitler next page:
SELECT
tr.Id, StatusDate
FROM
(
SELECT tr.Id, tr.StatusDate
FROM mon.ArchivedTaskResults_201504 as tr WITH (NOLOCK)
INNER JOIN mon.ViewDevicesWithGroups dev WITH (NOLOCK) ON tr.DeviceId = dev.Id
WHERE tr.ClientId = 4 AND dev.Deleted = 0
AND
(
-- Retrieve only records from before the previous page
tr.StatusDate < @PrevStatusDate
OR (tr.StatusDate = @PrevStatusDate AND tr.Id < @PrevID)
)
) AS tr
ORDER BY StatusDate, Id DESC
FETCH NEXT 25 ROWS ONLY
So if page #123 ends with 2015年05月01日, record #234, you want to consider all records that are from 2015年04月30日 or earlier, or which are also from 2015年05月01日 but are for records #1 .. #233.
This should work well with your more complex UNION query, but "real" partitioning would probably be easier than this roll-yer-own partitioning..
If StatusDate
is unique, or it's acceptable to occasionally show the same record on two adjacent pages, you can drop the @PrevID
and ORDER BY Id
bits. If Id
is always-increasing, you can filter off of it and skip StatusDate
.
Keep in mind that retrieving pages like this can easily skip a record or include the same record twice if records are being adding, removed, or reordered in the underlying data. But that's another topic.
-
Thank you, I will have a look at that. I'm not sure I will be able to do this (because going directly to n-th page (even with very large n) of results is part of GUI 'as is'), but this may be the only solution. I will accept your answer if noone else replies (and upvote as soon as I will be able to).MSM– MSM2015年05月11日 18:59:20 +00:00Commented May 11, 2015 at 18:59
Explore related questions
See similar questions with these tags.