I have a table with over 10 million rows tracking user activities. I created a nonclustered index on (UserID, DtModified DESC), which performs well for queries filtering a single user. However, when querying multiple users, SQL Server first joins on the UserActivities table, then sorts the results by last modified before selecting the rows.
Since I’m using this query for pagination, if the combined users have 10,000 rows, SQL Server retrieves all of them, sorts them, and then selects only the first 50 rows. This approach becomes inefficient when searching for users with a large number of records.
Is there a way to improve performance with better indexing? Any advice would be greatly appreciated. Thanks!
P.S. Erik Darling previously suggested columnstore indexes in another post, but that isn’t an option for me right now.
Plan with single user: https://www.brentozar.com/pastetheplan/?id=uvYVLDaq9D
Plan with multiple users: https://www.brentozar.com/pastetheplan/?id=EmkR1GGa3p
-- Create a temporary table to demonstrate the issue
CREATE TABLE #UserActivities (
ActivityID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
UserID INT NOT NULL,
ActivityType VARCHAR(50) NOT NULL,
DtModified DATETIME2 NOT NULL,
Details NVARCHAR(MAX) NULL
);
-- Create the index we want to evaluate
CREATE NONCLUSTERED INDEX IX_UserID_DtModified
ON #UserActivities(UserID, DtModified DESC);
-- Insert sample data (10,000 rows for demonstration)
INSERT INTO #UserActivities (UserID, ActivityType, DtModified, Details)
SELECT
ABS(CHECKSUM(NEWID())) % 1000 + 1 AS UserID, -- 1,000 distinct users
CASE WHEN n % 10 = 0 THEN 'Login'
WHEN n % 5 = 0 THEN 'Purchase'
ELSE 'PageView' END AS ActivityType,
DATEADD(MINUTE, -ABS(CHECKSUM(NEWID())) % 525600, GETDATE()) AS DtModified,
'Sample activity details for user ' + CAST(ABS(CHECKSUM(NEWID())) % 1000 + 1 AS VARCHAR(10)) AS Details
FROM (
SELECT TOP 10000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
FROM sys.objects a
CROSS JOIN sys.objects b -- Cross join to get enough rows
) AS Numbers;
-- Demonstrate the index behavior
-- Good case: Single user (uses seek)
SELECT UserID, DtModified
FROM #UserActivities
WHERE UserID = 42
ORDER BY DtModified DESC;
-- Problem case: Multiple users (often uses scan)
SELECT UserID, DtModified
FROM #UserActivities
WHERE UserID IN (42, 100, 789, 1024)
ORDER BY DtModified DESC;
-- Clean up
DROP TABLE #UserActivities;
1 Answer 1
You need to match the ORDER BY
to the order of the index, to avoid the extra Sort
SELECT UserID, DtModified
FROM #UserActivities
WHERE UserID IN (42, 100, 789, 1024)
ORDER BY UserId ASC, DtModified DESC;
Alternatively, construct a query (dynamically or in your application) that contains a bunch of =
predicates with UNION ALL
. The optimizer will normally create a Merge Concatentation to keep the rows in the same order that they came from the Index Seek
SELECT UserID, DtModified
FROM #UserActivities
WHERE UserID = 42
UNION ALL
SELECT UserID, DtModified
FROM #UserActivities
WHERE UserID = 100
UNION ALL
SELECT UserID, DtModified
FROM #UserActivities
WHERE UserID = 789
UNION ALL
SELECT UserID, DtModified
FROM #UserActivities
WHERE UserID = 1024
ORDER BY
DtModified DESC;
A final option, which I found was less guaranteed than the others to get a good plan, is to change the IN
list to SELECT 1 UNION SELECT 2 UNION ...
(note the use of UNION
not UNION ALL
).
SELECT
ua.UserID,
ua.DtModified
FROM #UserActivities AS ua
WHERE UserID IN
(
SELECT 42
UNION
SELECT 100
UNION
SELECT 789
UNION
SELECT 1024
)
ORDER BY
ua.DtModified DESC;
This still doesn't help you with reading all those rows. If there are 10,000 rows and you only want the last 50, it's still going to have to read them all.
I strongly recommend you consider using Keyset Pagination, as it's far more efficient. It requires a rethink of the whole stack, in order to use some kind of infinite scroll, or by passing a LastKey
parameter to your API.
#UserActivities
And your "problem" plan actually has a seek.