I have a query that does a clustered index scan on a very large table, and the scan is causing timeouts in some scenarios. I need help understanding why it is not using the defined nonclustered indexes.
Here is the query:
DECLARE @StartDate datetime = '2023-03-16 00:00:00';
DECLARE @TerminalIds [dbo].[udtBigInt]; -- user-defined table type with a single BIGINT column
INSERT INTO @TerminalIds ([Id])
SELECT [EquipmentId]
FROM #mechanicsTerminal;
SELECT [DataRecId]
, [RawData]
, [RecordingTime]
, [EquipmentId]
FROM [dbo].[Data]
WHERE [EquipmentId] IN (SELECT [Id] FROM @TerminalIds)
AND [RecordingTime] >= @StartDate
ORDER BY [DataRecId] DESC
OFFSET 0 ROWS FETCH NEXT 50 ROWS ONLY;
Here is the table definition:
CREATE TABLE [dbo].[Data](
[DataRecId] [bigint] IDENTITY(1,1) NOT NULL,
[RawData] [nvarchar](max) NOT NULL,
[CreatedDateUTC] [datetime] NOT NULL,
[RecordingTime] [datetime] NOT NULL,
[EquipmentId] [bigint] NOT NULL,
[DataSetId] [uniqueidentifier] NULL,
[SourceType] [nvarchar](50) NULL,
[Name] [nvarchar](100) NULL,
PRIMARY KEY CLUSTERED ([DataRecId] ASC)
)
GO
ALTER TABLE [dbo].[Data] WITH CHECK ADD CONSTRAINT [chk_Data_RawData] CHECK ((isjson([RawData])=(1)))
GO
Here are the indexes:
CREATE INDEX [nc_Data_DataSetId_includes]
ON [dbo].[Data] ( [DataSetId] ) INCLUDE ( [DataRecId], [RawData], [RecordingTime]);
GO
CREATE INDEX [nc_Data_EquipmentId_includes]
ON [dbo].[Data] ( [EquipmentId] ) INCLUDE ( [DataSetId], [RawData]);
GO
CREATE INDEX [nc_Data_EquipmentId_RecordingTime_Name_includes]
ON [dbo].[Data] ( [EquipmentId], [RecordingTime], [Name] ) INCLUDE ( [DataRecId], [RawData]);
GO
Here is the actual execution plan:
https://www.brentozar.com/pastetheplan/?id=B1oq7TDD3
With this particular data, the query executes in under a second.
However, there is a scenario where @TerminalIds contains only three rows but there are no matching records in [dbo].[Data]; in that case the query never finishes. Here is the plan after 45 seconds have passed.
https://www.brentozar.com/pastetheplan/?id=rJJMRavDn
What I've tried:
- Updating stats and recompiling the main proc
- Doing an INNER JOIN on @TerminalIds instead of a subquery with an IN clause (both attempts are sketched below)
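A rough sketch of those two attempts, using dbo.uspGetData as a placeholder for the real proc name (not shown above):
UPDATE STATISTICS [dbo].[Data];
EXEC sp_recompile N'dbo.uspGetData'; -- placeholder name for the main proc
-- JOIN form instead of the IN subquery; same filter and paging as the original query
SELECT d.[DataRecId], d.[RawData], d.[RecordingTime], d.[EquipmentId]
FROM [dbo].[Data] AS d
INNER JOIN @TerminalIds AS t ON t.[Id] = d.[EquipmentId]
WHERE d.[RecordingTime] >= @StartDate
ORDER BY d.[DataRecId] DESC
OFFSET 0 ROWS FETCH NEXT 50 ROWS ONLY;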
1 Answer
The problem is that you are filtering with an inequality on RecordingTime, but paging with OFFSET ... FETCH ordered by DataRecId. The optimizer decides that satisfying this ordering is the more important way to reduce the rows quickly, so it scans the clustered primary key (which is already in DataRecId order) hoping to find the 50 rows early. When few or no rows match the filter, that scan ends up reading most or all of the table.
You can probably force it to read the nc_Data_EquipmentId_RecordingTime_Name_includes index instead by rewriting your query like this:
SELECT DataRecId
, RawData
, RecordingTime
, EquipmentId
FROM (
SELECT TOP (1000000000) *
FROM dbo.Data
WHERE EquipmentId IN (SELECT Id FROM @TerminalIds)
AND RecordingTime >= @StartDate
ORDER BY
EquipmentId,
RecordingTime
) t
ORDER BY DataRecId DESC
OFFSET 0 ROWS FETCH NEXT 50 ROWS ONLY;
Assuming, as I suspect, that you are using an ORM such as Entity Framework, you probably want something like this:
var results = context.Data
    .Where(d => TerminalIds.Contains(d.EquipmentId) && d.RecordingTime >= StartDate)
    .OrderBy(d => d.EquipmentId)
    .ThenBy(d => d.RecordingTime)
    .Take(1000000000)
    .OrderByDescending(d => d.DataRecId)
    .Take(50)
    .ToList();
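If rewriting the query is awkward (for example because it is generated dynamically), a heavier-handed alternative, not part of the original answer but a standard SQL Server technique, is an explicit index hint that takes the choice away from the optimizer entirely:
SELECT [DataRecId], [RawData], [RecordingTime], [EquipmentId]
FROM [dbo].[Data] WITH (INDEX([nc_Data_EquipmentId_RecordingTime_Name_includes]))
WHERE [EquipmentId] IN (SELECT [Id] FROM @TerminalIds)
AND [RecordingTime] >= @StartDate
ORDER BY [DataRecId] DESC
OFFSET 0 ROWS FETCH NEXT 50 ROWS ONLY;
Bear in mind that a hinted query fails outright if the named index is later renamed or dropped, so treat this as a last resort.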
If you can remove the requirement to sort by DataRecId in the first place, then you will probably significantly improve the chances of the correct index being selected.
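For example, assuming callers can accept the newest rows by RecordingTime rather than by DataRecId (which, per the comments below, is effectively the same order here), a sketch of that variant lines up with the leading columns of nc_Data_EquipmentId_RecordingTime_Name_includes:
SELECT [DataRecId], [RawData], [RecordingTime], [EquipmentId]
FROM [dbo].[Data]
WHERE [EquipmentId] IN (SELECT [Id] FROM @TerminalIds)
AND [RecordingTime] >= @StartDate
ORDER BY [RecordingTime] DESC
OFFSET 0 ROWS FETCH NEXT 50 ROWS ONLY;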
You should also add a primary key to the Table Type. This will remove the Sort and Spool from the plan.
DROP TYPE dbo.udtBigInt;
CREATE TYPE dbo.udtBigInt AS TABLE (Id bigint PRIMARY KEY);
This is written in dynamic SQL via a stored proc. We default to ordering by the primary key since it is the same order as RecordingTime. However, when we changed the default to sort by RecordingTime, it did improve performance drastically; it is now sub-second, although with a parallel plan, and it is now using the nc index. – Greg, Jun 15, 2023 at 13:58
Regarding ORDER BY [DataRecId] DESC: can you order by RecordingTime instead?