I have a query that does a clustered index scan on a very large table, and the scan is causing timeouts in some scenarios. I need help understanding why it is not using the defined nonclustered indexes.
Here is the query:
DECLARE @StartDate datetime = '2023-03-16 00:00:00';
DECLARE @TerminalIds [dbo].[udtBigInt]; -- user-defined table type with a single BIGINT column
INSERT INTO @TerminalIds ([Id])
SELECT [EquipmentId]
FROM #mechanicsTerminal;
SELECT [DataRecId]
, [RawData]
, [RecordingTime]
, [EquipmentId]
FROM [dbo].[Data]
WHERE [EquipmentId] IN (SELECT [Id] FROM @TerminalIds)
AND [RecordingTime] >= @StartDate
ORDER BY [DataRecId] DESC
OFFSET 0 ROWS FETCH NEXT 50 ROWS ONLY;
Here is the table definition:
CREATE TABLE [dbo].[Data](
[DataRecId] [bigint] IDENTITY(1,1) NOT NULL,
[RawData] [nvarchar](max) NOT NULL,
[CreatedDateUTC] [datetime] NOT NULL,
[RecordingTime] [datetime] NOT NULL,
[EquipmentId] [bigint] NOT NULL,
[DataSetId] [uniqueidentifier] NULL,
[SourceType] [nvarchar](50) NULL,
[Name] [nvarchar](100) NULL,
PRIMARY KEY CLUSTERED ([DataRecId] ASC)
)
GO
ALTER TABLE [dbo].[Data] WITH CHECK ADD CONSTRAINT [chk_Data_RawData] CHECK ((isjson([RawData])=(1)))
GO
Here are the indexes:
CREATE INDEX [nc_Data_DataSetId_includes]
ON [dbo].[Data] ( [DataSetId] ) INCLUDE ( [DataRecId], [RawData], [RecordingTime]);
GO
CREATE INDEX [nc_Data_EquipmentId_includes]
ON [dbo].[Data] ( [EquipmentId] ) INCLUDE ( [DataSetId], [RawData]);
GO
CREATE INDEX [nc_Data_EquipmentId_RecordingTime_Name_includes]
ON [dbo].[Data] ( [EquipmentId], [RecordingTime], [Name] ) INCLUDE ( [DataRecId], [RawData]);
GO
Here is the actual execution plan:
https://www.brentozar.com/pastetheplan/?id=B1oq7TDD3
With this particular data, the query executes in under a second.
However, there is a scenario where @TerminalIds contains only three rows but there are no matching records in [dbo].[Data]; in that case the query never finishes. Here is the plan after 45 seconds have passed.
https://www.brentozar.com/pastetheplan/?id=rJJMRavDn
What I've tried:
- Updating stats and recompiling the main proc
- Doing an INNER JOIN on @TerminalIds instead of a subquery with an IN clause (both attempts are sketched below)
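A rough sketch of those two attempts, using dbo.uspGetData as a placeholder for the real proc name (not shown above):
UPDATE STATISTICS [dbo].[Data];
EXEC sp_recompile N'dbo.uspGetData'; -- placeholder name for the main proc
-- JOIN form instead of the IN subquery; same filter and paging as the original query
SELECT d.[DataRecId], d.[RawData], d.[RecordingTime], d.[EquipmentId]
FROM [dbo].[Data] AS d
INNER JOIN @TerminalIds AS t ON t.[Id] = d.[EquipmentId]
WHERE d.[RecordingTime] >= @StartDate
ORDER BY d.[DataRecId] DESC
OFFSET 0 ROWS FETCH NEXT 50 ROWS ONLY;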
1 Answer
The problem is that you are filtering with an inequality on RecordingTime, but paging with OFFSET ... FETCH ordered by DataRecId. The optimizer decides that satisfying this ordering is the more important way to reduce the rows quickly, so it scans the clustered primary key (which is already in DataRecId order) hoping to find the 50 rows early. When few or no rows match the filter, that scan ends up reading most or all of the table.
You can probably force it to read the nc_Data_EquipmentId_RecordingTime_Name_includes index instead by rewriting your query like this:
SELECT DataRecId
, RawData
, RecordingTime
, EquipmentId
FROM (
SELECT TOP (1000000000) *
FROM dbo.Data
WHERE EquipmentId IN (SELECT Id FROM @TerminalIds)
AND RecordingTime >= @StartDate
ORDER BY
EquipmentId,
RecordingTime
) t
ORDER BY DataRecId DESC
OFFSET 0 ROWS FETCH NEXT 50 ROWS ONLY;
Assuming, as I suspect, that you are using an ORM such as Entity Framework, you probably want something like this:
var results = context.Data
    .Where(d => TerminalIds.Contains(d.EquipmentId) && d.RecordingTime >= StartDate)
    .OrderBy(d => d.EquipmentId)
    .ThenBy(d => d.RecordingTime)
    .Take(1000000000)
    .OrderByDescending(d => d.DataRecId)
    .Take(50)
    .ToList();
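If rewriting the query is awkward (for example because it is generated dynamically), a heavier-handed alternative, not part of the original answer but a standard SQL Server technique, is an explicit index hint that takes the choice away from the optimizer entirely:
SELECT [DataRecId], [RawData], [RecordingTime], [EquipmentId]
FROM [dbo].[Data] WITH (INDEX([nc_Data_EquipmentId_RecordingTime_Name_includes]))
WHERE [EquipmentId] IN (SELECT [Id] FROM @TerminalIds)
AND [RecordingTime] >= @StartDate
ORDER BY [DataRecId] DESC
OFFSET 0 ROWS FETCH NEXT 50 ROWS ONLY;
Bear in mind that a hinted query fails outright if the named index is later renamed or dropped, so treat this as a last resort.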
If you can remove the requirement to sort by DataRecId in the first place, then you will probably significantly improve the chances of the correct index being selected.
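For example, assuming callers can accept the newest rows by RecordingTime rather than by DataRecId (which, per the comments below, is effectively the same order here), a sketch of that variant lines up with the leading columns of nc_Data_EquipmentId_RecordingTime_Name_includes:
SELECT [DataRecId], [RawData], [RecordingTime], [EquipmentId]
FROM [dbo].[Data]
WHERE [EquipmentId] IN (SELECT [Id] FROM @TerminalIds)
AND [RecordingTime] >= @StartDate
ORDER BY [RecordingTime] DESC
OFFSET 0 ROWS FETCH NEXT 50 ROWS ONLY;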
You should also add a primary key to the Table Type. This will remove the Sort and Spool from the plan.
DROP TYPE dbo.udtBigInt;
CREATE TYPE dbo.udtBigInt AS TABLE (Id bigint PRIMARY KEY);
This is written in dynamic SQL via a stored proc. We default to ordering by the primary key since it is the same order as RecordingTime. However, when we changed the default to sort by RecordingTime, it did improve performance drastically; it is now sub-second, although with a parallel plan, and it is now using the nc index. – Greg, Jun 15, 2023 at 13:58
Regarding ORDER BY [DataRecId] DESC: can you order by RecordingTime instead?