1

I have a query that does a clustered index scan on a very large table, the scan is causing timeouts in some scenarios. Need help understanding why it is not using the defined nonclustered indexes.

Here is the query:

DECLARE @StartDate datetime = '2023-03-16 00:00:00';
DECLARE @TerminalIds [dbo].[udtBigInt]; -- user defined table with a BIGINT col
INSERT INTO @TerminalIds ([Id])
SELECT [EquipmentId]
FROM #mechanicsTerminal;
SELECT [DataRecId]
 , [RawData]
 , [RecordingTime]
 , [EquipmentId]
FROM [dbo].[Data]
WHERE [EquipmentId] IN (SELECT [Id] FROM @TerminalIds)
AND [RecordingTime] >= @StartDate
ORDER BY [DataRecId] DESC
OFFSET 0 ROWS FETCH NEXT 50 ROWS ONLY;

Here is table definition:

CREATE TABLE [dbo].[Data](
 [DataRecId] [bigint] IDENTITY(1,1) NOT NULL,
 [RawData] [nvarchar](max) NOT NULL,
 [CreatedDateUTC] [datetime] NOT NULL,
 [RecordingTime] [datetime] NOT NULL,
 [EquipmentId] [bigint] NOT NULL,
 [DataSetId] [uniqueidentifier] NULL,
 [SourceType] [nvarchar](50) NULL,
 [Name] [nvarchar](100) NULL,
PRIMARY KEY CLUSTERED ( DataRecId] ASC)
GO
ALTER TABLE [EJ].[Data] WITH CHECK ADD CONSTRAINT [chk_Data_RawData] CHECK ((isjson([RawData])=(1)))
GO

Here are the indexes:

CREATE INDEX [nc_Data_DataSetId_includes] 
ON [dbo].[Data] ( [DataSetId] ) INCLUDE ( [DataRecId], [RawData], [RecordingTime]);
GO
CREATE INDEX [nc_Data_EquipmentId_includes] 
ON [dbo].[Data] ( [EquipmentId] ) INCLUDE ( [DataSetId], [RawData]);
GO
CREATE INDEX [nc_Data_EquipmentId_RecordingTime_Name_includes] 
ON [dbo].[Data] ( [EquipmentId], [RecordingTime], [Name] ) INCLUDE ( [DataRecId], [RawData]);
GO

Here is the actual execution plan:

https://www.brentozar.com/pastetheplan/?id=B1oq7TDD3

With this particular data, the query executes sub-second.

However there is a scenario where there is only three records in @TerminalIds, but no matching records in [dbo].[Data], the query never finishes. Here is the plan after 45 sec has passed.

https://www.brentozar.com/pastetheplan/?id=rJJMRavDn

What I've tried:

  • Updating stats and recompiling the main proc
  • Doing INNER JOIN on @TerminalIds instead of doing subquery with IN clause
asked Jun 14, 2023 at 23:43
3
  • 1
    Do you need ORDER BY [DataRecId] DESC can you order by RecordingTime instead? Commented Jun 15, 2023 at 1:28
  • @Charlieface no difference Commented Jun 15, 2023 at 13:45
  • @Charlieface, i take it back, it did improve performance. it used the nc index, but invoked parallelism, either way, it's sub-second now. Commented Jun 15, 2023 at 13:57

1 Answer 1

1

The problem is that you are using an inequality on RecordingTime, but an OFFSET FETCH ordered by DataRecId. The server seems to think that this ordering is more important, and will reduce the rows quicker, so it resorts to scanning the primary key index in the hope it will find the 50 rows quickly.

You can probably force it to read the nc_Data_EquipmentId_RecordingTime_Name_includes index instead, by rewriting your query like this

SELECT DataRecId
 , RawData
 , RecordingTime
 , EquipmentId
FROM (
 SELECT TOP (1000000000) *
 FROM dbo.Data
 WHERE EquipmentId IN (SELECT Id FROM @TerminalIds)
 AND RecordingTime >= @StartDate
 ORDER BY
 EquipmentId,
 RecordingTime
) t
ORDER BY DataRecId DESC
OFFSET 0 ROWS FETCH NEXT 50 ROWS ONLY;

Assuming, as I suspect, you are using an ORM such as Entity Framework, you probably want something like this

context.Data
 .Where(d => TerminalIds.Contains(d.EquipmentId) && d.RecordingTime >= StartDate)
 .OrderBy(d => d.EquipmentId)
 .ThenBy(d => d.RecordingTime)
 .Take(1000000000)
 .OrderByDescending(d => d.DataRecId)
 .Take(50)

If you can remove the requirement to sort by DataRecId in the first place then you will probably significantly improve the chances of the correct index being selected.


You should also add a primary key to the Table Type. This will remove the Sort and Spool from the plan.

DROP TYPE dbo.udtBigInt;
CREATE TYPE dbo.udtBigInt (Id bigint PRIMARY KEY);
answered Jun 15, 2023 at 1:37
1
  • This is written in dynamic sql via stored proc. We default to ordering by the Primary key since it is same order as RecordingTime. However when we change the default to sort by RecordingTime, it did improve performance drastically, now it is sub-second, although with a parallelism plan, it is now using the nc index. Commented Jun 15, 2023 at 13:58

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.