Why does SQL Server not use the Index it recommends?

Question 1

The query

I had a specific query I wanted to optimize:

SELECT * /*12 columns*/
FROM [dbo].[EnterpriseGroup]
WHERE 
 (EnterpriseGroup.ChildId = 123 OR EnterpriseGroup.FatherId = 234)
AND StatusCd >= 2

There is also already an index on FatherId, but not on the ChildId. The primary key is among the 12 selected columns, but none of those used in the WHERE clause here.

The use

This is a simple query but it's run very, very often during daily work. The table is also small, around 8000 rows.

The query is used to find groups of enterprises. There are about 2 Millions of enterprise entries, so only less than 0.5% have a matching group row and thus most of the time, no group will be found.

The recommendation

When using SSMS and inspecting the "Actual Execution Plan" it gives this plan:

Execution Plan without index existing and recommendation

Table Scan Hover Info, when querying without index.

The predicate shown is actually the WHERE clause.

And, it recommends to create an index, which is basically indexing on the WHERE clause and adding all the queried columns directly into the index. Seems not very clever to me but maybe this is what this questions is all about:

/*
Missing Index Details from SQLQuery6.sql .....
The Query Processor estimates that implementing the following index could improve the query cost by 68.9052%.
*/
/*
USE [...]
GO
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[EnterpriseGroup] ([StatusCd])
INCLUDE (....all the 12 queried columns......)
GO
*/

The result

After I create the recommended index I get the following plan:

Execution Plan with index existing

It's not used at all! (And the Table Scan hover info is exactly the same, still having a copy of the WHERE Clause as predicate)

The Question

Why does SQL Server not use an existing index that the SSMS adviced to create?

If not, then what is the significance of a missing index recommendation in the execution plan of SSMS (SQL Server Management Studio) against a Microsoft SQL server?

The notes

Note: I am no DBA, but a software developer. I have read a bit into this, including: https://www.brentozar.com/archive/2013/07/dude-who-stole-my-missing-index-recommendation/ but I did not clarify to me.

Note: In case it matters: - SQL Server Version 11.0.7493.4, running on Windows NT 6.3. - Microsoft SQL Server Management Studio is Version 11.0.7493.4

Question 2

How many rows in the table? How many rows match WHERE StatusCd >= 2? An index is only useful if it can effectively eliminate enough rows that using it is better than a full scan. In many cases, the scan is actually more efficient, especially if - like Kirk says - the other predicates are more selective than the leading key in the index (another factor is SELECT * means a key lookup would be required to obtain the other columns - so if you don't really need SELECT *, trying again with the real set of columns you need may yield to a more useful and more likely-to-be-used index suggestion).

Question 3

@AaronBertrand The number of existing rows is around 8k. Most rows match StatusCd >=2, but not all at all the times. The actually issued query uses an explicitly defined columns set, but it so happens that these are all the currently existing columns anyway.

Question 4

@AaronBertrand I have also to say, most of the time there will be no match at all. So I am testing an edge case in this regard. Would this affect the index usage?

Question 5

Marcel, I am not sure why SQL Server would recommend an index on StatusCd if it is not going use it for the actual execution. I think in general it is just because those recommendations are not always the greatest. Brent Ozar shares his opinions on the recommended indexes from SQL Server here: brentozar.com/archive/2019/11/…. (someone else very smart, like @AaronBertrand might know more). For this particular query, the most likely reason it is not utilizing the index on StatusCd is because of the selectivity those 3 columns.

Question 6

Assuming you WHERE condition still utilizes ChildId, FatherId and StatusCd. It is possible that from a statistics perspective, that either ChildId or FatherId is more selective. Taking ChildId and FatherId out of the WHERE clause should result in that new index being used, since StatusCd is the indexed column in the index.

If you hover over the Table Scan section you should see something like the below image:ExecutionPlanStepDetail

It is possible that even though it recommended creating that index, that it is still querying by either ChildId or FatherId.

It would do this if ChildId or FatherId is more selective. Lets say StatusCd >= 2 returns 6,000 of the 8,000 rows. But either ChildId = 123 or FatherId = 234 only has 1 row. Then doing a table scan on that one column, and applying the rest of the conditions after the fact is a more efficient query plan (theoretically speaking) than returning all 6,000 rows from StatusCd >= 2 and trying to apply the ChildId or FatherId conditions after.

This was something I learned from a question I asked a little bit ago. The guy who answered it had a really great way of explaining what I tried to say here. Does the Query Optimizer Prefer to Query on Constants before Columns?

Hopefully that helps.

Question 7

This is very insightful. Your assumptions are correct, and I updated the question to include this information. Given, the search for the Id's is very selective, why would it not recommend an index over these? But I understand that a table scan might just be as effective, so why would it recommend an index in the first place?

Question 8

I added the table scan hover info (without the 1:1 predicate, because I simplified some namings for the question. But it's basically a copy of the WHERE clause.

Question 9

I think in general, the reason it would recommend an index and then not use it, is because those recommended indexes are not always the best recommendations. I don't want to speculate on what the decision logic is but Bent Ozar doesn't really like the logic it uses and he discusses that here: brentozar.com/archive/2019/11/….

Question 10

There are many unknown variables for us to answer correctly, for example, the table size, statistics on your columns and existing indexes.

First of all, your table is a heap (as opposed to clustered index) if it's updated frequently, it might be fragmented with forwarding pointers. That might negatively affect the number of reads the engine has to do.

Second, can you hover over the table scan operator in the actual query plan? It will shows us the predicate - there might be a simple or forced parametrization going on, which won't use statistics for your value.

Third, what are the indexes? Do you need to select all columns (select star)? Do you have index on childId or fatherId?

My recommended index (without knowing the statistics would be 1. childId, StatusCd Include (all columns from your select clause) 2. FatherId, StatusCd Include (all columns from your select clause)

and then I would rewrite the query as following:

SELECT [columns] 
FROM [dbo].[EnterpriseGroup]
WHERE 
 EnterpriseGroup.ChildId = 123 
 AND StatusCd >= 2
UNION ALL
SELECT [columns] 
FROM [dbo].[EnterpriseGroup]
WHERE 
 EnterpriseGroup.FatherId = 234
 AND StatusCd >= 2

You will seek twice into same table, once using index on Child, and second time using index on Father and then concatenate the results.

Question 11

Why do you think seeking twice might be more effective than doing a table scan once? Is there a real difference between seek and scan here?

Question 12

About parametrization: The Id's for child and father are given as parameters in the actual query, not fixed numbers of course. And there is an existing index on the fatherId (not the childId)

Question 13

Really depends on your stats. You said the table has 8k rows, how many rows does your query return. What precentage is found using Father and what precentage is found using Child? From the perspective of the optimizer, you want to find either rows that have ChildId or FatherId - not one index will fullfil this, so the optimizer tries to recommend StatusCd which is common for both, but if it's a column with low selectivity, it might be better to scan the whole table and discard some rows.

Question 14

If you scan whole table thats 8k reads not counting forwarding pointers. But if out of 8k rows, 15 have FatherId and 30 have ChildId: Your index will probably have 2 levels and either include all columns from select clause or use a RID lookup. That's 2 or 3 * (15+30 seeks) = 90 or 135 reads. Seems much better.

Question 15

All rows in this table have both ChildId and FatherId set. It's just that the used values of these Id's most often do not produce a result. (Think: very few enterprises are in a group in the first place, but I must always check, using their Id's. And one enterprise may be in any number of groups.)

Question 16

Give this a try:

SELECT * /*12 columns*/
FROM [dbo].[EnterpriseGroup]
WHERE EnterpriseGroup.ChildId = 123
AND StatusCd >= 2
UNION
SELECT * /*12 columns*/
FROM [dbo].[EnterpriseGroup]
WHERE EnterpriseGroup.FatherId = 234
AND StatusCd >= 2

And have these two composite indexes (with the columns in the order given):

(ChildId, StatusCd)
(FatherId, StatusCd)

Kirk Saunders Kirk Saunders 1,0966 silver badges19 bronze badges · Answer 1 · 2020-04-02 15:34:06Z

Assuming you WHERE condition still utilizes ChildId, FatherId and StatusCd. It is possible that from a statistics perspective, that either ChildId or FatherId is more selective. Taking ChildId and FatherId out of the WHERE clause should result in that new index being used, since StatusCd is the indexed column in the index.

If you hover over the Table Scan section you should see something like the below image:ExecutionPlanStepDetail

It is possible that even though it recommended creating that index, that it is still querying by either ChildId or FatherId.

It would do this if ChildId or FatherId is more selective. Lets say StatusCd >= 2 returns 6,000 of the 8,000 rows. But either ChildId = 123 or FatherId = 234 only has 1 row. Then doing a table scan on that one column, and applying the rest of the conditions after the fact is a more efficient query plan (theoretically speaking) than returning all 6,000 rows from StatusCd >= 2 and trying to apply the ChildId or FatherId conditions after.

This was something I learned from a question I asked a little bit ago. The guy who answered it had a really great way of explaining what I tried to say here. Does the Query Optimizer Prefer to Query on Constants before Columns?

Hopefully that helps.

This is very insightful. Your assumptions are correct, and I updated the question to include this information. Given, the search for the Id's is very selective, why would it not recommend an index over these? But I understand that a table scan might just be as effective, so why would it recommend an index in the first place?
I added the table scan hover info (without the 1:1 predicate, because I simplified some namings for the question. But it's basically a copy of the WHERE clause.
I think in general, the reason it would recommend an index and then not use it, is because those recommended indexes are not always the best recommendations. I don't want to speculate on what the decision logic is but Bent Ozar doesn't really like the logic it uses and he discusses that here: brentozar.com/archive/2019/11/….

Zikato Zikato 5,7241 gold badge17 silver badges34 bronze badges · Answer 2 · 2020-04-03 08:28:15Z

There are many unknown variables for us to answer correctly, for example, the table size, statistics on your columns and existing indexes.

First of all, your table is a heap (as opposed to clustered index) if it's updated frequently, it might be fragmented with forwarding pointers. That might negatively affect the number of reads the engine has to do.

Second, can you hover over the table scan operator in the actual query plan? It will shows us the predicate - there might be a simple or forced parametrization going on, which won't use statistics for your value.

Third, what are the indexes? Do you need to select all columns (select star)? Do you have index on childId or fatherId?

My recommended index (without knowing the statistics would be 1. childId, StatusCd Include (all columns from your select clause) 2. FatherId, StatusCd Include (all columns from your select clause)

and then I would rewrite the query as following:

SELECT [columns] 
FROM [dbo].[EnterpriseGroup]
WHERE 
 EnterpriseGroup.ChildId = 123 
 AND StatusCd >= 2
UNION ALL
SELECT [columns] 
FROM [dbo].[EnterpriseGroup]
WHERE 
 EnterpriseGroup.FatherId = 234
 AND StatusCd >= 2

You will seek twice into same table, once using index on Child, and second time using index on Father and then concatenate the results.

Why do you think seeking twice might be more effective than doing a table scan once? Is there a real difference between seek and scan here?
About parametrization: The Id's for child and father are given as parameters in the actual query, not fixed numbers of course. And there is an existing index on the fatherId (not the childId)
Really depends on your stats. You said the table has 8k rows, how many rows does your query return. What precentage is found using Father and what precentage is found using Child? From the perspective of the optimizer, you want to find either rows that have ChildId or FatherId - not one index will fullfil this, so the optimizer tries to recommend StatusCd which is common for both, but if it's a column with low selectivity, it might be better to scan the whole table and discard some rows.
If you scan whole table thats 8k reads not counting forwarding pointers. But if out of 8k rows, 15 have FatherId and 30 have ChildId: Your index will probably have 2 levels and either include all columns from select clause or use a RID lookup. That's 2 or 3 * (15+30 seeks) = 90 or 135 reads. Seems much better.
All rows in this table have both ChildId and FatherId set. It's just that the used values of these Id's most often do not produce a result. (Think: very few enterprises are in a group in the first place, but I must always check, using their Id's. And one enterprise may be in any number of groups.)

Rick James Rick James 80.7k5 gold badges52 silver badges119 bronze badges · Answer 3 · 2020-05-24 20:13:59Z

Give this a try:

SELECT * /*12 columns*/
FROM [dbo].[EnterpriseGroup]
WHERE EnterpriseGroup.ChildId = 123
AND StatusCd >= 2
UNION
SELECT * /*12 columns*/
FROM [dbo].[EnterpriseGroup]
WHERE EnterpriseGroup.FatherId = 234
AND StatusCd >= 2

And have these two composite indexes (with the columns in the order given):

(ChildId, StatusCd)
(FatherId, StatusCd)

Stack Exchange Network

Why does SQL Server not use the Index it recommends?

The query

The use

The recommendation

The result

The Question

The notes

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Why does SQL Server not use the Index it recommends?

The query

The use

The recommendation

The result

The Question

The notes

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions