The query
I had a specific query I wanted to optimize:
SELECT * /*12 columns*/
FROM [dbo].[EnterpriseGroup]
WHERE
(EnterpriseGroup.ChildId = 123 OR EnterpriseGroup.FatherId = 234)
AND StatusCd >= 2
There is also already an index on FatherId
, but not on the ChildId
. The primary key is among the 12 selected columns, but none of those used in the WHERE clause here.
The use
This is a simple query but it's run very, very often during daily work. The table is also small, around 8000 rows.
The query is used to find groups of enterprises. There are about 2 Millions of enterprise entries, so only less than 0.5% have a matching group row and thus most of the time, no group will be found.
The recommendation
When using SSMS and inspecting the "Actual Execution Plan" it gives this plan:
Execution Plan without index existing and recommendation
Table Scan Hover Info, when querying without index.
The predicate shown is actually the WHERE clause.
And, it recommends to create an index, which is basically indexing on the WHERE clause and adding all the queried columns directly into the index. Seems not very clever to me but maybe this is what this questions is all about:
/*
Missing Index Details from SQLQuery6.sql .....
The Query Processor estimates that implementing the following index could improve the query cost by 68.9052%.
*/
/*
USE [...]
GO
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[EnterpriseGroup] ([StatusCd])
INCLUDE (....all the 12 queried columns......)
GO
*/
The result
After I create the recommended index I get the following plan:
Execution Plan with index existing
It's not used at all! (And the Table Scan hover info is exactly the same, still having a copy of the WHERE Clause as predicate)
The Question
Why does SQL Server not use an existing index that the SSMS adviced to create?
If not, then what is the significance of a missing index recommendation in the execution plan of SSMS (SQL Server Management Studio) against a Microsoft SQL server?
The notes
Note: I am no DBA, but a software developer. I have read a bit into this, including: https://www.brentozar.com/archive/2013/07/dude-who-stole-my-missing-index-recommendation/ but I did not clarify to me.
Note: In case it matters: - SQL Server Version 11.0.7493.4, running on Windows NT 6.3. - Microsoft SQL Server Management Studio is Version 11.0.7493.4
3 Answers 3
Assuming you WHERE
condition still utilizes ChildId
, FatherId
and StatusCd
. It is possible that from a statistics perspective, that either ChildId
or FatherId
is more selective. Taking ChildId
and FatherId
out of the WHERE
clause should result in that new index being used, since StatusCd
is the indexed column
in the index
.
If you hover over the Table Scan section you should see something like the below image:ExecutionPlanStepDetail
It is possible that even though it recommended creating that index, that it is still querying by either ChildId
or FatherId
.
It would do this if ChildId
or FatherId
is more selective. Lets say StatusCd >= 2
returns 6,000 of the 8,000 rows. But either ChildId = 123
or FatherId = 234
only has 1 row. Then doing a table scan on that one column, and applying the rest of the conditions after the fact is a more efficient query plan (theoretically speaking) than returning all 6,000 rows from StatusCd >= 2
and trying to apply the ChildId
or FatherId
conditions after.
This was something I learned from a question I asked a little bit ago. The guy who answered it had a really great way of explaining what I tried to say here. Does the Query Optimizer Prefer to Query on Constants before Columns?
Hopefully that helps.
-
This is very insightful. Your assumptions are correct, and I updated the question to include this information. Given, the search for the Id's is very selective, why would it not recommend an index over these? But I understand that a table scan might just be as effective, so why would it recommend an index in the first place?Marcel– Marcel2020年04月03日 06:33:37 +00:00Commented Apr 3, 2020 at 6:33
-
I added the table scan hover info (without the 1:1 predicate, because I simplified some namings for the question. But it's basically a copy of the WHERE clause.Marcel– Marcel2020年04月03日 12:28:07 +00:00Commented Apr 3, 2020 at 12:28
-
I think in general, the reason it would recommend an index and then not use it, is because those recommended indexes are not always the best recommendations. I don't want to speculate on what the decision logic is but Bent Ozar doesn't really like the logic it uses and he discusses that here: brentozar.com/archive/2019/11/….Kirk Saunders– Kirk Saunders2020年04月03日 13:57:56 +00:00Commented Apr 3, 2020 at 13:57
There are many unknown variables for us to answer correctly, for example, the table size, statistics on your columns and existing indexes.
First of all, your table is a heap (as opposed to clustered index) if it's updated frequently, it might be fragmented with forwarding pointers. That might negatively affect the number of reads the engine has to do.
Second, can you hover over the table scan operator in the actual query plan? It will shows us the predicate - there might be a simple or forced parametrization going on, which won't use statistics for your value.
Third, what are the indexes? Do you need to select all columns (select star)? Do you have index on childId or fatherId?
My recommended index (without knowing the statistics would be 1. childId, StatusCd Include (all columns from your select clause) 2. FatherId, StatusCd Include (all columns from your select clause)
and then I would rewrite the query as following:
SELECT [columns]
FROM [dbo].[EnterpriseGroup]
WHERE
EnterpriseGroup.ChildId = 123
AND StatusCd >= 2
UNION ALL
SELECT [columns]
FROM [dbo].[EnterpriseGroup]
WHERE
EnterpriseGroup.FatherId = 234
AND StatusCd >= 2
You will seek twice into same table, once using index on Child, and second time using index on Father and then concatenate the results.
-
Why do you think seeking twice might be more effective than doing a table scan once? Is there a real difference between seek and scan here?Marcel– Marcel2020年04月03日 12:03:40 +00:00Commented Apr 3, 2020 at 12:03
-
About parametrization: The Id's for child and father are given as parameters in the actual query, not fixed numbers of course. And there is an existing index on the fatherId (not the childId)Marcel– Marcel2020年04月03日 12:09:25 +00:00Commented Apr 3, 2020 at 12:09
-
Really depends on your stats. You said the table has 8k rows, how many rows does your query return. What precentage is found using Father and what precentage is found using Child? From the perspective of the optimizer, you want to find either rows that have ChildId or FatherId - not one index will fullfil this, so the optimizer tries to recommend StatusCd which is common for both, but if it's a column with low selectivity, it might be better to scan the whole table and discard some rows.Zikato– Zikato2020年04月03日 13:22:31 +00:00Commented Apr 3, 2020 at 13:22
-
If you scan whole table thats 8k reads not counting forwarding pointers. But if out of 8k rows, 15 have FatherId and 30 have ChildId: Your index will probably have 2 levels and either include all columns from select clause or use a RID lookup. That's 2 or 3 * (15+30 seeks) = 90 or 135 reads. Seems much better.Zikato– Zikato2020年04月03日 13:28:44 +00:00Commented Apr 3, 2020 at 13:28
-
All rows in this table have both ChildId and FatherId set. It's just that the used values of these Id's most often do not produce a result. (Think: very few enterprises are in a group in the first place, but I must always check, using their Id's. And one enterprise may be in any number of groups.)Marcel– Marcel2020年04月03日 13:39:19 +00:00Commented Apr 3, 2020 at 13:39
Give this a try:
SELECT * /*12 columns*/
FROM [dbo].[EnterpriseGroup]
WHERE EnterpriseGroup.ChildId = 123
AND StatusCd >= 2
UNION
SELECT * /*12 columns*/
FROM [dbo].[EnterpriseGroup]
WHERE EnterpriseGroup.FatherId = 234
AND StatusCd >= 2
And have these two composite indexes (with the columns in the order given):
(ChildId, StatusCd)
(FatherId, StatusCd)
Explore related questions
See similar questions with these tags.
WHERE StatusCd >= 2
? An index is only useful if it can effectively eliminate enough rows that using it is better than a full scan. In many cases, the scan is actually more efficient, especially if - like Kirk says - the other predicates are more selective than the leading key in the index (another factor isSELECT *
means a key lookup would be required to obtain the other columns - so if you don't really needSELECT *
, trying again with the real set of columns you need may yield to a more useful and more likely-to-be-used index suggestion).StatusCd
if it is not going use it for the actual execution. I think in general it is just because those recommendations are not always the greatest. Brent Ozar shares his opinions on the recommended indexes from SQL Server here: brentozar.com/archive/2019/11/…. (someone else very smart, like @AaronBertrand might know more). For this particular query, the most likely reason it is not utilizing the index onStatusCd
is because of the selectivity those 3 columns.