I am absolutely stumped as to why my query is not using what I think is a selective index.
My model consists of Claims, Contacts, and Phone Numbers. Each Claim has one Contact, and each Contact has many Phone Numbers. A Claim can have a Status, and a Phone Number has a Type. [Image: Simplified Model]
I have added an index on tClaim for Status that includes Name and ContactID.
create index Status on tClaim(Status) include (Name,ContactID)
I have added an index on the Phones for the ContactID and Type that includes the Number.
create index ContactID_Type on tContactPhone(ContactID,Type) include (Number)
I am trying to write a query that returns all Claims with a status of 'Won' and the corresponding 'Home' phone number for each Claim's Contact. I have tried it two ways: one including the join to tContact and one without. Neither generates the plan I expect.
-- Version 1: without the join to tContact
select
    c.ID,
    c.Name,
    p.Number
from
    tClaim c
    left join tContactPhone p on
        c.ContactID = p.ContactID and p.Type = 'Home'
where
    c.Status = 'Won';

-- Version 2: with the join to tContact
select
    c.ID,
    c.Name,
    p.Number
from
    tClaim c
    inner join tContact co on
        co.ID = c.ContactID
    left join tContactPhone p on
        co.ID = p.ContactID and p.Type = 'Home'
where
    c.Status = 'Won';
The plan I am getting back refuses to use tContactPhone.ContactID_Type. Instead, it suggests an index on Type, which doesn't make sense to me, because Type seems less selective than ContactID.
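One way to check whether the optimizer's choice is actually worse (rather than just unexpected) is to force the seek the question expects and compare logical reads. This is a minimal diagnostic sketch, assuming the sample schema and indexes from the script below; the FORCESEEK table hint is used here only to measure the alternative plan, not as a recommended fix.

```sql
-- Report logical reads per table for each statement (see the Messages tab)
SET STATISTICS IO ON;

-- 1) The plan the optimizer picks on its own
select c.ID, c.Name, p.Number
from tClaim c
left join tContactPhone p
    on c.ContactID = p.ContactID and p.Type = 'Home'
where c.Status = 'Won';

-- 2) Force seeks into tContactPhone; ContactID_Type is the only index
--    with ContactID as its leading key, so it is the one that can be sought
select c.ID, c.Name, p.Number
from tClaim c
left join tContactPhone p with (forceseek)
    on c.ContactID = p.ContactID and p.Type = 'Home'
where c.Status = 'Won';

SET STATISTICS IO OFF;
```

If the forced version reports fewer logical reads for tContactPhone, the optimizer's estimate is off; if it reports more, the scan really is the cheaper plan for this data.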
Here is the script I used to create a sample data set to test. Please note my actual data set is much larger, named better, and has a lot more fields; but this is distilled down to replicate my situation [AKA I don't even like the naming conventions and data generation, but it gets the job done :)]
/*
Create Tables and Constraints
*/
CREATE TABLE tContact(
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [Name] [nvarchar](100) NOT NULL,
    CONSTRAINT [pkey_tContact] PRIMARY KEY CLUSTERED
    (
        [ID] ASC
    )
)
GO
CREATE TABLE tContactPhone(
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [ContactID] [int] NOT NULL,
    [Type] [nvarchar](25) NOT NULL,
    [Number] [nvarchar](12) NOT NULL,
    CONSTRAINT [pkey_tContactPhone] PRIMARY KEY CLUSTERED
    (
        [ID] ASC
    )
)
GO
ALTER TABLE tContactPhone WITH CHECK ADD CONSTRAINT FK_tContactPhones FOREIGN KEY(ContactID)
    REFERENCES tContact ([ID])
GO
ALTER TABLE tContactPhone CHECK CONSTRAINT FK_tContactPhones
GO
CREATE TABLE tClaim(
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [ContactID] [int] NOT NULL,
    [Name] [nvarchar](50) NOT NULL,
    [Status] nvarchar(10) not null,
    CONSTRAINT [pkey_tClaim] PRIMARY KEY CLUSTERED
    (
        [ID] ASC
    )
)
GO
ALTER TABLE tClaim WITH CHECK ADD CONSTRAINT FK_tClaim FOREIGN KEY(ContactID)
REFERENCES tContact ([ID])
GO
/*
Add Test Data
*/
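set nocount on  -- suppress the "1 row affected" message for each of the 300,000 inserts
declare @Count int = 0
declare @ContactID int = 0
begin transaction  -- one transaction avoids a log flush per autocommitted insert
while (@Count < 100000)
begin
    set @Count = @Count + 1
    insert into tContact(Name)
    select 'Name' + convert(nvarchar(10), @Count)
    set @ContactID = SCOPE_IDENTITY()
    -- two phones per contact, 'Home' and 'Cell' (same test number)
    insert into tContactPhone(ContactID, Number, Type)
    select @ContactID, @Count + 1, 'Home'
    union all select @ContactID, @Count + 1, 'Cell'
    -- one claim per contact; every 25th claim is 'Won' (4% of claims)
    insert into tClaim(ContactID, Name, Status)
    select @ContactID, convert(nvarchar(10), @ContactID) + '_ClaimName',
           case @Count % 25 when 0 then 'Won' else 'Closed' end
end
commit transaction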
/*
Add Indexes for Queries
*/
create index Status on tClaim(Status) include (Name,ContactID)
create index ContactID_Type on tContactPhone(ContactID,Type) include (Number)
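Since the plan choice depends on how many claims the optimizer expects to match Status = 'Won', it can help to compare the real distribution with the statistics the optimizer sees. A short sketch against the sample data above; with this generator, 100,000 / 25 = 4,000 claims are 'Won' and 96,000 are 'Closed'.

```sql
-- Actual row counts per status
select Status, count(*) as Claims
from tClaim
group by Status;

-- Header, density vector and histogram of the statistics behind the Status index
DBCC SHOW_STATISTICS ('tClaim', [Status]);
```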
You can get almost as good a plan with fewer indexes if you cluster tContactPhone by (ContactID, ID) instead of having a clustered index on ID and a separate non-clustered index on ContactID, e.g.:
CREATE TABLE tContactPhone(
    [ContactID] [int] NOT NULL,
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [Type] [nvarchar](25) NOT NULL,
    [Number] [nvarchar](12) NOT NULL,
    CONSTRAINT [pkey_tContactPhone] PRIMARY KEY CLUSTERED
    (
        [ContactID],[ID]
    )
)
This is generally a better-performing pattern for "child tables" as the clustered index also supports the foreign key.
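For a table that already exists, the same clustering can be applied without recreating it. A minimal sketch against the sample schema, assuming nothing else references pkey_tContactPhone through a foreign key (true for the sample script); changing the clustered index rebuilds every nonclustered index on the table, so on a large table this belongs in a maintenance window.

```sql
-- Drop the clustered PK on (ID); the table temporarily becomes a heap
ALTER TABLE tContactPhone DROP CONSTRAINT pkey_tContactPhone;

-- Re-create the PK clustered on (ContactID, ID) so it also supports the FK lookups
ALTER TABLE tContactPhone
    ADD CONSTRAINT pkey_tContactPhone PRIMARY KEY CLUSTERED (ContactID, ID);
```

Note that ID on its own is no longer enforced as unique by the primary key; it stays unique in practice because it is an IDENTITY column.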
-
This seems like a better pattern. For the moment, I ignored the fact that I cannot make this change in production because of a framework we use. I dropped my old PK and added the one you suggested. It still tried to use my ContactID_Type index. I dropped that index, and then it still scanned the PK. – Joshua Grippo, commented May 21, 2021 at 22:20
How many won claims do you have (as a percentage of the total claims)? If the percentage is lower than 1% (or if you have fewer than a thousand claims), it might make sense to perform a Nested Loops join with an index seek into ContactID_Type for each contact. Otherwise, hash joins (or merge joins) would probably be better, because using index seeks to read a large portion of the tContactPhone table would be less efficient than reading the entire table with a scan.
If a hash join is used with the ContactID_Type index, the Type column cannot be used for a seek. To seek on Type, the optimizer needs an index whose first key column is Type. That's why it suggests an index on the Type column: it is hoping to read fewer rows from that index.
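The tipping point described above can be approximated by comparing the pages a loop join would touch (one seek per won claim, each traversing the B-tree) with the pages a full scan of the index reads. This is a back-of-the-envelope sketch, not the optimizer's cost model: it ignores that upper index levels stay cached and that seeks are random while scans read sequentially.

```sql
-- Won claims = number of seeks a Nested Loops plan would perform (4,000 in the sample)
declare @WonClaims int = (select count(*) from tClaim where Status = 'Won');

-- Levels in the ContactID_Type B-tree, i.e. pages touched per seek
declare @Depth int = INDEXPROPERTY(OBJECT_ID('tContactPhone'), 'ContactID_Type', 'IndexDepth');

select
    @WonClaims          as WonClaims,
    @Depth              as IndexDepth,
    @WonClaims * @Depth as ApproxPagesTouchedBySeeks,  -- loop join estimate
    ps.used_page_count  as PagesReadByFullScan         -- whole index, one pass
from sys.dm_db_partition_stats ps
join sys.indexes i
    on i.object_id = ps.object_id and i.index_id = ps.index_id
where i.object_id = OBJECT_ID('tContactPhone')
  and i.name = 'ContactID_Type';
```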
-
I think on average 1-10% of the claims will be won, but I cannot guarantee this. For the example data, I forced it to 4% of claims, based on a recent sample. I am not sure how to actually force the engine to choose a hash join vs. a merge join, etc. Is that something I can do? Just for giggles, I selected my 4k claims into a temp table, joined that to tContactPhone, and got the same result as above. I then added a Type_ContactID index, and it still read 100k rows, which was not helpful, but at least it tried to seek this time. – Joshua Grippo, commented May 21, 2021 at 22:28
-
You can force a particular join type by specifying it in the JOIN clause (for example, INNER LOOP JOIN) or for the entire query by adding OPTION (LOOP JOIN) at the end. The latter uses this join type for all the joins in the query, while the former also forces the join order, so use them carefully. – Razvan Socol, commented May 22, 2021 at 3:49
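Both forms of hint described above, applied to the question's query. This is a sketch for experimenting with plan shapes on the sample data, not a recommendation to ship hints.

```sql
-- Per-join hint: forces Nested Loops for this join and fixes the written join order
select c.ID, c.Name, p.Number
from tClaim c
left outer loop join tContactPhone p
    on c.ContactID = p.ContactID and p.Type = 'Home'
where c.Status = 'Won';

-- Query-level hint: every join in the statement must use this physical join type
select c.ID, c.Name, p.Number
from tClaim c
left join tContactPhone p
    on c.ContactID = p.ContactID and p.Type = 'Home'
where c.Status = 'Won'
option (hash join);  -- alternatives: (loop join), (merge join)
```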
-
I didn't know you could add a hint to a join to help shape the plan. While I appreciate the suggestion, it is not really what I am looking for. Just as I don't want to use OPTION clauses, I don't want to use hints either. I want to figure out why the engine is not producing the plan that I think most people would expect, and/or how to index properly so that it does produce a reasonable plan. – Joshua Grippo, commented May 23, 2021 at 18:44
… Type and ContactID columns around in your index creation script for your tContactPhone table? I.e. change your index creation script to this: create index ContactID_Type on tContactPhone(Type, ContactID) include (Number). (Make sure to drop the old index too.)
-
… create index Status on tClaim(Status, ContactID) include (Name); create index ContactID_Type on tContactPhone(Type, ContactID) include (Number), it should result in a merge join.
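A sketch of the indexing suggested in these comments, against the sample schema. The new index names (Status_ContactID, Type_ContactID) are illustrative, chosen so the key order is obvious. The seek on Status = 'Won' and the seek on Type = 'Home' both return rows already ordered by ContactID, so the two streams can be merge-joined without a sort; c.ID is also covered because the clustered key is stored in every nonclustered index row. Whether the optimizer actually chooses a merge join remains a cost-based decision.

```sql
-- Remove the original indexes
drop index Status on tClaim;
drop index ContactID_Type on tContactPhone;

-- Equality-filtered column first, join column second
create index Status_ContactID on tClaim(Status, ContactID) include (Name);
create index Type_ContactID on tContactPhone(Type, ContactID) include (Number);

-- Re-run the original query and inspect the actual execution plan
select c.ID, c.Name, p.Number
from tClaim c
left join tContactPhone p
    on c.ContactID = p.ContactID and p.Type = 'Home'
where c.Status = 'Won';
```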