Why is Clustered Index Scan Number of Executions so high?

Question 1

I have two similar queries that generate the same query plan, except that one query plan executes a Clustered Index Scan 1316 times, while the other executes it 1 time.

The only difference between the two queries is different date criteria. The long running query actually narrower date criteria, and pulls back less data.

I have identified some indexes that will help with both queries, but I just want to understand why the Clustered Index Scan operator is executing 1316 times on a query that is virtually the same as the one that where it executes 1 time.

I checked the statistics on the PK that is being scanned, and they are relatively up to date.

Original query:

select distinct FIR_Incident.IncidentID
from FIR_Incident
left join (
 select incident_id as exported_incident_id
 from postnfirssummary
) exported_incidents on exported_incidents.exported_incident_id = fir_incident.incidentid
where FI_IncidentDate between '2011-06-01 00:00:00.000' and '2011-07-01 00:00:00.000'
 and exported_incidents.exported_incident_id is not null

Generates this plan: enter image description here

After narrowing the date range criteria:

select distinct FIR_Incident.IncidentID
from FIR_Incident
left join (
 select incident_id as exported_incident_id
 from postnfirssummary
) exported_incidents on exported_incidents.exported_incident_id = fir_incident.incidentid
where FI_IncidentDate between '2011-07-01 00:00:00.000' and '2011-07-02 00:00:00.000'
 and exported_incidents.exported_incident_id is not null

Generates this plan: enter image description here

Question 2

Could you copy / paste your queries in a code block instead of image files?

Question 3

Sure - I added the queries that are generating each plan.

Question 4

Which table is the clustered index scan occurring on?

Question 5

Clustered Index scan is on the subquery in the left join (PostNFIRSSummary)

Question 6

Presumably last time the statistics were updated there was only zero or one row meeting the FI_IncidentDate between '2011年07月01日 00:00:00.000' and '2011年07月02日 00:00:00.000' criteria and since then there have been a disproportionate number of inserts in that range. It estimates only 1.07 executions will be needed for that date range. Not the 1,316 that ensue in actuality.

Question 7

The JOIN after the scan gives a clue: with less rows on one side of the last join (reading right to left of course) the optimiser chooses a "nested loop" not a "hash join".

However, before looking at this I'd aim to eliminate the Key Lookup and the DISTINCT.

Key lookup: your index on FIR_Incident should be covering, probably (FI_IncidentDate, incidentid) or the other way around. Or have both and see which is used more often (they both may be)
The DISTINCT is a consequence of the LEFT JOIN ... IS NOT NULL. The optimiser has already removed it (the plans have "left semi joins" on the final JOIN) but I'd use EXISTS for clarity

Something like:

select 
 F.IncidentID
from 
 FIR_Incident F
where 
 exists (SELECT * FROM postnfirssummary P
 WHERE P.incident_id = F.incidentid)
 AND
 F.FI_IncidentDate between '2011-07-01 00:00:00.000' and '2011-07-02 00:00:00.000'

You can also use plan guides and JOIN hints to make SQL Server use a hash join, but try to make it work normally first: a guide or a hint probably won't stand the test of time because they are only useful for the data and queries you run now, not in the future

gbn gbn 70.3k8 gold badges168 silver badges244 bronze badges · Answer 1 · 2011-12-06 19:52:15Z

The JOIN after the scan gives a clue: with less rows on one side of the last join (reading right to left of course) the optimiser chooses a "nested loop" not a "hash join".

However, before looking at this I'd aim to eliminate the Key Lookup and the DISTINCT.

Key lookup: your index on FIR_Incident should be covering, probably (FI_IncidentDate, incidentid) or the other way around. Or have both and see which is used more often (they both may be)
The DISTINCT is a consequence of the LEFT JOIN ... IS NOT NULL. The optimiser has already removed it (the plans have "left semi joins" on the final JOIN) but I'd use EXISTS for clarity

Something like:

select 
 F.IncidentID
from 
 FIR_Incident F
where 
 exists (SELECT * FROM postnfirssummary P
 WHERE P.incident_id = F.incidentid)
 AND
 F.FI_IncidentDate between '2011-07-01 00:00:00.000' and '2011-07-02 00:00:00.000'

You can also use plan guides and JOIN hints to make SQL Server use a hash join, but try to make it work normally first: a guide or a hint probably won't stand the test of time because they are only useful for the data and queries you run now, not in the future

Stack Exchange Network

Why is Clustered Index Scan Number of Executions so high?

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Why is Clustered Index Scan Number of Executions so high?

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions