I have two similar queries that generate the same query plan, except that one query plan executes a Clustered Index Scan 1316 times, while the other executes it 1 time.
The only difference between the two queries is different date criteria. The long running query actually narrower date criteria, and pulls back less data.
I have identified some indexes that will help with both queries, but I just want to understand why the Clustered Index Scan operator is executing 1316 times on a query that is virtually the same as the one that where it executes 1 time.
I checked the statistics on the PK that is being scanned, and they are relatively up to date.
Original query:
select distinct FIR_Incident.IncidentID
from FIR_Incident
left join (
select incident_id as exported_incident_id
from postnfirssummary
) exported_incidents on exported_incidents.exported_incident_id = fir_incident.incidentid
where FI_IncidentDate between '2011-06-01 00:00:00.000' and '2011-07-01 00:00:00.000'
and exported_incidents.exported_incident_id is not null
Generates this plan: enter image description here
After narrowing the date range criteria:
select distinct FIR_Incident.IncidentID
from FIR_Incident
left join (
select incident_id as exported_incident_id
from postnfirssummary
) exported_incidents on exported_incidents.exported_incident_id = fir_incident.incidentid
where FI_IncidentDate between '2011-07-01 00:00:00.000' and '2011-07-02 00:00:00.000'
and exported_incidents.exported_incident_id is not null
Generates this plan: enter image description here
1 Answer 1
The JOIN after the scan gives a clue: with less rows on one side of the last join (reading right to left of course) the optimiser chooses a "nested loop" not a "hash join".
However, before looking at this I'd aim to eliminate the Key Lookup and the DISTINCT.
Key lookup: your index on FIR_Incident should be covering, probably
(FI_IncidentDate, incidentid)
or the other way around. Or have both and see which is used more often (they both may be)The
DISTINCT
is a consequence of theLEFT JOIN ... IS NOT NULL
. The optimiser has already removed it (the plans have "left semi joins" on the final JOIN) but I'd use EXISTS for clarity
Something like:
select
F.IncidentID
from
FIR_Incident F
where
exists (SELECT * FROM postnfirssummary P
WHERE P.incident_id = F.incidentid)
AND
F.FI_IncidentDate between '2011-07-01 00:00:00.000' and '2011-07-02 00:00:00.000'
You can also use plan guides and JOIN hints to make SQL Server use a hash join, but try to make it work normally first: a guide or a hint probably won't stand the test of time because they are only useful for the data and queries you run now, not in the future
FI_IncidentDate between '2011年07月01日 00:00:00.000' and '2011年07月02日 00:00:00.000'
criteria and since then there have been a disproportionate number of inserts in that range. It estimates only 1.07 executions will be needed for that date range. Not the 1,316 that ensue in actuality.