Same query way faster with distinct than without distinct

Question 1

I have a query that runs much faster with DISTINCT than without it. When I run both queries together, the one with DISTINCT takes 13% of the execution time and the one without DISTINCT takes 87%.

These are T-SQL queries on SQL Server 2014, in a database at compatibility level 110.

The strange thing is that the query with DISTINCT uses seeks, while the query without DISTINCT uses one scan. Apart from the DISTINCT keyword, the two queries are identical and have the same WHERE clause.

Can you help me understand why the query with DISTINCT is faster, and why the query without DISTINCT is not using a seek?

Query plans: www.brentozar.com/pastetheplan/?id=ryr18jJsb

Queries:

select 
 SMS_R_SYSTEM.ItemKey
 ,SMS_R_SYSTEM.Name0,SMS_R_SYSTEM.SMS_Unique_Identifier0
 ,SMS_R_SYSTEM.Resource_Domain_OR_Workgr0
 ,SMS_R_SYSTEM.Client0
from vSMS_R_System as SMS_R_SYSTEM
 inner join Add_Remove_Programs_DATA 
 on Add_Remove_Programs_DATA.MachineID = SMS_R_System.ItemKey
 inner join Add_Remove_Programs_64_DATA 
 on Add_Remove_Programs_64_DATA.MachineID = SMS_R_System.ItemKey
where Add_Remove_Programs_DATA.DisplayName00 = 'aze'
 or Add_Remove_Programs_64_DATA.DisplayName00 = 'aze'
;
select distinct 
 SMS_R_SYSTEM.ItemKey
 ,SMS_R_SYSTEM.Name0,SMS_R_SYSTEM.SMS_Unique_Identifier0
 ,SMS_R_SYSTEM.Resource_Domain_OR_Workgr0
 ,SMS_R_SYSTEM.Client0
from vSMS_R_System as SMS_R_SYSTEM
 inner join Add_Remove_Programs_DATA 
 on Add_Remove_Programs_DATA.MachineID = SMS_R_System.ItemKey
 inner join Add_Remove_Programs_64_DATA 
 on Add_Remove_Programs_64_DATA.MachineID = SMS_R_System.ItemKey
where Add_Remove_Programs_DATA.DisplayName00 = 'aze'
 or Add_Remove_Programs_64_DATA.DisplayName00 = 'aze' 
;
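
If the 13% / 87% figures come from "Query cost (relative to the batch)" in the plan viewer, note that those are the optimizer's estimated costs, not measured time. A minimal sketch for measuring both queries directly, reusing the queries above (the short aliases `s`, `a` and `b` are introduced here only for brevity):

```sql
-- Report actual logical reads and CPU/elapsed time per statement
-- in the Messages tab.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Query without DISTINCT
SELECT s.ItemKey, s.Name0, s.SMS_Unique_Identifier0,
       s.Resource_Domain_OR_Workgr0, s.Client0
FROM vSMS_R_System AS s
INNER JOIN Add_Remove_Programs_DATA AS a
    ON a.MachineID = s.ItemKey
INNER JOIN Add_Remove_Programs_64_DATA AS b
    ON b.MachineID = s.ItemKey
WHERE a.DisplayName00 = 'aze'
   OR b.DisplayName00 = 'aze';

-- Query with DISTINCT
SELECT DISTINCT s.ItemKey, s.Name0, s.SMS_Unique_Identifier0,
       s.Resource_Domain_OR_Workgr0, s.Client0
FROM vSMS_R_System AS s
INNER JOIN Add_Remove_Programs_DATA AS a
    ON a.MachineID = s.ItemKey
INNER JOIN Add_Remove_Programs_64_DATA AS b
    ON b.MachineID = s.ItemKey
WHERE a.DisplayName00 = 'aze'
   OR b.DisplayName00 = 'aze';

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Comparing the logical reads and elapsed times of the two statements shows whether the DISTINCT version really is faster, or only estimated to be cheaper.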

Question 2

Assuming the second plan is significantly more efficient when actually run and the results are identical, I'm not sure why the planner does not find that optimisation without DISTINCT being present. It may help to know what keys and indexes are defined on those tables, so I suggest editing that information into the question. That said, the first plan is scanning ~400K and ~4M rows from the Add_Remove_* tables, so if DisplayName00 = 'aze' is particularly selective, an index on DisplayName00 in each table, perhaps INCLUDEing MachineID, could help that plan massively.
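
The suggested indexes could look roughly like the sketch below. The index names and the `dbo` schema are assumptions, not taken from the question, and the existing keys and indexes are unknown: an equivalent index may already exist, and if MachineID is part of the clustered key the INCLUDE is redundant.

```sql
-- Hypothetical index names; check the existing indexes first.
CREATE NONCLUSTERED INDEX IX_Add_Remove_Programs_DATA_DisplayName00
    ON dbo.Add_Remove_Programs_DATA (DisplayName00)
    INCLUDE (MachineID);

CREATE NONCLUSTERED INDEX IX_Add_Remove_Programs_64_DATA_DisplayName00
    ON dbo.Add_Remove_Programs_64_DATA (DisplayName00)
    INCLUDE (MachineID);
```

With DisplayName00 as the leading key, an equality predicate such as DisplayName00 = 'aze' can be answered with a seek, and the included MachineID lets the join proceed without a lookup into the base table.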

Question 3

Thank you, David, for your input. Sorry for the late reply; I was away on an assignment. Like you, I am also wondering why the query without DISTINCT is not seeking the index. I work only part time at the customer where this behavior occurs, and the customer has many other tasks for me. If I get the chance, I will try to provide the key and index definitions for those tables.

Question 4

Have you got up-to-date statistics?
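
One way to check this, as a sketch: `sys.dm_db_stats_properties` reports when each statistics object was last updated and how many modifications have accumulated since. The `dbo` schema is an assumption.

```sql
-- When were the statistics on the two Add_Remove_* tables last updated,
-- and how many rows have changed since then?
SELECT OBJECT_NAME(s.object_id) AS table_name,
       s.name                   AS stats_name,
       sp.last_updated,
       sp.rows,
       sp.rows_sampled,
       sp.modification_counter
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id IN (OBJECT_ID(N'dbo.Add_Remove_Programs_DATA'),
                      OBJECT_ID(N'dbo.Add_Remove_Programs_64_DATA'));

-- If they look stale, refresh them (FULLSCAN reads every row,
-- so it can take a while on the ~4M-row table).
UPDATE STATISTICS dbo.Add_Remove_Programs_DATA WITH FULLSCAN;
UPDATE STATISTICS dbo.Add_Remove_Programs_64_DATA WITH FULLSCAN;
```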

score 2 · Answer 1 · 2017-09-20 11:18:00Z

From the way I read your execution plans, your DISTINCT query is selecting columns only from vSMS_R_System, so the optimizer chooses to read that data first (retrieving the DISTINCT values used in your SELECT) and then uses a NESTED LOOP to join up against the other tables.

The non-DISTINCT query forces the optimizer to assume that many more rows might need to be read from vSMS_R_System, so it chooses a MERGE JOIN to scan the other tables and then a NESTED LOOP to match up against vSMS_R_System.

I was expecting the optimizer to seek the index for the non-DISTINCT query. This is the part that confuses me: there is an obvious index to use, and the DISTINCT query is using it.
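
If the goal is simply one row per system, one option worth testing is to state that intent with EXISTS instead of joining and then de-duplicating. This is only a sketch: it returns the same rows as the DISTINCT query on the assumption that vSMS_R_System has no duplicate rows over the selected columns (for example, if ItemKey is unique), and whether the optimizer picks seeks for it depends on indexes and statistics that the question does not show. The aliases `s`, `a` and `b` are introduced here for brevity.

```sql
-- One row per system that has rows in both Add_Remove_* tables,
-- at least one of which has DisplayName00 = 'aze'.
SELECT s.ItemKey, s.Name0, s.SMS_Unique_Identifier0,
       s.Resource_Domain_OR_Workgr0, s.Client0
FROM vSMS_R_System AS s
WHERE EXISTS (
    SELECT 1
    FROM Add_Remove_Programs_DATA AS a
    INNER JOIN Add_Remove_Programs_64_DATA AS b
        ON b.MachineID = a.MachineID
    WHERE a.MachineID = s.ItemKey
      AND (a.DisplayName00 = 'aze' OR b.DisplayName00 = 'aze')
);
```

Like the original queries, this still requires a matching MachineID in both tables, because the inner joins are preserved inside the subquery.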
