How does the T-SQL optimizer handle "and false"

Question 1

In SQL SERVER 2008 I'm thinking about a query such as:

SELECT ...
FROM ...
WHERE ...
 AND @a=1
UNION
SELECT ...
FROM ...
WHERE ...
 AND @a=2

Given that @a is clearly the same value, and therefore either the first SELECT returns results or the second SELECT does, will the query optimizer identify one of the queries as false and not run it? Would it be better for me to put them in an IF statement? The actual query is a bit more complicated than written of course, so the if would make it like 4 different queries, which would be a pain to maintain.

I'm hoping the optimizer is smart enough to see the 'and false' and not waste time with it.

Question 2

Look at the actual execution plans. See if there is a filter with a startup predicate and "number of executions" is zero for the operators on the part of the plan that should be excluded.

Question 3

The only way to know for sure is to run it and see. There is no guarantee of a short circuit. A lot of times the optimizer will run the WHERE criteria in parallel to make it faster, which in your case would be slower.

Question 4

I would worry more about using UNION instead of UNION ALL. If the where conditions would result in mutually exclusive results sets (which it appears they would), UNION ALL would likely result in a faster execution plan becasue it wouldn't be planning to check for duplicates.

Question 5

If your looking to test this for yourself why don't you just split the query apart at the union and run the optimizer against both portions and compare it to the results you get from running the whole query? Doing little tests like this are a great way to see how things will actually run as opposed to how the theoretically do and gives you a greater understanding of how your database works.

Question 6

With respect to your question

will the query optimizer identify one of the queries as false and not run it?

You'd need to check the execution plans. An example below where it doesn't.

DECLARE @P INT =1
SET STATISTICS IO ON;
SELECT name
FROM sys.objects
WHERE type='P' AND @P=1
UNION 
SELECT name
FROM master..spt_values
WHERE number >0 AND @P=0

Output

Table 'Worktable'. Scan count 0, logical reads 0
Table 'sysschobjs'. Scan count 1, logical reads 5

Notice that master..spt_values was not touched in the query execution. The reason for this is that the seek on this table is under a filter with a start up expression predicate of [@P]=(0). As this condition is not met the seek is never actually executed.

Both the Startup Expression and number of executions are shown in the operator tool tips as below.

Plan

With regard to the other part of your question

Would it be better for me to put them in an IF statement?

Yes probably despite that. You're then guaranteed that the behaviour that you get is that which you want. Also even though master..spt_values doesn't have any pages read above it does still have an IS lock taken out against it (which doesn't happen in the IF ... ELSE version)

Better still might be to use an If that calls separate stored procedures for each branch. You haven't shown the rest of your code but one possible issue with these mutually exclusive branches might be parameter sniffing. The batch will be compiled as per the first set of parameters seen. These might be wholly unsuitable for the other branch(es). Splitting into child batches allows each branch to be compiled separately taking into account the parameter values that actually occurred when that branch was executed.

Question 7

However, the Distinct Sort step might be unnecessary, right? We select from at most one object, and name is presumably unique in each of them, is it not.

Question 8

@AlexKuznetsov - Agreed. They probably need UNION ALL and even if they don't and the DISTINCT semantics are actually desired they might get a better plan having DISTINCT in the two individual queries.

Question 9

SQL Server optimizer is closed source, so we cannot see what exactly is going on under the hood. Of course, there are tons of documentation, and just as surely there are lots of cases when this thing does not behave exactly as documented.

In my experience, all documentation from all vendors tends to get out of sync quite soon, sometimes before the first release ;). The only 100% accurate source of information is source code, pun intended.

So, even if you observe some behavior right now, there is no guarantee it cannot change later on. The reason: we have no idea what is involved, how the optimizer is making its decisions. So, we do not know if we have tested all the situations.

As such, I would keep it simple:

IF @a=1 BEGIN ;
 SELECT ... FROM ... ;
END ELSE BEGIN ;
 SELECT ... FROM ... ;
END ;

To reuse code and reduce maintenance costs, I would have both selects utilize an inline UDF to keep common functionality in one place.

Besides, as demonstrated in Martin Smith's excellent answer, UNION leads to a possibly unnecessary sort, which may be very expensive.

In general, it pays to write shorter, more specific queries - you have a better chance to get a good execution plan.

Question 10

The query optimizer must come up with an execution plan that works for any value of @a. This plan will be executed when @a=1 and the very same execution plan will be executed when @a=2. Therefore by definition the answer to your question is NO: all queries in the UNION will be executed. There is simply no other way.

Question 11

Just because it is in the plan doesn't mean it is executed though.

Question 12

@Martin: that is splitting hairs. While true that there are some plan portions than may not be executed at runtime (eg. dynamic partition elimination, a nested loop that never probes, a TOP satisfied before jumping into an UNION etc) that is really the exception. Both as a way to explain how it works and for the vast majority of real use cases the plan is always executed entirely.

Question 13

For the case given in the OP a filter with startup predicate seems quite common though.

Question 14

@Martin: is a very slippery slope. Explicit IF branches are million times safer. And ultimately this question is the hallmark of wrong API design, the one jack-of-all-trades procedure that can return sprockets or widgets depending on what argument is called with.

Question 15

Not to mention that the OP has an assumption about boolean short circuit, which is incorrect: rusanu.com/2009/09/13/…

Martin Smith Martin Smith 88.4k15 gold badges258 silver badges357 bronze badges · Accepted Answer · 2012-10-05 18:13:24Z

With respect to your question

will the query optimizer identify one of the queries as false and not run it?

You'd need to check the execution plans. An example below where it doesn't.

DECLARE @P INT =1
SET STATISTICS IO ON;
SELECT name
FROM sys.objects
WHERE type='P' AND @P=1
UNION 
SELECT name
FROM master..spt_values
WHERE number >0 AND @P=0

Output

Table 'Worktable'. Scan count 0, logical reads 0
Table 'sysschobjs'. Scan count 1, logical reads 5

Notice that master..spt_values was not touched in the query execution. The reason for this is that the seek on this table is under a filter with a start up expression predicate of [@P]=(0). As this condition is not met the seek is never actually executed.

Both the Startup Expression and number of executions are shown in the operator tool tips as below.

Plan

With regard to the other part of your question

Would it be better for me to put them in an IF statement?

Yes probably despite that. You're then guaranteed that the behaviour that you get is that which you want. Also even though master..spt_values doesn't have any pages read above it does still have an IS lock taken out against it (which doesn't happen in the IF ... ELSE version)

Better still might be to use an If that calls separate stored procedures for each branch. You haven't shown the rest of your code but one possible issue with these mutually exclusive branches might be parameter sniffing. The batch will be compiled as per the first set of parameters seen. These might be wholly unsuitable for the other branch(es). Splitting into child batches allows each branch to be compiled separately taking into account the parameter values that actually occurred when that branch was executed.

However, the Distinct Sort step might be unnecessary, right? We select from at most one object, and name is presumably unique in each of them, is it not.
@AlexKuznetsov - Agreed. They probably need UNION ALL and even if they don't and the DISTINCT semantics are actually desired they might get a better plan having DISTINCT in the two individual queries.

Stack Exchange Network

How does the T-SQL optimizer handle "and false"

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

How does the T-SQL optimizer handle "and false"

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions