I have a database with two tables, containing these columns (some irrelevant columns have been omitted):

- Containers table
  - Id nvarchar(20) NOT NULL: primary key, value generated before insert
  - Category nvarchar(5) NOT NULL: a category with around 5 possible values
  - Status int NOT NULL: value between 0 and 3
- Items table
  - Id uniqueidentifier NOT NULL: primary key
  - ContainerId nvarchar(20) NOT NULL: FK to the Containers table; containers contain 1-n Items
  - PartId uniqueidentifier NULL: a part ID OR a part category must be provided; they are mutually exclusive; there are ~15000 unique part IDs
  - PartCategory nvarchar(50) NULL: ~1000 unique part categories
  - CustomerId uniqueidentifier NULL: a customer ID, category or area must be provided; they are mutually exclusive; there are ~2000 unique customer IDs
  - CustomerCategory nvarchar(10) NULL: ~50 unique values
  - CustomerArea nvarchar(10) NULL: ~30 unique values
  - StartDate date NOT NULL
  - EndDate date NOT NULL
  - MinQuantity int NOT NULL
  - MaxQuantity int NULL
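For clarity, the corresponding DDL looks roughly like this (simplified; the mutual-exclusion rules described above are not shown as constraints):

CREATE TABLE Containers (
    Id nvarchar(20) NOT NULL PRIMARY KEY, -- value generated before insert
    Category nvarchar(5) NOT NULL,        -- ~5 possible values
    Status int NOT NULL                   -- value between 0 and 3
);

CREATE TABLE Items (
    Id uniqueidentifier NOT NULL PRIMARY KEY,
    ContainerId nvarchar(20) NOT NULL REFERENCES Containers (Id),
    PartId uniqueidentifier NULL,         -- exclusive with PartCategory
    PartCategory nvarchar(50) NULL,
    CustomerId uniqueidentifier NULL,     -- exclusive with the two columns below
    CustomerCategory nvarchar(10) NULL,
    CustomerArea nvarchar(10) NULL,
    StartDate date NOT NULL,
    EndDate date NOT NULL,
    MinQuantity int NOT NULL,
    MaxQuantity int NULL
);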
Currently my Containers table contains 9000 rows, and my Items table contains 800000 rows.
I now need to create a query that retrieves all Items matching a set of criteria: the container category, the container status (I always need those with a value of 3), the item's part (through its ID or category), the item's customer (through its ID, category or area), a valid date (the current date must be between StartDate and EndDate), and a valid quantity (a provided quantity must be between MinQuantity and MaxQuantity, the latter only when not null).
I naively wrote the following query:
select Items.ColumnA, Items.ColumnB -- actually I project 13 columns here
from Items
inner join Containers on Items.ContainerId=Containers.Id
where
Containers.Category = 'Category1' and
Containers.Status = 3 and -- always 3 (not parametrized)
(Items.PartId = 'some guid' or Items.PartCategory = 'PartCategory') and
(Items.CustomerId = 'some other guid' or Items.CustomerCategory = 'CustCategory' or Items.CustomerArea='area') and
Items.StartDate <= 'some date' and Items.EndDate >= 'some date' and
Items.MinQuantity <= 10 and (Items.MaxQuantity is null or Items.MaxQuantity >= 10)
This works and runs in around 200ms on our staging server. Not great, but that was acceptable for single queries.
Now, my problem is that we want to prepare catalogs to send to every customer. This means calling this query for hundreds of customers multiplied by around 70000 parts, which of course takes a very long time. We have tried creating an index suggested by Management Studio in the Live Query Statistics, and while it did bring some improvement, it was not enough.
Questions:
- Are there any obvious pointers about what can be done to optimize this specific query? I have a developer background and I'm feeling overwhelmed by all the different indexing possibilities.
- I'm also trying to think of a way to group all my queries into one call to the database, but even if I use something like a stored procedure to loop over inputs and execute my query 1000 times, I fear the gain will be small, as the database will still have to seek on large indexes, which will still take most of the time. Is there any strategy for optimizing grouped calls to this query?
Addendum: The index that we created after Management Studio's suggestion looks like this:
CREATE NONCLUSTERED INDEX [IX_SomeIndex] ON Items (
    MinQuantity ASC,
    MaxQuantity ASC,
    StartDate ASC,
    EndDate ASC
)
INCLUDE (PartId, PartCategory, CustomerId, CustomerCategory, CustomerArea, ContainerId, Column1, Column2)
- How To Get Answers To SQL Server Performance Questions – Erik Darling, Jun 6 at 17:25
- Try to use UNION. – Alexander Petrov, Jun 7 at 10:42
1 Answer
There are multiple things to address in your post that could benefit you, but the easiest way to provide the changes most applicable to your root issue would be for you to provide more information. Most importantly, the actual execution plan for your slow query, which you can upload to Paste The Plan and then link in your post. Knowing what other indexes already exist on these two tables would be helpful too.
In the interim, here's some generic feedback based on what you've provided so far:
- Overly complex predicates (WHERE and ON clauses) can hit limitations of the query optimizer. Mixing multiple compound OR statements in a predicate can quickly hit that limitation. Sometimes a solution to this is to refactor the unique parts of the predicate into separate queries whose results are unioned back together (eliminating the ORs). This allows the optimizer to plan more efficiently for each separate query and then combine the results so the whole is logically equivalent to your original single query.
An example of that for your query could be:
-- Combo 1: PartId & CustomerId Filtering
select Items.ColumnA, Items.ColumnB
from Items
inner join Containers on Items.ContainerId=Containers.Id
where
Containers.Category = 'Category1' and
Containers.Status = 3 and -- always 3 (not parametrized)
Items.PartId = 'some guid'
and Items.CustomerId = 'some other guid'
and Items.StartDate <= 'some date' and Items.EndDate >= 'some date' and
Items.MinQuantity <= 10 and (Items.MaxQuantity is null or Items.MaxQuantity >= 10)
UNION ALL
-- Combo 2: PartId & CustomerCategory Filter
select Items.ColumnA, Items.ColumnB
from Items
inner join Containers on Items.ContainerId=Containers.Id
where
Containers.Category = 'Category1' and
Containers.Status = 3 and -- always 3 (not parametrized)
Items.PartId = 'some guid'
and Items.CustomerCategory = 'CustCategory'
and Items.StartDate <= 'some date' and Items.EndDate >= 'some date' and
Items.MinQuantity <= 10 and (Items.MaxQuantity is null or Items.MaxQuantity >= 10)
UNION ALL
-- Combo 3: PartId & CustomerArea Filtering
select Items.ColumnA, Items.ColumnB
from Items
inner join Containers on Items.ContainerId=Containers.Id
where
Containers.Category = 'Category1' and
Containers.Status = 3 and -- always 3 (not parametrized)
Items.PartId = 'some guid'
and Items.CustomerArea = 'area'
and Items.StartDate <= 'some date' and Items.EndDate >= 'some date' and
Items.MinQuantity <= 10 and (Items.MaxQuantity is null or Items.MaxQuantity >= 10)
UNION ALL
-- Combo 4: PartCategory & CustomerId Filtering
select Items.ColumnA, Items.ColumnB
from Items
inner join Containers on Items.ContainerId=Containers.Id
where
Containers.Category = 'Category1' and
Containers.Status = 3 and -- always 3 (not parametrized)
Items.PartCategory = 'PartCategory'
and Items.CustomerId = 'some other guid'
and Items.StartDate <= 'some date' and Items.EndDate >= 'some date' and
Items.MinQuantity <= 10 and (Items.MaxQuantity is null or Items.MaxQuantity >= 10)
UNION ALL
-- Combo 5: PartCategory & CustomerCategory Filter
select Items.ColumnA, Items.ColumnB
from Items
inner join Containers on Items.ContainerId=Containers.Id
where
Containers.Category = 'Category1' and
Containers.Status = 3 and -- always 3 (not parametrized)
Items.PartCategory = 'PartCategory'
and Items.CustomerCategory = 'CustCategory'
and Items.StartDate <= 'some date' and Items.EndDate >= 'some date' and
Items.MinQuantity <= 10 and (Items.MaxQuantity is null or Items.MaxQuantity >= 10)
UNION ALL
-- Combo 6: PartCategory & CustomerArea Filtering
select Items.ColumnA, Items.ColumnB
from Items
inner join Containers on Items.ContainerId=Containers.Id
where
Containers.Category = 'Category1' and
Containers.Status = 3 and -- always 3 (not parametrized)
Items.PartCategory = 'PartCategory'
and Items.CustomerArea = 'area'
and Items.StartDate <= 'some date' and Items.EndDate >= 'some date' and
Items.MinQuantity <= 10 and (Items.MaxQuantity is null or Items.MaxQuantity >= 10)
We can use UNION ALL as opposed to UNION here to combine the refactored parts of the query because, per the information you provided, each of these cases is mutually exclusive of the others. This is advantageous since UNION ALL is typically a little more performant, not having to do an extra de-duplication step.
- We can even take the above one step further and leverage branching so the optimizer produces a separate execution plan for each mutually exclusive part. The above query, while easier for the optimizer to plan than your original single query, will still produce one single execution plan that incorporates all of the mutually exclusive parts. The unnecessary parts of the query will still be executed when not needed, and if you ultimately put this into a stored procedure, you can run into parameter sniffing performance issues. Branching solves this by generating a separate execution plan catered to each mutually exclusive part of the above query.
You can either manage the branching manually, by having your consuming application call only the relevant mutually exclusive part of the query based on your parameters, instead of unioning them all together; if you go that route, consider saving each part as its own database object, such as one stored procedure per part. Or you can write the branching logic in SQL, within a single stored procedure that calls each separate part via Dynamic SQL. Each Dynamic SQL statement gets its own execution plan.
An example of what that would look like:
CREATE PROCEDURE YourSchema.AGoodNameForThisStoredProcedure
@ContainerCategory NVARCHAR(5),
@PartId UNIQUEIDENTIFIER,
@PartCategory NVARCHAR(50),
@CustomerId UNIQUEIDENTIFIER,
@CustomerCategory NVARCHAR(10),
@CustomerArea NVARCHAR(10),
@StartDate DATE,
@EndDate DATE
AS
BEGIN
-- Globals
DECLARE @DynamicSQLStatement NVARCHAR(MAX);
-- Build the Dynamic SQL statement via the appropriate branch
IF (@PartId IS NOT NULL)
BEGIN
IF (@CustomerId IS NOT NULL)
BEGIN
-- Combo 1: PartId & CustomerId Filtering
SET @DynamicSQLStatement =
N'
select Items.ColumnA, Items.ColumnB
from Items
inner join Containers on Items.ContainerId=Containers.Id
where
Containers.Category = @ContainerCategory and
Containers.Status = 3 and -- always 3 (not parametrized)
Items.PartId = @PartId
and Items.CustomerId = @CustomerId
and Items.StartDate <= @StartDate and Items.EndDate >= @EndDate and
Items.MinQuantity <= 10 and (Items.MaxQuantity is null or Items.MaxQuantity >= 10)
'
END
ELSE IF (@CustomerCategory IS NOT NULL)
BEGIN
-- ...Combo 2 mutually exclusive case
END
ELSE
BEGIN -- @CustomerArea should not be null here
-- ...Combo 3 mutually exclusive case
END
END
ELSE -- @PartCategory should not be null here
BEGIN
IF (@CustomerId IS NOT NULL)
BEGIN
-- Combo 4: PartCategory & CustomerId Filtering
SET @DynamicSQLStatement =
N'
select Items.ColumnA, Items.ColumnB
from Items
inner join Containers on Items.ContainerId=Containers.Id
where
Containers.Category = @ContainerCategory and
Containers.Status = 3 and -- always 3 (not parametrized)
Items.PartCategory = @PartCategory
and Items.CustomerId = @CustomerId
and Items.StartDate <= @StartDate and Items.EndDate >= @EndDate and
Items.MinQuantity <= 10 and (Items.MaxQuantity is null or Items.MaxQuantity >= 10)
'
END
ELSE IF (@CustomerCategory IS NOT NULL)
BEGIN
-- ...Combo 5 mutually exclusive case
END
ELSE
BEGIN -- @CustomerArea should not be null here
-- ...Combo 6 mutually exclusive case
END
END
-- Execute the Dynamic SQL statement
EXEC sp_executesql
-- Statement
@DynamicSQLStatement,
-- Argument signature
N'
@ContainerCategory NVARCHAR(5),
@PartId UNIQUEIDENTIFIER,
@PartCategory NVARCHAR(50),
@CustomerId UNIQUEIDENTIFIER,
@CustomerCategory NVARCHAR(10),
@CustomerArea NVARCHAR(10),
@StartDate DATE,
@EndDate DATE
',
-- Parameters
@ContainerCategory,
@PartId,
@PartCategory,
@CustomerId,
@CustomerCategory,
@CustomerArea,
@StartDate,
@EndDate
END
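For illustration, calling this procedure might look like the following (values are placeholders in the same style as above, and the unused member of each mutually exclusive pair is passed as NULL):

EXEC YourSchema.AGoodNameForThisStoredProcedure
    @ContainerCategory = N'Category1',
    @PartId = 'some guid',
    @PartCategory = NULL,
    @CustomerId = NULL,
    @CustomerCategory = N'CustCategory',
    @CustomerArea = NULL,
    @StartDate = 'some date',
    @EndDate = 'some date';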
Branching can get verbose quickly, as seen above with only two of the six cases actually filled in. So I recommend still putting each code branch in its own stored procedure, and then calling that stored procedure via the Dynamic SQL instead, as sketched below. This improves readability and maintainability.
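A minimal sketch of what one branch could then look like (the sub-procedure name here is hypothetical):

IF (@PartId IS NOT NULL AND @CustomerId IS NOT NULL)
BEGIN
    -- Combo 1 now lives in its own procedure; the dynamic statement just calls it
    SET @DynamicSQLStatement = N'
        EXEC YourSchema.GetItems_ByPartIdAndCustomerId
            @ContainerCategory = @ContainerCategory,
            @PartId = @PartId,
            @CustomerId = @CustomerId,
            @StartDate = @StartDate,
            @EndDate = @EndDate;
    ';
END
-- ...the other five branches follow the same pattern...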
- Static filters can be served efficiently by a Filtered Index or an Indexed View, which reduce the search space by persisting only the subset of the data that actually needs to be queried. Since Containers.Status = 3 is not parameterized and is a constant value being filtered on, we can easily define our index to be filtered on this value like so:
CREATE NONCLUSTERED INDEX IX_Containers_Filtered_StatusEquals3 ON Containers (Id, Category) WHERE Status = 3
In all honesty though, because of how small your Containers table is (both height- and width-wise), you'll probably not see much gain implementing this kind of index in this specific case. But it's a great tool to be aware of in general, especially for larger tables where the majority of records would be filtered out by the filter expression in the index.
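For completeness, the Indexed View flavor of the same idea could look like this (view and index names are illustrative; note that indexed views require SCHEMABINDING and a unique clustered index):

CREATE VIEW dbo.ContainersWithStatus3
WITH SCHEMABINDING
AS
    -- Persist only the containers your query actually cares about
    SELECT Id, Category
    FROM dbo.Containers
    WHERE Status = 3;
GO

CREATE UNIQUE CLUSTERED INDEX IX_ContainersWithStatus3 ON dbo.ContainersWithStatus3 (Id);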
- Proper indexing in general can go a long way though. With the re-written query above that breaks it up into mutually exclusive parts, or even the branched version, the following indexes on the predicates of each combination may prove to be of additional help:
CREATE NONCLUSTERED INDEX IX_Items_Combo1 ON Items (PartId, CustomerId, StartDate, EndDate, MinQuantity, MaxQuantity);
CREATE NONCLUSTERED INDEX IX_Items_Combo2 ON Items (PartId, CustomerCategory, StartDate, EndDate, MinQuantity, MaxQuantity);
CREATE NONCLUSTERED INDEX IX_Items_Combo3 ON Items (PartId, CustomerArea, StartDate, EndDate, MinQuantity, MaxQuantity);
CREATE NONCLUSTERED INDEX IX_Items_Combo4 ON Items (PartCategory, CustomerId, StartDate, EndDate, MinQuantity, MaxQuantity);
CREATE NONCLUSTERED INDEX IX_Items_Combo5 ON Items (PartCategory, CustomerCategory, StartDate, EndDate, MinQuantity, MaxQuantity);
CREATE NONCLUSTERED INDEX IX_Items_Combo6 ON Items (PartCategory, CustomerArea, StartDate, EndDate, MinQuantity, MaxQuantity);
Indexing can be a little tricky because every index adds write overhead to its base table whenever data is inserted, updated, or deleted, and the width (number of columns) of an index increases that overhead. I typically aim for a 5x5 guideline: try not to add more than roughly 5 indexes per table, and not more than roughly 5 columns per index. But these are very loose guidelines, and exceeding them is often fine. Your mileage will vary depending on how busy your system is, especially in regards to writes vs reads of the data. So it really comes down to a bit of trial and error with a lot of careful testing.
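If you want to sanity-check that read/write trade-off on your system, one option is the index usage stats DMV (a quick sketch; note its counters reset when the instance restarts):

SELECT i.name AS IndexName,
       s.user_seeks, s.user_scans, s.user_lookups, -- read activity
       s.user_updates                              -- write overhead
FROM sys.indexes AS i
INNER JOIN sys.dm_db_index_usage_stats AS s
    ON s.object_id = i.object_id
    AND s.index_id = i.index_id
WHERE s.database_id = DB_ID()
    AND i.object_id = OBJECT_ID('dbo.Items');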
That being said, while the above caters a specific index to each branch of the above query, we can potentially still cater to all of the branches with half the number of indexes, by removing one column from each of them (the PartId or PartCategory), and they'll still potentially be applicable to your query. E.g. we can try implementing only 3 indexes like so:
CREATE NONCLUSTERED INDEX IX_Items_Combo1And4 ON Items (CustomerId, StartDate, EndDate, MinQuantity, MaxQuantity);
CREATE NONCLUSTERED INDEX IX_Items_Combo2And5 ON Items (CustomerCategory, StartDate, EndDate, MinQuantity, MaxQuantity);
CREATE NONCLUSTERED INDEX IX_Items_Combo3And6 ON Items (CustomerArea, StartDate, EndDate, MinQuantity, MaxQuantity);
Since all of these fields are part of the predicates for each of the queries these indexes are written for, they are still applicable. This cuts our write overhead in half while still potentially improving the read performance of the above query. But you'll need to test and review the execution plan to ensure that these indexes are actually used in a sargable manner. It's possible, depending on your data and its statistics, that one set of indexes might not actually get used by the query optimizer, even if those indexes are well defined for the fields being predicated on. You may find you actually do need to use the aforementioned six indexes instead, for example.
Taking indexing one step further, you can use the INCLUDE clause to include the columns you're projecting, so that when the index is used, an additional key lookup operation is not needed to find those projected fields in your table. Instead, those projected columns will already be persisted with the index that was used to serve your predicate.
An example of what an index looks like with those projected columns included would be:
CREATE NONCLUSTERED INDEX IX_Items_Combo1And4 ON Items (CustomerId, StartDate, EndDate, MinQuantity, MaxQuantity) INCLUDE (ColumnA, ColumnB);
So this also gives you additional index tuning options to consider and test with.
- Normalizing your tables may force you to write better queries from the get-go. This is an in-depth topic I'm not going to cover, especially because I'm not an expert on it. But intuitively, it's a little odd to have an Items table that is defined by a PartId OR a PartCategory. This table design is what caused you to introduce a composite OR in your original query. Instead, if you re-architected this into a Parts table and a PartCategories table, and potentially refactored the common fields out to their own table, you would find you'd have naturally written separate queries for the different combinations of cases, already implicitly branching the code. Just some food for thought...

- "even if I use things like a stored procedure to loop on inputs and execute my query 1000 times" - Yea, nah, don't do that please. SQL code operates most efficiently when working on the data with set logic, not iterative logic. So a stored procedure is a good tool, but don't immediately go reaching for a loop. Instead, think of a way to process all 1,000 instances of the input in a set-based manner. For example, if the 1,000 instances of input are different combinations of your parameters for the above query, you can load them into a temp table and join that temp table to each branch of the above query on those parameter-based columns, as sketched below. This allows the query optimizer to build an efficient plan based on the statistics of all the data that's going to be processed across those parameter values, and ensures the query is only executed once as opposed to 1,000 times.
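A minimal sketch of that set-based shape, assuming a hypothetical #CatalogInputs temp table and showing only the PartId & CustomerId branch:

CREATE TABLE #CatalogInputs (
    PartId uniqueidentifier NOT NULL,
    CustomerId uniqueidentifier NOT NULL,
    Quantity int NOT NULL
);
-- ...bulk insert the customer x part combinations to catalog here...

select Items.ColumnA, Items.ColumnB
from #CatalogInputs ci
inner join Items
    on Items.PartId = ci.PartId
    and Items.CustomerId = ci.CustomerId
inner join Containers on Items.ContainerId = Containers.Id
where
    Containers.Category = 'Category1' and
    Containers.Status = 3 and -- always 3 (not parametrized)
    Items.StartDate <= 'some date' and Items.EndDate >= 'some date' and
    Items.MinQuantity <= ci.Quantity and (Items.MaxQuantity is null or Items.MaxQuantity >= ci.Quantity);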
"I fear that the gain will also be small as the database will still have to seek on large indexes" - Maybe this is just a verbiage confusion thing, but with indexes there are two types of operations:
Index Seeks
andIndex Scans
. Index Seeks are extremely fast, regardless of how big the index is, when your query only needs a smaller subset of the entire table data. So no need to worry about the amount of data in your indexes, in regards to Index Seeks, when you're only querying a smaller subset of the data at a time.
The reason they're so efficient is that they are backed by a B-Tree data structure, which has a search time complexity of O(log2(n)). This means that if your index had 1 billion rows, in the worst case an index seek against it would only need to examine ~30 rows, since log2(1 billion) ≈ 30. A graphing calculator could seek through that small an amount of data in milliseconds, let alone a whole database server.
There's probably a lot more that can be addressed, and as initially stated, having the execution plan would allow us to provide more targeted advice to your root problem. In any case, best of luck!
- Thanks for this detailed answer. I'll be sure to read it in detail (and digest it) and provide some more info soon. One thing I'll say for now is about the exclusive values: they are exclusive in the data (you can only have one of CustomerId, CustomerCategory and CustomerArea per row), but all three will always be filled in the query parameters. Because of this I think your second point (branching depending on the parameter values) won't apply. But I'll check out the rest of your suggestions and see how it goes! – Shtong, Jun 24 at 17:13
- @Shtong No problem! Ah, seems like you really mean to say those column combinations are unique. Branching can still be used if you can reasonably hard-code those values in each branch of the query, or perhaps use a combination of branching and the UNION strategy. But it would be easier to say if I were in front of your data. Best of luck! – J.D., Jun 25 at 12:20