I am having trouble optimizing a query that does an inner join over a date range. The purpose of the query is to take daily data and summarize it by week.
SELECT pcw.EndDate AS WeekEndDate, h.Store, SUM(h.DeliveryChargesTotal) AS DeliveryChargesTotal
FROM Daily_GC_Headers h
INNER JOIN PeriodCalendar_Weeks pcw
    ON h.SalesDate BETWEEN pcw.StartDate AND pcw.EndDate
WHERE SalesDate BETWEEN @StartDate AND @EndDate AND IsCanceled = 0
GROUP BY pcw.EndDate, h.Store
Simplified schema of the Daily_GC_Headers table (13.8 million rows; about 5.4 million match the criteria in the WHERE clause):
Store - Varchar(10) (PK)
SalesDate - Date (PK)
TicketNumber - SmallInt (PK; starts over 1 each day at each store.)
IsCanceled - Bit
DeliveryChargesTotal - Decimal(9,2)
Simplified schema of the PeriodCalendar_Weeks table (570 rows; 53 match the criteria):
Year - smallint (PK)
Period - tinyint (PK)
Week - tinyint (PK)
StartDate - Date
EndDate - Date
This query takes about 15 seconds in SSMS. Querying Daily_GC_Headers by itself (and just grouping by Store) takes 2 seconds. A query against PeriodCalendar_Weeks is "instant".
DBCC SHOW_STATISTICS indicates that the stats on both tables are current (we run a weekly job to update them). I've also tried clearing the plan cache.
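For reference, this is roughly how the histogram can be inspected; a minimal sketch (the statistics name PK_Daily_GC_Headers is hypothetical; list the actual names on your table first):

```sql
-- List the statistics objects on the table (names are table-specific).
EXEC sp_helpstats 'dbo.Daily_GC_Headers', 'ALL';

-- Dump just the histogram for one statistics object (name is an example).
DBCC SHOW_STATISTICS ('dbo.Daily_GC_Headers', 'PK_Daily_GC_Headers')
    WITH HISTOGRAM;
```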
The execution plan is strange. For example, it does an Eager Spool on PeriodCalendar_Weeks: the estimated row count is 156.6, but the actual count is 153,971. It then filters the results of that first spool and does a Lazy Spool. The estimated/actual row count of that second spool is 5.4 million, even though the underlying table has fewer than 600 rows in it.
What should I be looking for or doing to optimize this?
Additional Information
For the sake of clarity: I initially described an oversimplified PK on the Weeks table. I have updated the schema above to show the full key. The PK described for the Headers table is (and was) the full key.
Screenshot of some rows from the Weeks table: [screenshot]
Stats from the Weeks table: [screenshot]
Some stats from the Headers table. There seems to be a histogram step for roughly every 5-10 days across the table's entire history (3 years): [screenshot]
1 Answer
It is much more efficient to do this without having to go back and join to the periods table:
DECLARE @StartDate DATE, @EndDate DATE;

SELECT @StartDate = MIN(StartDate), @EndDate = MAX(EndDate)
FROM dbo.PeriodCalendar_Weeks pcw
WHERE (pcw.Year = @Year AND pcw.Period < @Period)
   OR (pcw.Year = @Year AND pcw.Period = @Period AND pcw.Week <= @Week)
   OR (pcw.Year = @Year - 1 AND pcw.Period >= @Period);
SELECT
WeekEndDate = DATEADD(DAY, 6, DATEADD(WEEK, SalesWeek, @StartDate)),
Store,
DeliveryChargesTotal = dct
FROM
(
SELECT DATEDIFF(DAY, @StartDate, SalesDate)/7, Store, SUM(DeliveryChargesTotal)
FROM dbo.Daily_GC_Headers
WHERE SalesDate BETWEEN @StartDate AND @EndDate AND isCanceled = 0
GROUP BY DATEDIFF(DAY, @StartDate, SalesDate)/7, Store
) AS x (SalesWeek, Store, dct)
ORDER BY WeekEndDate, Store;
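To illustrate the bucketing arithmetic (a sketch with made-up dates, assuming the calendar weeks are contiguous 7-day spans beginning at @StartDate):

```sql
DECLARE @StartDate DATE = '2014-01-01';  -- hypothetical week-1 start

-- Integer division by 7 maps each date to a zero-based week bucket;
-- the bucket's end date is the bucket start plus six days.
SELECT  CAST(v.d AS DATE)                          AS SalesDate,
        DATEDIFF(DAY, @StartDate, CAST(v.d AS DATE)) / 7 AS SalesWeek,
        DATEADD(DAY, 6, DATEADD(WEEK,
            DATEDIFF(DAY, @StartDate, CAST(v.d AS DATE)) / 7,
            @StartDate))                           AS WeekEndDate
FROM (VALUES ('2014-01-01'), ('2014-01-07'), ('2014-01-08')) AS v(d);
-- 2014-01-01 and 2014-01-07 land in week 0 (WeekEndDate 2014-01-07);
-- 2014-01-08 starts week 1 (WeekEndDate 2014-01-14).
```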
A filtered index may help if many rows exist where isCanceled = 1 (these are just possible suggestions, depending on the cardinality of Store, and may not be optimal):
CREATE INDEX x ON dbo.Daily_GC_Headers
(SalesDate) INCLUDE (Store, DeliveryChargesTotal)
WHERE isCanceled = 0;
If there are very few rows where isCanceled = 1, this may be better:
CREATE INDEX x ON dbo.Daily_GC_Headers
(SalesDate, IsCanceled) INCLUDE (Store, DeliveryChargesTotal);
Both are worth trying on a test system, as is moving Store into the key in either case, or moving IsCanceled to the INCLUDE list in the latter case. On my system, I found the best results with everything but the date in the INCLUDE list:
CREATE INDEX x ON dbo.Daily_GC_Headers
(SalesDate) INCLUDE (Store, IsCanceled, DeliveryChargesTotal);
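One way to compare the candidates on a test system (a sketch; the parameter values are placeholders, and you would create one candidate index, run the aggregate while noting the logical reads, then drop it before trying the next):

```sql
DECLARE @StartDate DATE = '2014-01-01',  -- placeholder range
        @EndDate   DATE = '2014-12-31';

SET STATISTICS IO ON;

-- Run this once per candidate index and compare the logical reads
-- reported in the Messages tab.
SELECT Store, SUM(DeliveryChargesTotal) AS DeliveryChargesTotal
FROM dbo.Daily_GC_Headers
WHERE SalesDate BETWEEN @StartDate AND @EndDate
  AND IsCanceled = 0
GROUP BY Store;

SET STATISTICS IO OFF;

-- Drop the current candidate before creating the next one:
DROP INDEX x ON dbo.Daily_GC_Headers;
```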
Again, you will need to test whether any of these work out, or whether the query above gives a different/better recommendation directly from SQL Server.
- IsCanceled = 1 on at least 95% of the records. – poke, Mar 18, 2014 at 19:58
- There are only about ~110 different values for Store, so I'd say its cardinality is pretty low. – poke, Mar 18, 2014 at 20:03
- @AaronBertrand: This query ran in 2 seconds, which is how fast a straight SELECT...SUM()...GROUP BY is on that table with no join. The execution plan didn't recommend any indexes, though. – poke, Mar 18, 2014 at 20:12
- @poke So eliminating the redundant join is better, yes? Is the output different from your slower version of the query, or missing any information? – Aaron Bertrand, Mar 18, 2014 at 20:14
- Yes, it's better. My query was taking 15 seconds. The WeekEndDate is actually showing the week start date, but I know how to fix that. – poke, Mar 18, 2014 at 20:24
- Have you run SET STATISTICS IO and compared logical reads?
- The isCanceled column is in the Daily_GC_Headers table, right? Also add the datatypes of the columns and the indexes you have to the question.