I have written the query below and am seeking any way that I could improve its performance.
IF OBJECT_ID('tempdb..#PLUTotals') IS NOT NULL
DROP TABLE [dbo].[#PLUTotals];
IF OBJECT_ID('tempdb..#Level4Totals') IS NOT NULL
DROP TABLE [dbo].[#Level4Totals];
IF OBJECT_ID('tempdb..#WasteTotals') IS NOT NULL
DROP TABLE [dbo].[#WasteTotals];
IF OBJECT_ID('tempdb..#GPTotal') IS NOT NULL
DROP TABLE [dbo].[#GPTotal];
SELECT SiteNo
,CASE
WHEN Media = 1003
THEN isNull(Value,0)
ELSE 0
END AS GrossSalesValue
,CASE
WHEN Media = 1001
THEN isNull(Qty,0)
ELSE 0
END AS CustomerCount
,CASE
WHEN Media = 5001
AND MediaIndex IN(1,8)
THEN isNull(Value,0)
ELSE 0
END AS EodDiscount
,CASE
WHEN Media = 5006
THEN isNull(Value,0)
ELSE 0
END AS TotalDiscount
,CASE
WHEN Media = 5001
AND MediaIndex IN(26,168,35,38,72,193,119,208,211,212,213,47)
THEN isNull(Qty,0)
ELSE 0
END AS MealDeal
INTO #PLUTotals
FROM TransactionDetail
WHERE EXISTS
(
SELECT Descendant
FROM DescendantSites
WHERE Parent IN(@SiteNo)
AND Descendant = TransactionDetail.SiteNo
)
AND Date BETWEEN @SessionDateFrom AND @SessionDateTo
AND TermNo > 0
AND EXISTS
(
SELECT MediaNo
FROM Media
WHERE MediaNo IN(1001,1003,5001,5006)
AND MediaNo = TransactionDetail.Media
);
SELECT SiteNo
,isNull(SUM(Value),0) AS Level4Sales
INTO #Level4Totals
FROM PLUSalesExtended
WHERE [Level] + 1 = 4
AND SessionDate BETWEEN @SessionDateFrom AND @SessionDateTo
AND TermNo = 0
AND PluEntryType <> 4
AND EXISTS
(
SELECT Descendant
FROM DescendantSites
WHERE Parent IN(@SiteNo)
AND Descendant = PluSalesExtended.SiteNo
)
GROUP BY SiteNo;
SELECT CustomerCode AS SiteNo
,ABS(SUM(Waste_StandardCost)) AS WasteTotals
INTO #WasteTotals
FROM [CybakeRS].[dbo].utf_Waste(@SessionDateFrom,@SessionDateTo,1)
WHERE EXISTS
(
SELECT Descendant
FROM DescendantSites
WHERE Parent IN(@SiteNo)
AND Descendant = CybakeRS.dbo.utf_Waste.CustomerCode
)
AND Waste_Qty <> 0
GROUP BY CustomerCode;
SELECT PluSales.SiteNo
,SUM(PluSales.Value) AS SalesValue
,SUM(PluSales.VatValue) AS VatAmount
,SUM(PluSales.Cost) AS CostOfSales
INTO #GPTotal
FROM PluSales
INNER JOIN Sessions ON sessions.siteno = plusales.siteno
AND sessions.sessionno = plusales.sessionno
AND sessions.termno = plusales.termno
AND sessions.type = 7
AND sessions.termno = 0
WHERE sessions.sessiondate BETWEEN @SessionDateFrom AND @SessionDateTo
AND EXISTS
(
SELECT Descendant
FROM DescendantSites
WHERE Parent IN(@SiteNo)
AND Descendant = PluSales.SiteNo
)
GROUP BY PluSales.SiteNo;
SELECT CASE
WHEN PT.SiteNo IN(70,71)
THEN 69
WHEN PT.SiteNo IN(40,41)
THEN 45
ELSE PT.SiteNo
END AS SiteNo
,SUM(PT.GrossSalesValue) AS GrossSalesValue
,SUM(PT.CustomerCount) AS CustomerCount
,SUM(PT.EodDiscount) AS EodDiscount
,SUM(PT.TotalDiscount) AS TotalDiscount
,SUM(PT.MealDeal) AS MealDeal
,L4T.Level4Sales AS Level4Sales
,WT.WasteTotals AS WasteTotals
,SUM(PT.MealDeal) / SUM(PT.CustomerCount) AS MealDealPercent
,L4T.Level4Sales / (L4T.Level4Sales + SUM(PT.EodDiscount)) AS Conversion
,SUM(PT.GrossSalesValue) / SUM(PT.CustomerCount) AS AveSpend
,SUM(GPT.SalesValue) AS GPTSales
,SUM(GPT.CostOfSales) AS GPTCosts
,SUM(GPT.VatAmount) AS GPTVatAmount
FROM #PLUTotals AS PT
LEFT JOIN #Level4Totals AS L4T ON PT.SiteNo = L4T.SiteNo
LEFT JOIN #WasteTotals AS WT ON PT.SiteNo = WT.SiteNo
LEFT JOIN #GPTotal AS GPT ON PT.SiteNo = GPT.SiteNo
GROUP BY CASE
WHEN PT.SiteNo IN(70,71)
THEN 69
WHEN PT.SiteNo IN(40,41)
THEN 45
ELSE PT.SiteNo
END
,L4T.Level4Sales
,WT.WasteTotals;
DROP TABLE [dbo].[#PLUTotals];
DROP TABLE [dbo].[#Level4Totals];
DROP TABLE [dbo].[#WasteTotals];
DROP TABLE [dbo].[#GPTotal];
The code runs 5 queries. Queries 1-4 prefetch all the required data and store the results in their own temp tables. Query 5 then joins all the temp tables to give me the required result set. We have evidence that an index can improve the performance drastically, but this can't easily be implemented in the client's solution at the moment. Any advice on how to improve this would be greatly appreciated. Are there any obvious pitfalls that I am falling into?
1 Answer
The problem with optimizing the queries is that you are severely hampered by the structure of the database. Because several meanings have been assigned to the columns Qty and Value, which require CASE statements to parse out before you get a meaningful aggregation, this becomes problematic for performance.
The answer below assumes that you cannot change the structure of the database but can create indices as needed. It furthermore presumes that the query is mission-critical, so we are willing to do whatever is needed to make it run as fast as possible.
1) You already have an EXISTS query to pre-filter the TransactionDetail table to only the Media values you care about, and that's good. However, I would not be surprised if the engine ends up doing a table scan anyway, because there are too many logical filters. One possible choice is to do an individual subquery for each aggregation, so that each one stands on its own and allows the engine to build an execution plan that targets the specific Media value and other specific criteria.
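For example, the GrossSalesValue and CustomerCount branches could each become a standalone query, combined with UNION ALL. A sketch against the same TransactionDetail schema (the sites EXISTS is elided for brevity):

```sql
-- Sketch: one narrowly-filtered query per Media value, combined with
-- UNION ALL so the optimizer can plan each branch independently.
SELECT SiteNo
      ,ISNULL(Value, 0) AS GrossSalesValue
      ,0                AS CustomerCount
FROM TransactionDetail
WHERE Media = 1003
  AND TermNo > 0
  AND Date BETWEEN @SessionDateFrom AND @SessionDateTo
UNION ALL
SELECT SiteNo
      ,0
      ,ISNULL(Qty, 0)
FROM TransactionDetail
WHERE Media = 1001
  AND TermNo > 0
  AND Date BETWEEN @SessionDateFrom AND @SessionDateTo;
-- ...repeat for Media 5001 and 5006, then SUM the union into #PLUTotals.
```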
2) Building on #1, I would investigate whether it can be further improved by using filtered indices. Using the first CASE as an example, we could create a new index like so:
CREATE INDEX IX_TransactionDetail_Filtered1
ON dbo.TransactionDetail (SiteNo, Date)
INCLUDE (Value)
WHERE TermNo > 0
AND Media = 1003;
Observe that the hard-coded values for TermNo and Media are part of the filter, so the index can be used more effectively by a standalone query (which can also be used as a subquery):
SELECT
SiteNo,
isNull(Value,0) AS GrossSalesValue
FROM dbo.TransactionDetail
WHERE TermNo > 0
AND Media = 1003
AND Date BETWEEN @SessionDateFrom AND @SessionDateTo
AND EXISTS ( ... sites subquery ...)
Repeating the above for the 3 other subqueries, each with its own filtered index, may or may not help, depending on the shape and volume of the data. It will not help if the filters have low selectivity; in that case, simply scanning the table is likely the fastest option, because that way you visit all rows only once, as opposed to potentially revisiting them when using multiple subqueries.
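One way to decide empirically is to compare logical reads and CPU time of the two shapes; SQL Server's session statistics make this easy:

```sql
-- Compare the single-scan CASE query against the per-Media subquery
-- version by measuring logical reads and CPU/elapsed time for each.
SET STATISTICS IO, TIME ON;

-- ...run candidate query A here, then candidate query B...

SET STATISTICS IO, TIME OFF;
```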
3) Create an indexed view
If you do not want to create 4 or more filtered indices (and you probably do not want that many anyway), the alternative is to see whether you can write an indexed view instead. In your initial query, you read only from TransactionDetail with some criteria. The criteria that are dynamic are SiteNo and Date, so the view would need to look something like this:
CREATE VIEW dbo.vwTransactionDetail_Indexed
WITH SCHEMABINDING AS
SELECT TD.SiteNo
,TD.Date
,CASE
WHEN TD.Media = 1003
THEN isNull(TD.Value,0)
ELSE 0
END AS GrossSalesValue
,CASE
WHEN TD.Media = 1001
THEN isNull(TD.Qty,0)
ELSE 0
END AS CustomerCount
,CASE
WHEN TD.Media = 5001
AND TD.MediaIndex IN(1,8)
THEN isNull(TD.Value,0)
ELSE 0
END AS EodDiscount
,CASE
WHEN TD.Media = 5006
THEN isNull(TD.Value,0)
ELSE 0
END AS TotalDiscount
,CASE
WHEN TD.Media = 5001
AND TD.MediaIndex IN(26,168,35,38,72,193,119,208,211,212,213,47)
THEN isNull(TD.Qty,0)
ELSE 0
END AS MealDeal
-- Indexed views cannot contain subqueries, so the two EXISTS checks are
-- rewritten: DescendantSites becomes an INNER JOIN (assuming Descendant
-- is unique there), and the Media lookup becomes a plain IN filter.
-- SCHEMABINDING also requires two-part object names.
FROM dbo.TransactionDetail AS TD
INNER JOIN dbo.DescendantSites AS DS ON DS.Descendant = TD.SiteNo
WHERE TD.TermNo > 0
AND TD.Media IN(1001,1003,5001,5006);
That way, you can just filter the indexed view on SiteNo and Date and use the result to insert into #PLUTotals. Note that I'm not 100% sure the DescendantSites filter is needed -- if it restricts to a subset of all possible sites, then we'll keep it. But if it matches all possible sites, then it buys us nothing, so it should go, too.
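Once the view has a unique clustered index (required before the optimizer can treat it as a materialized result), the first temp-table load could read from it directly. A sketch, assuming the view above exists and is indexed; on Standard Edition the NOEXPAND hint is mandatory for the view's index to be used, while Enterprise can match it automatically:

```sql
-- Populate #PLUTotals from the indexed view rather than the base table.
SELECT V.SiteNo
      ,V.GrossSalesValue
      ,V.CustomerCount
      ,V.EodDiscount
      ,V.TotalDiscount
      ,V.MealDeal
INTO #PLUTotals
FROM dbo.vwTransactionDetail_Indexed AS V WITH (NOEXPAND)
WHERE V.Date BETWEEN @SessionDateFrom AND @SessionDateTo
  AND EXISTS
      (
          SELECT 1
          FROM DescendantSites
          WHERE Parent IN (@SiteNo)
            AND Descendant = V.SiteNo
      );
```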
4) Maintain a summary table instead
The other option, which requires you to create a new table and attach triggers to the source tables, is to have the triggers keep a summary table up to date and have your queries read from the summary table instead. This avoids adding more indices: instead of taking the performance hit at read time, you take a somewhat smaller hit at write time.
Note, however, that attaching a trigger to a table is a big maintenance problem. I'd much prefer that it be attached to a view instead, making it opt-in, so that developers can use it when they actually need it, and skip it -- without having to disable it, which is a schema modification -- when they do a bulk import of clean data, for example.
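A minimal sketch of the summary-table idea, with hypothetical names throughout (TransactionDetailSummary, trg_TransactionDetail_Summary); it handles only the GrossSalesValue and CustomerCount aggregates and only INSERTs, so a real version would also need UPDATE/DELETE handling:

```sql
-- Hypothetical per-site, per-day summary table.
CREATE TABLE dbo.TransactionDetailSummary
(
    SiteNo          int   NOT NULL,
    Date            date  NOT NULL,
    GrossSalesValue money NOT NULL DEFAULT 0,
    CustomerCount   int   NOT NULL DEFAULT 0,
    CONSTRAINT PK_TransactionDetailSummary PRIMARY KEY (SiteNo, Date)
);
GO
-- Keep the summary current as rows arrive in TransactionDetail.
CREATE TRIGGER trg_TransactionDetail_Summary
ON dbo.TransactionDetail
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;
    MERGE dbo.TransactionDetailSummary AS T
    USING (SELECT SiteNo, Date,
                  SUM(CASE WHEN Media = 1003 THEN ISNULL(Value, 0) ELSE 0 END) AS GrossSalesValue,
                  SUM(CASE WHEN Media = 1001 THEN ISNULL(Qty, 0)   ELSE 0 END) AS CustomerCount
           FROM inserted
           WHERE TermNo > 0
           GROUP BY SiteNo, Date) AS S
        ON T.SiteNo = S.SiteNo AND T.Date = S.Date
    WHEN MATCHED THEN UPDATE
        SET GrossSalesValue = T.GrossSalesValue + S.GrossSalesValue,
            CustomerCount   = T.CustomerCount   + S.CustomerCount
    WHEN NOT MATCHED THEN INSERT (SiteNo, Date, GrossSalesValue, CustomerCount)
        VALUES (S.SiteNo, S.Date, S.GrossSalesValue, S.CustomerCount);
END;
```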
Conclusion
Similar principles can be applied to the other queries. Put together, they should make the final query much quicker, as I believe the majority of the time is spent in the setup. You already mentioned you can't create an index at the client's site. In that case, you're pretty much out of luck because, as I said, database performance is strongly tied to the design of the database, and frankly, it's awful that you have to use CASE WHEN with several criteria to get one answer. I realize that changing the database design is much less trivial, which is why I talk about indices, since those don't change the logical structure of the database. I'd be talking to the client to find a way to make that happen, at least.
Comment (NinjaArekku, Apr 11, 2018): Thanks for the feedback. There are some techniques raised here that I hadn't considered, which is good food for thought.
Comment: DROP at the top without an IF, a copy/paste mistake?