I have written the query below and am seeking any way that I could improve its performance.
IF OBJECT_ID('tempdb..#PLUTotals') IS NOT NULL
DROP TABLE [dbo].[#PLUTotals];
IF OBJECT_ID('tempdb..#Level4Totals') IS NOT NULL
DROP TABLE [dbo].[#Level4Totals];
IF OBJECT_ID('tempdb..#WasteTotals') IS NOT NULL
DROP TABLE [dbo].[#WasteTotals];
IF OBJECT_ID('tempdb..#GPTotal') IS NOT NULL
DROP TABLE [dbo].[#GPTotal];
SELECT SiteNo
,CASE
WHEN Media = 1003
THEN isNull(Value,0)
ELSE 0
END AS GrossSalesValue
,CASE
WHEN Media = 1001
THEN isNull(Qty,0)
ELSE 0
END AS CustomerCount
,CASE
WHEN Media = 5001
AND MediaIndex IN(1,8)
THEN isNull(Value,0)
ELSE 0
END AS EodDiscount
,CASE
WHEN Media = 5006
THEN isNull(Value,0)
ELSE 0
END AS TotalDiscount
,CASE
WHEN Media = 5001
AND MediaIndex IN(26,168,35,38,72,193,119,208,211,212,213,47)
THEN isNull(Qty,0)
ELSE 0
END AS MealDeal
INTO #PLUTotals
FROM TransactionDetail
WHERE EXISTS
(
SELECT Descendant
FROM DescendantSites
WHERE Parent IN(@SiteNo)
AND Descendant = TransactionDetail.SiteNo
)
AND Date BETWEEN @SessionDateFrom AND @SessionDateTo
AND TermNo > 0
AND EXISTS
(
SELECT MediaNo
FROM Media
WHERE MediaNo IN(1001,1003,5001,5006)
AND MediaNo = TransactionDetail.Media
);
SELECT SiteNo
,isNull(SUM(Value),0) AS Level4Sales
INTO #Level4Totals
FROM PLUSalesExtended
WHERE [Level] + 1 = 4
AND SessionDate BETWEEN @SessionDateFrom AND @SessionDateTo
AND TermNo = 0
AND PluEntryType <> 4
AND EXISTS
(
SELECT Descendant
FROM DescendantSites
WHERE Parent IN(@SiteNo)
AND Descendant = PluSalesExtended.SiteNo
)
GROUP BY SiteNo;
SELECT CustomerCode AS SiteNo
,ABS(SUM(Waste_StandardCost)) AS WasteTotals
INTO #WasteTotals
FROM [CybakeRS].[dbo].utf_Waste(@SessionDateFrom,@SessionDateTo,1)
WHERE EXISTS
(
SELECT Descendant
FROM DescendantSites
WHERE Parent IN(@SiteNo)
AND Descendant = CybakeRS.dbo.utf_Waste.CustomerCode
)
AND Waste_Qty <> 0
GROUP BY CustomerCode;
SELECT PluSales.SiteNo
,SUM(PluSales.Value) AS SalesValue
,SUM(PluSales.VatValue) AS VatAmount
,SUM(PluSales.Cost) AS CostOfSales
INTO #GPTotal
FROM PluSales
INNER JOIN Sessions ON sessions.siteno = plusales.siteno
AND sessions.sessionno = plusales.sessionno
AND sessions.termno = plusales.termno
AND sessions.type = 7
AND sessions.termno = 0
WHERE sessions.sessiondate BETWEEN @SessionDateFrom AND @SessionDateTo
AND EXISTS
(
SELECT Descendant
FROM DescendantSites
WHERE Parent IN(@SiteNo)
AND Descendant = PluSales.SiteNo
)
GROUP BY PluSales.SiteNo;
SELECT CASE
WHEN PT.SiteNo IN(70,71)
THEN 69
WHEN PT.SiteNo IN(40,41)
THEN 45
ELSE PT.SiteNo
END AS SiteNo
,SUM(PT.GrossSalesValue) AS GrossSalesValue
,SUM(PT.CustomerCount) AS CustomerCount
,SUM(PT.EodDiscount) AS EodDiscount
,SUM(PT.TotalDiscount) AS TotalDiscount
,SUM(PT.MealDeal) AS MealDeal
,L4T.Level4Sales AS Level4Sales
,WT.WasteTotals AS WasteTotals
,SUM(PT.MealDeal) / SUM(PT.CustomerCount) AS MealDealPercent
,L4T.Level4Sales / (L4T.Level4Sales + SUM(PT.EodDiscount)) AS Conversion
,SUM(PT.GrossSalesValue) / SUM(PT.CustomerCount) AS AveSpend
,SUM(GPT.SalesValue) AS GPTSales
,SUM(GPT.CostOfSales) AS GPTCosts
,SUM(GPT.VatAmount) AS GPTVatAmount
FROM #PLUTotals AS PT
LEFT JOIN #Level4Totals AS L4T ON PT.SiteNo = L4T.SiteNo
LEFT JOIN #WasteTotals AS WT ON PT.SiteNo = WT.SiteNo
LEFT JOIN #GPTotal AS GPT ON PT.SiteNo = GPT.SiteNo
GROUP BY CASE
WHEN PT.SiteNo IN(70,71)
THEN 69
WHEN PT.SiteNo IN(40,41)
THEN 45
ELSE PT.SiteNo
END
,L4T.Level4Sales
,WT.WasteTotals;
DROP TABLE [dbo].[#PLUTotals];
DROP TABLE [dbo].[#Level4Totals];
DROP TABLE [dbo].[#WasteTotals];
DROP TABLE [dbo].[#GPTotal];
The code runs 5 queries. Queries 1-4 prefetch all the required data and store the results in their own temp tables. Query 5 then joins all the temp tables to give me the required result set. We have evidence that an index can improve the performance drastically, but this can't easily be implemented in the client's solution at the moment. Any advice on how to improve this would be greatly appreciated. Are there any obvious pitfalls that I am falling into?
1 Answer
The problem with optimizing the queries is that you are severely hampered by the structure of the database. Because several meanings have been assigned to the columns Qty and Value, which require CASE statements to parse out before you get a meaningful aggregation, this becomes problematic for performance.
The answer below assumes that you cannot change the structure of the database but can create indices as needed. It furthermore presumes that the query is mission-critical, so we are willing to do whatever is needed to make it run as fast as possible.
1) You already have an EXISTS query to pre-filter the TransactionDetail table to only the Media values you care about, and that's good. However, I would not be surprised if the engine ends up doing a table scan anyway, because there are too many logical filters. One possible choice is to do an individual subquery for each aggregation, so that each one stands on its own and allows the engine to build an execution plan that targets the specific Media value and other specific criteria.
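For example, the GrossSalesValue and CustomerCount branches could each become a standalone query, combined with UNION ALL. A sketch against the same TransactionDetail schema (the sites EXISTS is elided for brevity):

```sql
-- Sketch: one narrowly-filtered query per Media value, combined with
-- UNION ALL so the optimizer can plan each branch independently.
SELECT SiteNo
      ,ISNULL(Value, 0) AS GrossSalesValue
      ,0                AS CustomerCount
FROM TransactionDetail
WHERE Media = 1003
  AND TermNo > 0
  AND Date BETWEEN @SessionDateFrom AND @SessionDateTo
UNION ALL
SELECT SiteNo
      ,0
      ,ISNULL(Qty, 0)
FROM TransactionDetail
WHERE Media = 1001
  AND TermNo > 0
  AND Date BETWEEN @SessionDateFrom AND @SessionDateTo;
-- ...repeat for Media 5001 and 5006, then SUM the union into #PLUTotals.
```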
2) Building on #1, I would investigate whether it can be further improved by using filtered indices. Using the first CASE as an example, we could create a new index like so:
CREATE INDEX IX_TransactionDetail_Filtered1
ON dbo.TransactionDetail (SiteNo, Date)
INCLUDE (Value)
WHERE TermNo > 0
AND Media = 1003;
Observe that the hard-coded values for TermNo and Media are part of the filter, so the index can be used more effectively by a standalone query (which can also be used as a subquery):
SELECT
SiteNo,
isNull(Value,0) AS GrossSalesValue
FROM dbo.TransactionDetail
WHERE TermNo > 0
AND Media = 1003
AND Date BETWEEN @SessionDateFrom AND @SessionDateTo
AND EXISTS ( ... sites subquery ...)
Repeating the above for the 3 other subqueries, each with its own filtered index, may or may not help, depending on the shape and volume of the data. It will not help if the filters have low selectivity; in that case, simply scanning the table is likely the fastest option, because that way you visit all rows only once, as opposed to potentially revisiting them when using multiple subqueries.
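One way to decide empirically is to compare logical reads and CPU time of the two shapes; SQL Server's session statistics make this easy:

```sql
-- Compare the single-scan CASE query against the per-Media subquery
-- version by measuring logical reads and CPU/elapsed time for each.
SET STATISTICS IO, TIME ON;

-- ...run candidate query A here, then candidate query B...

SET STATISTICS IO, TIME OFF;
```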
3) Create an indexed view
If you do not want to create 4 or more filtered indices (and you probably do not want that many anyway), the alternative is to see whether you can write an indexed view instead. In your initial query, you read only from TransactionDetail with some criteria. The criteria that are dynamic are SiteNo and Date, so the view would need to look something like this:
CREATE VIEW dbo.vwTransactionDetail_Indexed
WITH SCHEMABINDING AS
SELECT TD.SiteNo
,TD.Date
,CASE
WHEN TD.Media = 1003
THEN isNull(TD.Value,0)
ELSE 0
END AS GrossSalesValue
,CASE
WHEN TD.Media = 1001
THEN isNull(TD.Qty,0)
ELSE 0
END AS CustomerCount
,CASE
WHEN TD.Media = 5001
AND TD.MediaIndex IN(1,8)
THEN isNull(TD.Value,0)
ELSE 0
END AS EodDiscount
,CASE
WHEN TD.Media = 5006
THEN isNull(TD.Value,0)
ELSE 0
END AS TotalDiscount
,CASE
WHEN TD.Media = 5001
AND TD.MediaIndex IN(26,168,35,38,72,193,119,208,211,212,213,47)
THEN isNull(TD.Qty,0)
ELSE 0
END AS MealDeal
-- Indexed views cannot contain subqueries, so the two EXISTS checks are
-- rewritten: DescendantSites becomes an INNER JOIN (assuming Descendant
-- is unique there), and the Media lookup becomes a plain IN filter.
-- SCHEMABINDING also requires two-part object names.
FROM dbo.TransactionDetail AS TD
INNER JOIN dbo.DescendantSites AS DS ON DS.Descendant = TD.SiteNo
WHERE TD.TermNo > 0
AND TD.Media IN(1001,1003,5001,5006);
That way, you can just filter the indexed view on SiteNo and Date and use the result to insert into #PLUTotals. Note that I'm not 100% sure the DescendantSites filter is needed -- if it restricts to a subset of all possible sites, then we'll keep it. But if it matches all possible sites, then it buys us nothing, so it should go, too.
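Once the view has a unique clustered index (required before the optimizer can treat it as a materialized result), the first temp-table load could read from it directly. A sketch, assuming the view above exists and is indexed; on Standard Edition the NOEXPAND hint is mandatory for the view's index to be used, while Enterprise can match it automatically:

```sql
-- Populate #PLUTotals from the indexed view rather than the base table.
SELECT V.SiteNo
      ,V.GrossSalesValue
      ,V.CustomerCount
      ,V.EodDiscount
      ,V.TotalDiscount
      ,V.MealDeal
INTO #PLUTotals
FROM dbo.vwTransactionDetail_Indexed AS V WITH (NOEXPAND)
WHERE V.Date BETWEEN @SessionDateFrom AND @SessionDateTo
  AND EXISTS
      (
          SELECT 1
          FROM DescendantSites
          WHERE Parent IN (@SiteNo)
            AND Descendant = V.SiteNo
      );
```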
4) Maintain a summary table instead
The other option, which requires you to create a new table and attach triggers to the source tables, is to have the triggers keep a summary table up to date and have your queries read from the summary table instead. This avoids adding more indices: instead of taking the performance hit at read time, you take a somewhat smaller hit at write time.
Note, however, that attaching a trigger to a table is a big maintenance problem. I'd much prefer that it be attached to a view instead, making it opt-in, so that developers can use it when they actually need it, and skip it -- without having to disable it, which is a schema modification -- when they do a bulk import of clean data, for example.
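A minimal sketch of the summary-table idea, with hypothetical names throughout (TransactionDetailSummary, trg_TransactionDetail_Summary); it handles only the GrossSalesValue and CustomerCount aggregates and only INSERTs, so a real version would also need UPDATE/DELETE handling:

```sql
-- Hypothetical per-site, per-day summary table.
CREATE TABLE dbo.TransactionDetailSummary
(
    SiteNo          int   NOT NULL,
    Date            date  NOT NULL,
    GrossSalesValue money NOT NULL DEFAULT 0,
    CustomerCount   int   NOT NULL DEFAULT 0,
    CONSTRAINT PK_TransactionDetailSummary PRIMARY KEY (SiteNo, Date)
);
GO
-- Keep the summary current as rows arrive in TransactionDetail.
CREATE TRIGGER trg_TransactionDetail_Summary
ON dbo.TransactionDetail
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;
    MERGE dbo.TransactionDetailSummary AS T
    USING (SELECT SiteNo, Date,
                  SUM(CASE WHEN Media = 1003 THEN ISNULL(Value, 0) ELSE 0 END) AS GrossSalesValue,
                  SUM(CASE WHEN Media = 1001 THEN ISNULL(Qty, 0)   ELSE 0 END) AS CustomerCount
           FROM inserted
           WHERE TermNo > 0
           GROUP BY SiteNo, Date) AS S
        ON T.SiteNo = S.SiteNo AND T.Date = S.Date
    WHEN MATCHED THEN UPDATE
        SET GrossSalesValue = T.GrossSalesValue + S.GrossSalesValue,
            CustomerCount   = T.CustomerCount   + S.CustomerCount
    WHEN NOT MATCHED THEN INSERT (SiteNo, Date, GrossSalesValue, CustomerCount)
        VALUES (S.SiteNo, S.Date, S.GrossSalesValue, S.CustomerCount);
END;
```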
Conclusion
Similar principles can be applied to the other queries. Put together, they should make the final query much quicker, as I believe the majority of the time is spent in the setup. You already mentioned you can't create an index at the client's site. In that case, you're pretty much out of luck because, as I said, database performance is strongly tied to the design of the database, and frankly, it's awful that you have to use CASE WHEN with several criteria to get one answer. I realize that changing the database design is much less trivial, which is why I talk about indices, since those don't change the logical structure of the database. I'd be talking to the client to find a way to make that happen, at least.
Comment (NinjaArekku, Apr 11, 2018): Thanks for the feedback. There are some techniques raised here that I hadn't considered, which is good food for thought.
Comment: DROP at the top without an IF, a copy/paste mistake?