Stored procedure for processed orders in SQL Server

Question 1

I am struggling with creating a Stored Procedure in SQL Server for getting sales statistics. I have managed to get the information I need out, but the query is a mess and I have probably messed up the table variable by using 3 different ones.

Any ideas on how to merge this triple part-query into one?

BEGIN
SET NOCOUNT ON;
 DECLARE @SalesStats TABLE
 (
 NumberOfProcessed int,
 SellerID int
 )
 DECLARE @1 TABLE(
 NumberOfPaidOrders int, 
 SellerID int
 )
 DECLARE @2 TABLE(
 TotalAmountPaid int, 
 SellerID int
 )
 --Number of Processed orders. Removing 0 as its test account.
 INSERT INTO @SalesStats (NumberOfProcessed, PO.SellerID)
 SELECT COUNT(PO.OrderId) AS NumberOfProcessed, PO.SellerID
 FROM ProcessedORder AS PO
 WHERE NOT PO.SellerID = 0 
 GROUP BY PO.SellerID
 ORDER BY PO.SellerID
 --Number of paid orders
 INSERT INTO @1 (NumberOfPaidOrders, SellerID)
 SELECT COUNT(AA.OrderId) AS NumberOfPaidOrders, AA.SellerID
 FROM AcceptedOrder AS AA
 LEFT JOIN ProcessedORder AS PO ON AA.OrderId = PO.OrderId
 GROUP BY AA.SellerID
 ORDER BY AA.SellerID
 --Total amount paid
 INSERT INTO @2 (TotalAmountPaid, SellerID)
 SELECT SUM(AA.Amount) AS TotalAmountPaid, SellerID
 FROM AcceptedOrder AS AA
 GROUP BY AA.SellerID
 ORDER BY AA.SellerID
 SELECT DISTINCT(SS.SellerID), SS.NumberOfProcessed, one.NumberOfPaidOrders, two.TotalAmountPaid, two.TotalAmountPaid / one.NumberOfPaidOrders AS AverageAmountPaid FROM @SalesStats AS SS
 LEFT JOIN @1 AS one ON SS.SellerID = one.SellerID
 LEFT JOIN @2 AS two ON SS.SellerID = two.SellerID
 ORDER BY SS.NumberOfProcessed DESC
END

Question 2

Table variables

Because you are using them only once, and as part of a bigger query, there is really no benefits in using a table variable nor a temporary table. The best choice in this case is to use a Common Table Expressions, or CTE.

A CTE basically works like a subquery but unlike a traditional subquery, it has the advantage of reading from top to bottom rather than inside out as traditionally done. Here is a sample:

WITH SalesStats AS (
 SELECT PO.SellerID, COUNT(PO.OrderId) AS NumberOfProcessed
 FROM ProcessedOrder AS PO
 WHERE PO.SellerID <> 0
 GROUP BY PO.SellerID
)
SELECT ss.SellerID, ss.NumberOfProcessed
FROM SalesStats AS SS;

Improper use of ORDER BY

Your INSERT INTO queries has a ORDER BY which contributes nothing and in fact may penalize your performance. The only time you should use an ORDER BY is in the final select.

There are legitimate cases where your INSERT INTO (or an UPDATE) may need an ORDER BY but those are the exceptions, not the rule. By putting in an ORDER BY, you are tying up the optimizer's hands because it has less choices it can make in optimizing the queries. Don't do that.

Lack of schema names

You do not qualify your tables. It's very good practice to always use two-naming as the default --- schema.table rather than just table. Consider that multiple schemas within a database could have a table with same name, that a user could be in their own schema (especially from older database where things worked differently back then), that you are forcing the server to do a search of all schema to bind the tables. I'm going to assume that your tables are in the default dbo schema. Therefore....

SELECT PO.SellerID, COUNT(PO.OrderId) AS NumberOfProcessed
FROM dbo.ProcessedOrder AS PO
WHERE PO.SellerID <> 0
GROUP BY PO.SellerID;

Note that a CTE do not have two-part naming because it is defined in the scope of query. But all real objects (e.g. tables or views or stored procedures) should all be named with schema and object names explicitly.

Column ordering

To be honest, I found it confusing to see aggregates listed first then the grouping columns last. Normally, I would write query vertically, with grouping columns first.

SELECT 
 PO.SellerID, 
 COUNT(PO.OrderId) AS NumberOfProcessed
FROM dbo.ProcessedOrder AS PO
WHERE PO.SellerID <> 0
GROUP BY PO.SellerID;

In a small query with only few columns it might be OK to have them as one line but for more complicated queries, vertical listing makes it much easier to see what is going out from that query. By listing the grouping columns, it helps to verify the GROUP BY as well.

Unnecessary use of DISTINCT

In all queries up to the query, you group on SellerID. Thus they are already distinct because by definition, grouping generate one row per value in that column, so it's already distinct. Don't force the optimizer to do more work than necessary.

Lack of semicolons

Upon a time, terminating a SQL statement was optional. Now, it's very much recommended that you always terminate a SQL statement. Some people cheat by adding semicolons only when they are required (e.g. before a CTE or a MERGE statement) but that makes for inconsistent and ugly code.

Furthermore, having an explicit terminator help to clearly separate each SQL statement and avoid overlapping due to ambiguity.

Potential division by zero error

You have this expression: two.TotalAmountPaid / one.NumberOfPaidOrders AS AverageAmountPaid

What if the number of paid orders is zero, not NULL? What if you later changed the queries that made it possible to put in zeroes? It's good to always guard against that to make your queries easier to change if you need to do so in future and is as simple as using NULLIF: TotalAmountPaid / NULLIF(NumberOfPaidOrders, 0). This basically substitutes the 0 with a NULL, which causes the results to be NULL without raising any runtime errors.

Putting it all together

With CTEs, we can consolidate this into a single query.

WITH SalesStats AS (
 SELECT
 PO.SellerID,
 COUNT(PO.OrderId) AS NumberOfProcessed, PO.SellerID
 FROM dbo.ProcessedOrder AS PO
 WHERE NOT PO.SellerID = 0 
 GROUP BY PO.SellerID
), CountOfPaidOrders AS (
 SELECT 
 AA.SellerID,
 COUNT(AA.OrderId) AS NumberOfPaidOrders
 FROM dbo.AcceptedOrder AS AA
 LEFT JOIN dbo.ProcessedOrder AS PO 
 ON AA.OrderId = PO.OrderId
 GROUP BY AA.SellerID
), TotalAmountPaid AS (
 SELECT
 AA.SellerID,
 SUM(AA.Amount) AS TotalAmountPaid
 FROM dbo.AcceptedOrder AS AA
 GROUP BY AA.SellerID
)
SELECT 
 SS.SellerID, 
 SS.NumberOfProcessed, 
 C.NumberOfPaidOrders, 
 T.TotalAmountPaid, 
 T.TotalAmountPaid / NULLIF(C.NumberOfPaidOrders, 0) AS AverageAmountPaid 
FROM SalesStats AS SS
LEFT JOIN CountOfPaidOrders AS C 
 ON SS.SellerID = one.SellerID
LEFT JOIN TotalAmountPaid AS T 
 ON SS.SellerID = two.SellerID
ORDER BY SS.NumberOfProcessed DESC;

Now the optimizer can see how they all are used, and thus form a better execution plan and you don't need to worry about optimizing, beyond ensuring that the optimizer has the indices it needs.

The CTEs only live for a single SQL statement so if you had to modify the procedure that you end up accessing the CountOfPaidOrders (for instance) multiple times within a stored procedure, you do need a temporary table instead, but in this case, I don't see benefits of creating one.

Question 3

Further simplifying @this answer.

Assuming there's a 1:1 relation between order ids the CountOfPaidOrders cte doesn't need the Left Join as you don't access any column from the inner table.
Both CountOfPaidOrders and TotalAmountPaid can be combined into a single Select
No need to calculate an average manually instead of using AVG
No need for using an alias in a single table Select

Now it's down to:

WITH SalesStats AS (
 SELECT
 SellerID,
 Count(OrderId) AS NumberOfProcessed, SellerID
 FROM dbo.ProcessedOrder
 WHERE NOT SellerID = 0 
 GROUP BY SellerID
), PaidOrders AS (
 SELECT
 SellerID,
 Count(OrderId) AS NumberOfPaidOrders
 Sum(Amount) AS TotalAmountPaid
 Avg(Amount) AS AverageAmountPaid
 FROM dbo.AcceptedOrder
 GROUP BY SellerID
)
SELECT 
 ss.SellerID, 
 ss.NumberOfProcessed, 
 paid.NumberOfPaidOrders, 
 paid.TotalAmountPaid, 
 paid.AverageAmountPaid 
FROM SalesStats AS ss
LEFT JOIN PaidOrders AS paid
 ON ss.SellerID = paid.SellerID
ORDER BY ss.NumberOfProcessed DESC;

Question 4

Nice. I do disagree with the #4 mainly because this makes refactoring SQL harder when it becomes necessary to convert a single-table query into multiple-table query and also because this avoid ambiguity with columns that uses keywords (e.g. Description or User are examples where the names are legal without bracketing but are special also). Qualifying eliminates that ambiguity. Thus, it's good habit overall to always alias even for single-table query.

this this 2,03914 silver badges24 bronze badges · Accepted Answer · 2017-12-05 12:02:30Z

Table variables

Because you are using them only once, and as part of a bigger query, there is really no benefits in using a table variable nor a temporary table. The best choice in this case is to use a Common Table Expressions, or CTE.

A CTE basically works like a subquery but unlike a traditional subquery, it has the advantage of reading from top to bottom rather than inside out as traditionally done. Here is a sample:

WITH SalesStats AS (
 SELECT PO.SellerID, COUNT(PO.OrderId) AS NumberOfProcessed
 FROM ProcessedOrder AS PO
 WHERE PO.SellerID <> 0
 GROUP BY PO.SellerID
)
SELECT ss.SellerID, ss.NumberOfProcessed
FROM SalesStats AS SS;

Improper use of ORDER BY

Your INSERT INTO queries has a ORDER BY which contributes nothing and in fact may penalize your performance. The only time you should use an ORDER BY is in the final select.

There are legitimate cases where your INSERT INTO (or an UPDATE) may need an ORDER BY but those are the exceptions, not the rule. By putting in an ORDER BY, you are tying up the optimizer's hands because it has less choices it can make in optimizing the queries. Don't do that.

Lack of schema names

You do not qualify your tables. It's very good practice to always use two-naming as the default --- schema.table rather than just table. Consider that multiple schemas within a database could have a table with same name, that a user could be in their own schema (especially from older database where things worked differently back then), that you are forcing the server to do a search of all schema to bind the tables. I'm going to assume that your tables are in the default dbo schema. Therefore....

SELECT PO.SellerID, COUNT(PO.OrderId) AS NumberOfProcessed
FROM dbo.ProcessedOrder AS PO
WHERE PO.SellerID <> 0
GROUP BY PO.SellerID;

Note that a CTE do not have two-part naming because it is defined in the scope of query. But all real objects (e.g. tables or views or stored procedures) should all be named with schema and object names explicitly.

Column ordering

To be honest, I found it confusing to see aggregates listed first then the grouping columns last. Normally, I would write query vertically, with grouping columns first.

SELECT 
 PO.SellerID, 
 COUNT(PO.OrderId) AS NumberOfProcessed
FROM dbo.ProcessedOrder AS PO
WHERE PO.SellerID <> 0
GROUP BY PO.SellerID;

In a small query with only few columns it might be OK to have them as one line but for more complicated queries, vertical listing makes it much easier to see what is going out from that query. By listing the grouping columns, it helps to verify the GROUP BY as well.

Unnecessary use of DISTINCT

In all queries up to the query, you group on SellerID. Thus they are already distinct because by definition, grouping generate one row per value in that column, so it's already distinct. Don't force the optimizer to do more work than necessary.

Lack of semicolons

Upon a time, terminating a SQL statement was optional. Now, it's very much recommended that you always terminate a SQL statement. Some people cheat by adding semicolons only when they are required (e.g. before a CTE or a MERGE statement) but that makes for inconsistent and ugly code.

Furthermore, having an explicit terminator help to clearly separate each SQL statement and avoid overlapping due to ambiguity.

Potential division by zero error

You have this expression: two.TotalAmountPaid / one.NumberOfPaidOrders AS AverageAmountPaid

What if the number of paid orders is zero, not NULL? What if you later changed the queries that made it possible to put in zeroes? It's good to always guard against that to make your queries easier to change if you need to do so in future and is as simple as using NULLIF: TotalAmountPaid / NULLIF(NumberOfPaidOrders, 0). This basically substitutes the 0 with a NULL, which causes the results to be NULL without raising any runtime errors.

Putting it all together

With CTEs, we can consolidate this into a single query.

WITH SalesStats AS (
 SELECT
 PO.SellerID,
 COUNT(PO.OrderId) AS NumberOfProcessed, PO.SellerID
 FROM dbo.ProcessedOrder AS PO
 WHERE NOT PO.SellerID = 0 
 GROUP BY PO.SellerID
), CountOfPaidOrders AS (
 SELECT 
 AA.SellerID,
 COUNT(AA.OrderId) AS NumberOfPaidOrders
 FROM dbo.AcceptedOrder AS AA
 LEFT JOIN dbo.ProcessedOrder AS PO 
 ON AA.OrderId = PO.OrderId
 GROUP BY AA.SellerID
), TotalAmountPaid AS (
 SELECT
 AA.SellerID,
 SUM(AA.Amount) AS TotalAmountPaid
 FROM dbo.AcceptedOrder AS AA
 GROUP BY AA.SellerID
)
SELECT 
 SS.SellerID, 
 SS.NumberOfProcessed, 
 C.NumberOfPaidOrders, 
 T.TotalAmountPaid, 
 T.TotalAmountPaid / NULLIF(C.NumberOfPaidOrders, 0) AS AverageAmountPaid 
FROM SalesStats AS SS
LEFT JOIN CountOfPaidOrders AS C 
 ON SS.SellerID = one.SellerID
LEFT JOIN TotalAmountPaid AS T 
 ON SS.SellerID = two.SellerID
ORDER BY SS.NumberOfProcessed DESC;

Now the optimizer can see how they all are used, and thus form a better execution plan and you don't need to worry about optimizing, beyond ensuring that the optimizer has the indices it needs.

The CTEs only live for a single SQL statement so if you had to modify the procedure that you end up accessing the CountOfPaidOrders (for instance) multiple times within a stored procedure, you do need a temporary table instead, but in this case, I don't see benefits of creating one.

Stack Exchange Network

Stored procedure for processed orders in SQL Server

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Stored procedure for processed orders in SQL Server

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions