I am struggling with creating a Stored Procedure in SQL Server for getting sales statistics. I have managed to get the information I need out, but the query is a mess and I have probably messed up the table variable by using 3 different ones.
Any ideas on how to merge this triple part-query into one?
BEGIN
SET NOCOUNT ON;
DECLARE @SalesStats TABLE
(
NumberOfProcessed int,
SellerID int
)
DECLARE @1 TABLE(
NumberOfPaidOrders int,
SellerID int
)
DECLARE @2 TABLE(
TotalAmountPaid int,
SellerID int
)
--Number of Processed orders. Removing 0 as its test account.
INSERT INTO @SalesStats (NumberOfProcessed, PO.SellerID)
SELECT COUNT(PO.OrderId) AS NumberOfProcessed, PO.SellerID
FROM ProcessedORder AS PO
WHERE NOT PO.SellerID = 0
GROUP BY PO.SellerID
ORDER BY PO.SellerID
--Number of paid orders
INSERT INTO @1 (NumberOfPaidOrders, SellerID)
SELECT COUNT(AA.OrderId) AS NumberOfPaidOrders, AA.SellerID
FROM AcceptedOrder AS AA
LEFT JOIN ProcessedORder AS PO ON AA.OrderId = PO.OrderId
GROUP BY AA.SellerID
ORDER BY AA.SellerID
--Total amount paid
INSERT INTO @2 (TotalAmountPaid, SellerID)
SELECT SUM(AA.Amount) AS TotalAmountPaid, SellerID
FROM AcceptedOrder AS AA
GROUP BY AA.SellerID
ORDER BY AA.SellerID
SELECT DISTINCT(SS.SellerID), SS.NumberOfProcessed, one.NumberOfPaidOrders, two.TotalAmountPaid, two.TotalAmountPaid / one.NumberOfPaidOrders AS AverageAmountPaid FROM @SalesStats AS SS
LEFT JOIN @1 AS one ON SS.SellerID = one.SellerID
LEFT JOIN @2 AS two ON SS.SellerID = two.SellerID
ORDER BY SS.NumberOfProcessed DESC
END
2 Answers 2
Table variables
Because you are using them only once, and as part of a bigger query, there is really no benefits in using a table variable nor a temporary table. The best choice in this case is to use a Common Table Expressions
, or CTE
.
A CTE basically works like a subquery but unlike a traditional subquery, it has the advantage of reading from top to bottom rather than inside out as traditionally done. Here is a sample:
WITH SalesStats AS (
SELECT PO.SellerID, COUNT(PO.OrderId) AS NumberOfProcessed
FROM ProcessedOrder AS PO
WHERE PO.SellerID <> 0
GROUP BY PO.SellerID
)
SELECT ss.SellerID, ss.NumberOfProcessed
FROM SalesStats AS SS;
Improper use of ORDER BY
Your INSERT INTO
queries has a ORDER BY
which contributes nothing and in fact may penalize your performance. The only time you should use an ORDER BY
is in the final select.
There are legitimate cases where your INSERT INTO
(or an UPDATE
) may need an ORDER BY
but those are the exceptions, not the rule. By putting in an ORDER BY
, you are tying up the optimizer's hands because it has less choices it can make in optimizing the queries. Don't do that.
Lack of schema names
You do not qualify your tables. It's very good practice to always use two-naming as the default --- schema.table
rather than just table
. Consider that multiple schemas within a database could have a table with same name, that a user could be in their own schema (especially from older database where things worked differently back then), that you are forcing the server to do a search of all schema to bind the tables. I'm going to assume that your tables are in the default dbo
schema. Therefore....
SELECT PO.SellerID, COUNT(PO.OrderId) AS NumberOfProcessed
FROM dbo.ProcessedOrder AS PO
WHERE PO.SellerID <> 0
GROUP BY PO.SellerID;
Note that a CTE do not have two-part naming because it is defined in the scope of query. But all real objects (e.g. tables or views or stored procedures) should all be named with schema and object names explicitly.
Column ordering
To be honest, I found it confusing to see aggregates listed first then the grouping columns last. Normally, I would write query vertically, with grouping columns first.
SELECT
PO.SellerID,
COUNT(PO.OrderId) AS NumberOfProcessed
FROM dbo.ProcessedOrder AS PO
WHERE PO.SellerID <> 0
GROUP BY PO.SellerID;
In a small query with only few columns it might be OK to have them as one line but for more complicated queries, vertical listing makes it much easier to see what is going out from that query. By listing the grouping columns, it helps to verify the GROUP BY
as well.
Unnecessary use of DISTINCT
In all queries up to the query, you group on SellerID
. Thus they are already distinct because by definition, grouping generate one row per value in that column, so it's already distinct. Don't force the optimizer to do more work than necessary.
Lack of semicolons
Upon a time, terminating a SQL statement was optional. Now, it's very much recommended that you always terminate a SQL statement. Some people cheat by adding semicolons only when they are required (e.g. before a CTE or a MERGE
statement) but that makes for inconsistent and ugly code.
Furthermore, having an explicit terminator help to clearly separate each SQL statement and avoid overlapping due to ambiguity.
Potential division by zero error
You have this expression: two.TotalAmountPaid / one.NumberOfPaidOrders AS AverageAmountPaid
What if the number of paid orders is zero, not NULL
? What if you later changed the queries that made it possible to put in zeroes? It's good to always guard against that to make your queries easier to change if you need to do so in future and is as simple as using NULLIF
: TotalAmountPaid / NULLIF(NumberOfPaidOrders, 0)
. This basically substitutes the 0
with a NULL
, which causes the results to be NULL
without raising any runtime errors.
Putting it all together
With CTEs, we can consolidate this into a single query.
WITH SalesStats AS (
SELECT
PO.SellerID,
COUNT(PO.OrderId) AS NumberOfProcessed, PO.SellerID
FROM dbo.ProcessedOrder AS PO
WHERE NOT PO.SellerID = 0
GROUP BY PO.SellerID
), CountOfPaidOrders AS (
SELECT
AA.SellerID,
COUNT(AA.OrderId) AS NumberOfPaidOrders
FROM dbo.AcceptedOrder AS AA
LEFT JOIN dbo.ProcessedOrder AS PO
ON AA.OrderId = PO.OrderId
GROUP BY AA.SellerID
), TotalAmountPaid AS (
SELECT
AA.SellerID,
SUM(AA.Amount) AS TotalAmountPaid
FROM dbo.AcceptedOrder AS AA
GROUP BY AA.SellerID
)
SELECT
SS.SellerID,
SS.NumberOfProcessed,
C.NumberOfPaidOrders,
T.TotalAmountPaid,
T.TotalAmountPaid / NULLIF(C.NumberOfPaidOrders, 0) AS AverageAmountPaid
FROM SalesStats AS SS
LEFT JOIN CountOfPaidOrders AS C
ON SS.SellerID = one.SellerID
LEFT JOIN TotalAmountPaid AS T
ON SS.SellerID = two.SellerID
ORDER BY SS.NumberOfProcessed DESC;
Now the optimizer can see how they all are used, and thus form a better execution plan and you don't need to worry about optimizing, beyond ensuring that the optimizer has the indices it needs.
The CTEs only live for a single SQL statement so if you had to modify the procedure that you end up accessing the CountOfPaidOrders (for instance) multiple times within a stored procedure, you do need a temporary table instead, but in this case, I don't see benefits of creating one.
Further simplifying @this answer.
- Assuming there's a 1:1 relation between order ids the
CountOfPaidOrders
cte doesn't need the Left Join as you don't access any column from the inner table. - Both
CountOfPaidOrders
andTotalAmountPaid
can be combined into a single Select - No need to calculate an average manually instead of using AVG
- No need for using an alias in a single table Select
Now it's down to:
WITH SalesStats AS (
SELECT
SellerID,
Count(OrderId) AS NumberOfProcessed, SellerID
FROM dbo.ProcessedOrder
WHERE NOT SellerID = 0
GROUP BY SellerID
), PaidOrders AS (
SELECT
SellerID,
Count(OrderId) AS NumberOfPaidOrders
Sum(Amount) AS TotalAmountPaid
Avg(Amount) AS AverageAmountPaid
FROM dbo.AcceptedOrder
GROUP BY SellerID
)
SELECT
ss.SellerID,
ss.NumberOfProcessed,
paid.NumberOfPaidOrders,
paid.TotalAmountPaid,
paid.AverageAmountPaid
FROM SalesStats AS ss
LEFT JOIN PaidOrders AS paid
ON ss.SellerID = paid.SellerID
ORDER BY ss.NumberOfProcessed DESC;
-
\$\begingroup\$ Nice. I do disagree with the #4 mainly because this makes refactoring SQL harder when it becomes necessary to convert a single-table query into multiple-table query and also because this avoid ambiguity with columns that uses keywords (e.g.
Description
orUser
are examples where the names are legal without bracketing but are special also). Qualifying eliminates that ambiguity. Thus, it's good habit overall to always alias even for single-table query. \$\endgroup\$this– this2017年12月15日 02:55:36 +00:00Commented Dec 15, 2017 at 2:55
Explore related questions
See similar questions with these tags.