Set non-negative floor for rolling sum, in SQL Server

Question 1

I need to set a floor on the rolling sum calculation. For example, with

PKID NumValue GroupID
----------------------------
1 -1 1
2 -2 1
3 5 1
4 -7 1
5 1 2

I would like to have:

PKID RollingSum GroupID
----------------------------- ## Explanation: 
1 0 1 ## 0 - 1 < 0 => 0
2 0 1 ## 0 - 2 < 0 => 0
3 5 1 ## 0 + 5 > 0 => 5
4 0 1 ## 5 - 7 < 0 => 0

When adding a negative number will cause the sum to be negative, the limit will be activated to set the result as zero. Subsequent addition should be based on this adjusted value, instead of the original rolling sum.

The expected result should be achieved using addition. If the fourth number changes from -7 to -3, the fourth result should be 2 instead of 0

If a single sum can be provided rather than a few rolling numbers, it would also be acceptable. I can use stored procedures to implement a non-negative addition, but that would be too low-level.

The real life problem for this is that we record order placed as positive amount and cancelled as negative. Due to connectivity issues customers may click the cancel button more than once, which will result in multiple negative values being recorded. When calculating our revenue, "zero" need to be a boundary for sales.

This business application is absolutely stupid, but there's nothing I can do about that. For this question, please only consider solution that can be used by a DBA.

I expect fifty rows per GroupID at most.

Question 2

For purely friendly comparison. I was able to accomplish this in PostgreSQL without using recursion. I also upvoted this question as it is an abnormally fun and clever SQL problem.

Question 3

Here's a recursive CTE example I came up with (which seems to work). Is uses Row_Number() OVER to create a sequence number with no gaps. I don't know how well it would perform with your data, but it's something to try.

--Set up demo data
IF OBJECT_ID('tempdb..#temp') IS NOT NULL
 drop table #temp
go
create table #temp (PKID int, NumValue int, GroupID int)
insert into #temp values
(1,-1,1), (3,-2,1), (5,5,1), (7,-3,1), (9,1,2)
--here is the real code
;
with RowNumberAddedToTemp as 
(
SELECT 
 ROW_NUMBER() OVER(ORDER BY PKID ASC) AS rn,
 * from #temp
 )
,x
AS (
 SELECT PKID --Anchor row
 ,NumValue
 ,RunningTotal = CASE 
 WHEN NumValue < 0 --if initial value less than zero, make zero
 THEN 0
 ELSE NumValue
 END
 ,GroupID
 ,rn
 FROM RowNumberAddedToTemp
 WHERE rn = 1 
 UNION ALL
 SELECT y.PKID
 ,y.NumValue
 ,CASE 
 WHEN x.GroupID <> y.groupid --did GroupId change?
 THEN CASE 
 WHEN y.NumValue < 0 --if value is less than zero, make zero
 THEN 0
 ELSE y.numvalue --start new groupid totals
 END
 WHEN x.RunningTotal + y.NumValue < 0 --If adding the current row makes the total < 0, make zero
 THEN 0
 ELSE x.RunningTotal + y.NumValue --Add to the running total for the current groupid
 END
 ,y.Groupid
 ,y.rn
 FROM x
 INNER JOIN RowNumberAddedToTemp AS y ON y.rn = x.rn + 1
 )
SELECT PKID
 ,Numvalue
 ,RunningTotal
 ,GroupID
FROM x
ORDER BY PKID
OPTION (MAXRECURSION 10000);

Question 4

Below is a recursive solution similar to Scott Hodgin's answer, but it should be able to better take advantage of an index. That will improve performance depending on what your data looks like. I mocked up one million rows of data with 20000 groups. Each has 50 rows associated with it:

DROP TABLE IF EXISTS biz_application_problems;
CREATE TABLE dbo.biz_application_problems (
 PKID BIGINT NOT NULL,
 NumValue INT NOT NULL,
 GroupID INT NOT NULL,
 PRIMARY KEY (PKID)
);
INSERT INTO dbo.biz_application_problems WITH (TABLOCK)
SELECT TOP (1000000) 
 2 * ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
, CAST(CRYPT_GEN_RANDOM(1) AS INT) % 22 - 11
, 1 + (-1 + ROW_NUMBER() OVER (ORDER BY (SELECT NULL))) / 50
FROM master..spt_values t1
CROSS JOIN master..spt_values t2
OPTION (MAXDOP 1);
CREATE INDEX gotta_go_fast ON dbo.biz_application_problems (GroupID, PKID)
 INCLUDE (NumValue);

The nonclustered index that I added to the table should allow the recursive part of the CTE to get the next row with just one index seek. It also means that the order of data in the table according to the PK doesn't matter. The other trick here is to use ROW_NUMBER() to efficiently get the next row. This is necessary because TOP cannot be used in the recursive part of a CTE. Here's the query:

WITH rec_cte AS (
 SELECT TOP 1
 GroupID
 , PKID
 , CASE WHEN NumValue < 0 THEN 0 ELSE NumValue END RunningSum
 , NumValue -- for validation purposes only
 FROM dbo.biz_application_problems
 ORDER BY GroupId, PKID
 UNION ALL
 SELECT t.GroupID
 , t.PKID
 , t.RunningSum
 , t.NumValue
 FROM 
 (
 SELECT 
 b.GroupID
 , b.PKID
 , CASE WHEN ca.RunningSum < 0 THEN 0 ELSE ca.RunningSum END RunningSum
 , b.NumValue
 , ROW_NUMBER() OVER (ORDER BY b.GroupId, b.PKID) rn
 FROM dbo.biz_application_problems b
 INNER JOIN rec_cte r ON
 (b.GroupID = r.GroupID AND b.PKID > r.PKID) OR b.GroupID > r.GroupID
 CROSS APPLY (
 SELECT CASE WHEN b.GroupID <> r.GroupID THEN 0 ELSE r.RunningSum END + b.NumValue
 ) ca (RunningSum)
 ) t
 WHERE t.rn = 1
)
SELECT *
FROM rec_cte
OPTION (MAXRECURSION 0);

Here's a sample of the results:

╔═════════╦══════╦════════════╦══════════╗
║ GroupID ║ PKID ║ RunningSum ║ NumValue ║
╠═════════╬══════╬════════════╬══════════╣
║ 7 ║ 700 ║ 13 ║ -4 ║
║ 8 ║ 702 ║ 0 ║ -2 ║
║ 8 ║ 704 ║ 7 ║ 7 ║
║ 8 ║ 706 ║ 8 ║ 1 ║
║ 8 ║ 708 ║ 3 ║ -5 ║
║ 8 ║ 710 ║ 0 ║ -4 ║
║ 8 ║ 712 ║ 0 ║ -7 ║
║ 8 ║ 714 ║ 7 ║ 7 ║
║ 8 ║ 716 ║ 2 ║ -5 ║
║ 8 ║ 718 ║ 10 ║ 8 ║
║ 8 ║ 720 ║ 0 ║ -11 ║
║ 8 ║ 722 ║ 0 ║ -7 ║
╚═════════╩══════╩════════════╩══════════╝

On my machine, the code takes around 10 seconds to process one million rows. We can see index seeks that return just one row for each iteration:

seeks

I uploaded the actual plan here as well.

score 3 · Accepted Answer · 2017-07-06 13:32:27Z

Here's a recursive CTE example I came up with (which seems to work). Is uses Row_Number() OVER to create a sequence number with no gaps. I don't know how well it would perform with your data, but it's something to try.

--Set up demo data
IF OBJECT_ID('tempdb..#temp') IS NOT NULL
 drop table #temp
go
create table #temp (PKID int, NumValue int, GroupID int)
insert into #temp values
(1,-1,1), (3,-2,1), (5,5,1), (7,-3,1), (9,1,2)
--here is the real code
;
with RowNumberAddedToTemp as 
(
SELECT 
 ROW_NUMBER() OVER(ORDER BY PKID ASC) AS rn,
 * from #temp
 )
,x
AS (
 SELECT PKID --Anchor row
 ,NumValue
 ,RunningTotal = CASE 
 WHEN NumValue < 0 --if initial value less than zero, make zero
 THEN 0
 ELSE NumValue
 END
 ,GroupID
 ,rn
 FROM RowNumberAddedToTemp
 WHERE rn = 1 
 UNION ALL
 SELECT y.PKID
 ,y.NumValue
 ,CASE 
 WHEN x.GroupID <> y.groupid --did GroupId change?
 THEN CASE 
 WHEN y.NumValue < 0 --if value is less than zero, make zero
 THEN 0
 ELSE y.numvalue --start new groupid totals
 END
 WHEN x.RunningTotal + y.NumValue < 0 --If adding the current row makes the total < 0, make zero
 THEN 0
 ELSE x.RunningTotal + y.NumValue --Add to the running total for the current groupid
 END
 ,y.Groupid
 ,y.rn
 FROM x
 INNER JOIN RowNumberAddedToTemp AS y ON y.rn = x.rn + 1
 )
SELECT PKID
 ,Numvalue
 ,RunningTotal
 ,GroupID
FROM x
ORDER BY PKID
OPTION (MAXRECURSION 10000);

Stack Exchange Network

Set non-negative floor for rolling sum, in SQL Server

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Set non-negative floor for rolling sum, in SQL Server

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions