
I am trying to calculate a running total, but it should reset when the cumulative sum becomes greater than the value in another column.

create table #reset_runn_total
(
id int identity(1,1),
val int, 
reset_val int,
grp int
)
insert into #reset_runn_total
values 
(1,10,1),
(8,12,1),(6,14,1),(5,10,1),(6,13,1),(3,11,1),(9,8,1),(10,12,1)
SELECT ROW_NUMBER() OVER (PARTITION BY grp ORDER BY id) AS rn, *
INTO #test
FROM #reset_runn_total

Index details:

CREATE UNIQUE CLUSTERED INDEX ix_load_reset_runn_total
 ON #test(rn, grp) 

Sample data:

+----+-----+-----------+-----+
| id | val | reset_val | Grp |
+----+-----+-----------+-----+
|  1 |   1 |        10 |   1 |
|  2 |   8 |        12 |   1 |
|  3 |   6 |        14 |   1 |
|  4 |   5 |        10 |   1 |
|  5 |   6 |        13 |   1 |
|  6 |   3 |        11 |   1 |
|  7 |   9 |         8 |   1 |
|  8 |  10 |        12 |   1 |
+----+-----+-----------+-----+

Expected result

+----+-----+-----------+-------------+
| id | val | reset_val | Running_tot |
+----+-----+-----------+-------------+
|  1 |   1 |        10 |           1 |
|  2 |   8 |        12 |           9 | -- 1+8
|  3 |   6 |        14 |          15 | -- 1+8+6, greater than reset_val
|  4 |   5 |        10 |           5 | -- reset
|  5 |   6 |        13 |          11 | -- 5+6
|  6 |   3 |        11 |          14 | -- 5+6+3, greater than reset_val
|  7 |   9 |         8 |           9 | -- reset, greater than reset_val
|  8 |  10 |        12 |          10 | -- reset
+----+-----+-----------+-------------+

Query:

I got the result using a recursive CTE. The original question is here: https://stackoverflow.com/questions/42085404/reset-running-total-based-on-another-column

;WITH cte
 AS (SELECT rn,
            id,
            val,
            reset_val,
            grp,
            val AS running_total,
            IIF(val > reset_val, 1, 0) AS flag
     FROM   #test
     WHERE  rn = 1
     UNION ALL
     SELECT r.*,
            IIF(c.flag = 1, r.val, c.running_total + r.val),
            IIF(IIF(c.flag = 1, r.val, c.running_total + r.val) > r.reset_val, 1, 0)
     FROM   cte c
            JOIN #test r
              ON r.grp = c.grp
                 AND r.rn = c.rn + 1)
SELECT *
FROM   cte;

Is there any better alternative in T-SQL, without using CLR?

asked Feb 8, 2017 at 8:55
  • Better how? Does this query exhibit poor performance? Using what metrics? Commented Feb 8, 2017 at 13:54
  • @AaronBertrand - For better understanding I have posted sample data for just one group. I have to do the same for around 50,000 groups with 60 ids each, so the total record count will be around 3,000,000. I am sure the recursive CTE will not scale well on 3,000,000 rows. I will update the metrics when I get back to the office. Can we achieve this using SUM() OVER (ORDER BY ...) like you have used in this article? sqlperformance.com/2012/07/t-sql-queries/running-totals Commented Feb 8, 2017 at 15:15
  • A cursor might do better than a recursive CTE Commented Feb 8, 2017 at 18:34
  • FYI .. still an active Connect item - Add RESET WHEN clause like in Teradata to restart window partition in window functions - by Itzik Ben-Gan Commented Feb 10, 2017 at 19:50

4 Answers


I've looked at similar problems and have never been able to find a window function solution that does a single pass over the data. I don't think it's possible. Window functions need to be applicable to all of the values in a column, which makes reset calculations like this very difficult, because one reset changes the value for every row that follows.

One way to think about the problem is that you can get the end result you want if you calculate a basic running total as long as you can subtract the running total from the correct previous row. For example, in your sample data the value for id 4 is the running total of row 4 - the running total of row 3. The value for id 6 is the running total of row 6 - the running total of row 3 because a reset hasn't happened yet. The value for id 7 is the running total of row 7 - the running total of row 6 and so on.
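
To make this concrete with the sample data above, here are the plain cumulative sums and the subtractions that produce the expected output:

id:                  1   2   3   4   5   6   7   8
val:                 1   8   6   5   6   3   9  10
plain running total: 1   9  15  20  26  29  38  48

Running_tot for id 4: 20 - 15 = 5    (subtract row 3's total)
Running_tot for id 5: 26 - 15 = 11
Running_tot for id 6: 29 - 15 = 14
Running_tot for id 7: 38 - 29 = 9    (subtract row 6's total)
Running_tot for id 8: 48 - 38 = 10   (subtract row 7's total)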

I would approach this with T-SQL in a loop. I got a little carried away and think I have a full solution. For 3 million rows and 50,000 groups the code finished in 24 seconds on my desktop. I'm testing with SQL Server 2016 Developer edition with 6 vCPU. I'm taking advantage of parallel inserts and parallel execution in general, so you may need to change the code if you're on an older version or have DOP limitations.

Below is the code that I used to generate the data. The ranges for VAL and RESET_VAL should be similar to your sample data.

drop table if exists reset_runn_total;
create table reset_runn_total
(
id int identity(1,1),
val int, 
reset_val int,
grp int
);
DECLARE 
@group_num INT,
@row_num INT;
BEGIN
 SET NOCOUNT ON;
 BEGIN TRANSACTION;
 SET @group_num = 1;
 WHILE @group_num <= 50000 
 BEGIN
 SET @row_num = 1;
 WHILE @row_num <= 60
 BEGIN
 INSERT INTO reset_runn_total WITH (TABLOCK)
 SELECT 1 + ABS(CHECKSUM(NewId())) % 10, 8 + ABS(CHECKSUM(NewId())) % 8, @group_num;
 SET @row_num = @row_num + 1;
 END;
 SET @group_num = @group_num + 1;
 END;
 COMMIT TRANSACTION;
END;

The algorithm is as follows:

1) Start by inserting all rows with a standard running total into a temp table.

2) In a loop:

2a) For each group, find the first row remaining in the table whose adjusted running total exceeds its reset_val, and store in the bookkeeping temp table that row's id, its running total (the one that was too large, which becomes the amount to subtract on the next pass), and the previously stored too-large running total (the amount to subtract on this pass).

2b) Move the rows with an id less than or equal to the stored id from the first temp table into a results temp table (a DELETE with OUTPUT), using the other bookkeeping columns to adjust the running total as needed.

3) Once the delete no longer moves any rows, run one additional DELETE ... OUTPUT into the results table. This picks up the rows at the end of each group that never exceed the reset value.

I'll go through one implementation of the above algorithm in T-SQL step by step.

Start by creating a few temp tables. #initial_results holds the original data with the standard running total, #group_bookkeeping is updated each loop to figure out which rows can be moved, and #final_results contains the results with the running total adjusted for resets.

CREATE TABLE #initial_results (
id int,
val int, 
reset_val int,
grp int,
initial_running_total int
);
CREATE TABLE #group_bookkeeping (
grp int,
max_id_to_move int,
running_total_to_subtract_this_loop int,
running_total_to_subtract_next_loop int,
grp_done bit, 
PRIMARY KEY (grp)
);
CREATE TABLE #final_results (
id int,
val int, 
reset_val int,
grp int,
running_total int
);
INSERT INTO #initial_results WITH (TABLOCK)
SELECT ID, VAL, RESET_VAL, GRP, SUM(VAL) OVER (PARTITION BY GRP ORDER BY ID) RUNNING_TOTAL
FROM reset_runn_total;
CREATE CLUSTERED INDEX i1 ON #initial_results (grp, id);
INSERT INTO #group_bookkeeping WITH (TABLOCK)
SELECT DISTINCT GRP, 0, 0, 0, 0
FROM reset_runn_total;

I create the clustered index on the temp table afterwards so that the insert and the index build can each be done in parallel. That made a big difference on my machine, but it may not on yours. Creating an index on the source table didn't seem to help, but it might on your machine.

The code below runs in the loop and updates the bookkeeping table. For each group, we need to find the maximum id that should be moved into the results table. We also need the running total from that row so we can subtract it from the initial running total. The grp_done column is set to 1 when there isn't any more work to do for a grp.

WITH UPD_CTE AS (
 SELECT 
 #group_bookkeeping.GRP
 , MIN(CASE WHEN initial_running_total - #group_bookkeeping.running_total_to_subtract_next_loop > RESET_VAL THEN ID ELSE NULL END) max_id_to_update
 , MIN(#group_bookkeeping.running_total_to_subtract_next_loop) running_total_to_subtract_this_loop
 , MIN(CASE WHEN initial_running_total - #group_bookkeeping.running_total_to_subtract_next_loop > RESET_VAL THEN initial_running_total ELSE NULL END) additional_value_next_loop
 , CASE WHEN MIN(CASE WHEN initial_running_total - #group_bookkeeping.running_total_to_subtract_next_loop > RESET_VAL THEN ID ELSE NULL END) IS NULL THEN 1 ELSE 0 END grp_done
 FROM #group_bookkeeping 
 INNER JOIN #initial_results IR ON #group_bookkeeping.grp = ir.grp
 WHERE #group_bookkeeping.grp_done = 0
 GROUP BY #group_bookkeeping.GRP
 )
 UPDATE #group_bookkeeping
 SET #group_bookkeeping.max_id_to_move = uv.max_id_to_update
 , #group_bookkeeping.running_total_to_subtract_this_loop = uv.running_total_to_subtract_this_loop
 , #group_bookkeeping.running_total_to_subtract_next_loop = uv.additional_value_next_loop
 , #group_bookkeeping.grp_done = uv.grp_done
 FROM UPD_CTE uv
 WHERE uv.GRP = #group_bookkeeping.grp
OPTION (LOOP JOIN);

I'm really not a fan of the LOOP JOIN hint in general, but this is a simple query and it was the quickest way to get what I wanted. To really optimize for response time, I wanted parallel nested loop joins instead of DOP 1 merge joins.

The code below runs in the loop and moves data from the initial table into the final results table. Notice the adjustment to the initial running total.

DELETE ir
OUTPUT DELETED.id, 
 DELETED.VAL, 
 DELETED.RESET_VAL, 
 DELETED.GRP ,
 DELETED.initial_running_total - tb.running_total_to_subtract_this_loop
INTO #final_results
FROM #initial_results ir
INNER JOIN #group_bookkeeping tb ON ir.GRP = tb.GRP AND ir.ID <= tb.max_id_to_move
WHERE tb.grp_done = 0;

For your convenience, here is the full code:

DECLARE @RC INT;
BEGIN
SET NOCOUNT ON;
CREATE TABLE #initial_results (
id int,
val int, 
reset_val int,
grp int,
initial_running_total int
);
CREATE TABLE #group_bookkeeping (
grp int,
max_id_to_move int,
running_total_to_subtract_this_loop int,
running_total_to_subtract_next_loop int,
grp_done bit, 
PRIMARY KEY (grp)
);
CREATE TABLE #final_results (
id int,
val int, 
reset_val int,
grp int,
running_total int
);
INSERT INTO #initial_results WITH (TABLOCK)
SELECT ID, VAL, RESET_VAL, GRP, SUM(VAL) OVER (PARTITION BY GRP ORDER BY ID) RUNNING_TOTAL
FROM reset_runn_total;
CREATE CLUSTERED INDEX i1 ON #initial_results (grp, id);
INSERT INTO #group_bookkeeping WITH (TABLOCK)
SELECT DISTINCT GRP, 0, 0, 0, 0
FROM reset_runn_total;
SET @RC = 1;
WHILE @RC > 0 
BEGIN
 WITH UPD_CTE AS (
 SELECT 
 #group_bookkeeping.GRP
 , MIN(CASE WHEN initial_running_total - #group_bookkeeping.running_total_to_subtract_next_loop > RESET_VAL THEN ID ELSE NULL END) max_id_to_move
 , MIN(#group_bookkeeping.running_total_to_subtract_next_loop) running_total_to_subtract_this_loop
 , MIN(CASE WHEN initial_running_total - #group_bookkeeping.running_total_to_subtract_next_loop > RESET_VAL THEN initial_running_total ELSE NULL END) additional_value_next_loop
 , CASE WHEN MIN(CASE WHEN initial_running_total - #group_bookkeeping.running_total_to_subtract_next_loop > RESET_VAL THEN ID ELSE NULL END) IS NULL THEN 1 ELSE 0 END grp_done
 FROM #group_bookkeeping 
 CROSS APPLY (SELECT ID, RESET_VAL, initial_running_total FROM #initial_results ir WHERE #group_bookkeeping.grp = ir.grp ) ir
 WHERE #group_bookkeeping.grp_done = 0
 GROUP BY #group_bookkeeping.GRP
 )
 UPDATE #group_bookkeeping
 SET #group_bookkeeping.max_id_to_move = uv.max_id_to_move
 , #group_bookkeeping.running_total_to_subtract_this_loop = uv.running_total_to_subtract_this_loop
 , #group_bookkeeping.running_total_to_subtract_next_loop = uv.additional_value_next_loop
 , #group_bookkeeping.grp_done = uv.grp_done
 FROM UPD_CTE uv
 WHERE uv.GRP = #group_bookkeeping.grp
 OPTION (LOOP JOIN);
 DELETE ir
 OUTPUT DELETED.id, 
 DELETED.VAL, 
 DELETED.RESET_VAL, 
 DELETED.GRP ,
 DELETED.initial_running_total - tb.running_total_to_subtract_this_loop
 INTO #final_results
 FROM #initial_results ir
 INNER JOIN #group_bookkeeping tb ON ir.GRP = tb.GRP AND ir.ID <= tb.max_id_to_move
 WHERE tb.grp_done = 0;
 SET @RC = @@ROWCOUNT;
END;
DELETE ir 
OUTPUT DELETED.id, 
 DELETED.VAL, 
 DELETED.RESET_VAL, 
 DELETED.GRP ,
 DELETED.initial_running_total - tb.running_total_to_subtract_this_loop
 INTO #final_results
FROM #initial_results ir
INNER JOIN #group_bookkeeping tb ON ir.GRP = tb.GRP;
CREATE CLUSTERED INDEX f1 ON #final_results (grp, id);
/* -- do something with the data
SELECT *
FROM #final_results
ORDER BY grp, id;
*/
DROP TABLE #final_results;
DROP TABLE #initial_results;
DROP TABLE #group_bookkeeping;
END;
answered Feb 9, 2017 at 5:37
  • Simply awesome. I will award you the bounty. Commented Feb 9, 2017 at 10:17
  • On our server, for 50,000 grps and 60 ids, yours took 1 minute and 10 seconds. The recursive CTE took 2 minutes and 15 seconds. Commented Feb 9, 2017 at 11:00
  • I tested both queries with the same data. Yours was awesome. Can it be improved further? Commented Feb 9, 2017 at 14:48
  • I meant that I ran your code on our real data and tested it. The calculation is processed in temp tables in my real procedure, so most likely it is already tightly packed. It would be good if it could be reduced to around 30 seconds. Commented Feb 9, 2017 at 15:20
  • @Prdp Tried a quick approach that used an update but it seemed to be worse. Won't be able to look into this more for a while. Try logging how long each operation takes so you can figure out which part is running the slowest on your server. It's definitely possible that there's a way to speed up this code or a better algorithm in general. Commented Feb 9, 2017 at 15:44

Using a CURSOR:

ALTER TABLE #reset_runn_total ADD RunningTotal int;
DECLARE @id int, @val int, @reset int, @acm int, @grp int, @last_grp int;
SET @acm = 0;
DECLARE curRes CURSOR FAST_FORWARD FOR 
SELECT id, val, reset_val, grp
FROM #reset_runn_total
ORDER BY grp, id;
OPEN curRes;
FETCH NEXT FROM curRes INTO @id, @val, @reset, @grp;
SET @last_grp = @grp;
WHILE @@FETCH_STATUS = 0 
BEGIN
 IF @grp <> @last_grp SET @acm = 0;
 SET @last_grp = @grp;
 SET @acm = @acm + @val;
 UPDATE #reset_runn_total
 SET RunningTotal = @acm
 WHERE id = @id;
 IF @acm > @reset SET @acm = 0;
 FETCH NEXT FROM curRes INTO @id, @val, @reset, @grp;
END
CLOSE curRes;
DEALLOCATE curRes;
+----+-----+-----------+--------------+
| id | val | reset_val | RunningTotal |
+----+-----+-----------+--------------+
|  1 |   1 |        10 |            1 |
|  2 |   8 |        12 |            9 |
|  3 |   6 |        14 |           15 |
|  4 |   5 |        10 |            5 |
|  5 |   6 |        13 |           11 |
|  6 |   3 |        11 |           14 |
|  7 |   9 |         8 |            9 |
|  8 |  10 |        12 |           10 |
+----+-----+-----------+--------------+

Check here: http://rextester.com/WSPLO95303

answered Feb 9, 2017 at 18:38

Not a windowed solution, but a pure SQL version:

WITH x AS (
 SELECT TOP 1 id,
 val,
 reset_val,
 val AS running_total,
 1 AS level 
 FROM reset_runn_total
 UNION ALL
 SELECT r.id,
 r.val,
 r.reset_val,
 CASE WHEN x.running_total < x.reset_val THEN x.running_total + r.val ELSE r.val END,
 level = level + 1
 FROM x JOIN reset_runn_total AS r ON (r.id > x.id)
) SELECT
 *
FROM x
WHERE NOT EXISTS (
 SELECT 1
 FROM x AS x2
 WHERE x2.id = x.id
 AND x2.level > x.level
 )
ORDER BY id, level DESC
;

I am not a specialist in SQL Server's dialect. This is the initial version, for PostgreSQL (if I understand correctly, I cannot use LIMIT 1 / TOP 1 in the recursive part in SQL Server):

WITH RECURSIVE x AS (
 (SELECT id, val, reset_val, val AS running_total
 FROM reset_runn_total
 ORDER BY id
 LIMIT 1)
 UNION
 (SELECT r.id, r.val, r.reset_val,
 CASE WHEN x.running_total < x.reset_val THEN x.running_total + r.val ELSE r.val END
 FROM x JOIN reset_runn_total AS r ON (r.id > x.id)
 ORDER BY id
 LIMIT 1)
) SELECT * FROM x;
answered Feb 10, 2017 at 23:28
  • @JoeObbish to be honest, that is not entirely clear from the question. The expected results, for example, show no grp column. Commented Feb 11, 2017 at 20:08
  • @JoeObbish that's what I understood as well. Still, the question could benefit from an explicit statement about that. The code in the question (with the CTE) doesn't use it either (and it even has differently named columns). It should be obvious to anyone who reads the question; they shouldn't have to read the other answers or comments. Commented Feb 11, 2017 at 20:28
  • @ypercubeTM Added the required information to the question. Commented Feb 14, 2017 at 11:02

It seems you have several queries/methods to attack the problem, but you haven't provided us with (or even considered?) the indexes on the table.

What indexes are there in the table? Is it a heap or does it have a clustered index?

I would try the various solutions suggested after adding this index:

(grp, id) INCLUDE (val, reset_val)

Or just change (or make) the clustered index to be (grp, id).

Having an index that targets the specific query should improve the efficiency of most, if not all, of the methods.
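
For example, a minimal sketch of both options, written against the reset_runn_total table used in the other answers (the index names here are just placeholders):

-- Option 1: covering nonclustered index keyed the way the queries read the data
CREATE NONCLUSTERED INDEX ix_reset_runn_total_grp_id
    ON reset_runn_total (grp, id)
    INCLUDE (val, reset_val);

-- Option 2: make (grp, id) the clustered index instead
-- (drop or rebuild the existing clustered index first if the table already has one)
CREATE CLUSTERED INDEX cx_reset_runn_total_grp_id
    ON reset_runn_total (grp, id);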

answered Feb 11, 2017 at 19:20
  • Added required information on question. Commented Feb 13, 2017 at 12:40
