Set non-negative floor for rolling sum, in PostgreSQL

Question 1

This is a really fun question (asked for SQL Server) and I wanted to try it out to see how it was done in PostgreSQL. Let's see if anyone else can do it better. Taking this data,

CREATE TABLE foo
AS
 SELECT pkid::int, numvalue::int, groupid::int
 FROM ( VALUES
 ( 1, -1 , 1 ),
 ( 2, -2 , 1 ),
 ( 3, 5 , 1 ),
 ( 4, -7 , 1 ),
 ( 5, 1 , 2 )
 ) AS t(pkid, numvalue, groupid);

We're trying to generate this:

PKID RollingSum GroupID
----------------------------- ## Explanation: 
1 0 1 ## 0 - 1 < 0 => 0
2 0 1 ## 0 - 2 < 0 => 0
3 5 1 ## 0 + 5 > 0 => 5
4 0 1 ## 5 - 7 < 0 => 0

The problem is described as,

When adding a negative number will cause the sum to be negative, the limit will be activated to set the result as zero. Subsequent addition should be based on this adjusted value, instead of the original rolling sum.

The expected result should be achieved using addition. If the fourth number changes from -7 to -3, the fourth result should be 2 instead of 0

If a single sum can be provided rather than a few rolling numbers, it would also be acceptable. I can use stored procedures to implement a non-negative addition, but that would be too low-level.

The real life problem for this is that we record order placed as positive amount and cancelled as negative. Due to connectivity issues customers may click the cancel button more than once, which will result in multiple negative values being recorded. When calculating our revenue, "zero" need to be a boundary for sales.

Their solutions are all using recursion.

Question 2

Some tests: dbfiddle.uk/…

Question 3

This is how I solved a similar problem on Teradata using nested OLAP-functions:

SELECT dt.*,
 -- find the lowest previous CumSum < 0 
 -- and adjust the current CumSum to zero
 Max(CASE WHEN CumSum < 0 THEN -CumSum ELSE 0 end)
 Over (PARTITION BY groupid
 ORDER BY pkid
 ROWS Unbounded Preceding)
 + CumSum AS AdjustedSum
FROM 
 ( 
 SELECT pkid, numvalue, groupid,
 -- calculate a standard cumulative sum
 Sum(numvalue)
 Over (PARTITION BY groupid
 ORDER BY pkid
 ROWS Unbounded Preceding) AS CumSum
 FROM foo
 ) AS dt

Question 4

Clever! (but probably not efficient)

Question 5

I'm just now looking at this, and I find myself having to experiment with it a bit, you should have explained a little. Cool method anyway. =)

Question 6

@joanolo: Probably more efficient than the recursive solutions for SQL Server (at least way more efficient on Teradata). A custom Windowed Aggregate function would be more efficient on Teradata, too, but must be coded in C :-)

Question 7

This is about 5X faster on my machine than recursion in SQL Server.

Question 8

With PKID being unique, is there a need to do ROWS Unbounded Preceding?

Question 9

Using a Custom Aggregate function

We use CREATE FUNCTION to create a function int_add_pos_or_zero that adds numbers, but if they're less than 0, returns 0.

CREATE FUNCTION int_add_pos_or_zero(int, int)
RETURNS int
AS $$
 BEGIN
 RETURN greatest(1ドル + 2,ドル 0);
 END;
$$
LANGUAGE plpgsql
IMMUTABLE;

Now we CREATE AGGREGATE on that so we can run it in a window function. We set INITCOND to be =0.

CREATE AGGREGATE add_pos_or_zero(int) (
 SFUNC = int_add_pos_or_zero,
 STYPE = int,
 INITCOND = 0
);

Now we query it like any other Window Function.

SELECT pkid,
 groupid,
 numvalue,
 add_pos_or_zero(numvalue) OVER (PARTITION BY groupid ORDER BY pkid)
FROM foo;
 pkid | groupid | numvalue | add_pos_or_zero 
------+---------+----------+-----------------
 1 | 1 | -1 | 0
 2 | 1 | -2 | 0
 3 | 1 | 5 | 5
 4 | 1 | -7 | 0
 5 | 2 | 1 | 1
(5 rows)

Question 10

Query sith window functions

This is much like dnoeth's smart query (same basic logic). Just slightly shorter and more efficient with a simpler expression in the outer query:

SELECT groupid, pkid
 , simple_sum
 - LEAST(MIN(simple_sum)
 OVER (PARTITION BY groupid
 ORDER BY pkid ROWS UNBOUNDED PRECEDING), 0) AS rolling_sum
FROM ( 
 SELECT pkid, numvalue, groupid
 , SUM(numvalue) OVER (PARTITION BY groupid
 ORDER BY pkid ROWS UNBOUNDED PRECEDING) AS simple_sum
 FROM foo
 ) sub;

How does it work?
To compute the special rolling sum as requested, for every row where the simple rolling sum would turn out negative, we add the same positive number to make it zero instead. That's precisely what the calculation in the outer SELECT does, subtracting negative numbers adds the corresponding positive:

LEAST(MIN(simple_sum) OVER (PARTITION BY groupid
 ORDER BY pkid ROWS UNBOUNDED PRECEDING), 0)

The surrounding LEAST cancels any action for positive numbers (or 0). The least negative (greatest absolute number) of the simple running sum is what we have to add so far in total. Every time our calculation would go below zero, we get a new absolute low in the simple running sum. It's all beautifully simple.

PL/pgSQL functions

Based off Abelisto's implementation, improved:

CREATE OR REPLACE FUNCTION f_special_rolling_sum()
 RETURNS TABLE (groupid int, pkid int, numvalue int, rolling_sum int) AS
$func$
DECLARE
 last_groupid int;
BEGIN
 FOR groupid, pkid, numvalue IN 
 SELECT f.groupid, f.pkid, f.numvalue
 FROM foo f
 ORDER BY f.groupid, f.pkid
 LOOP
 IF last_groupid = groupid THEN -- same partition continues
 rolling_sum := GREATEST(rolling_sum + numvalue, 0);
 ELSE -- new partition
 last_groupid := groupid;
 rolling_sum := GREATEST(numvalue, 0);
 END IF;
 RETURN NEXT;
 END LOOP;
END
$func$ LANGUAGE plpgsql;

Call:

SELECT * FROM f_special_rolling_sum();

Index

All solutions supplied so far can benefit from an index-only scan with a covering index:

CREATE INDEX idx_foo_covering ON foo(groupid, pkid, numvalue);

Slow index scans in large table

Tests

After optimizing function, queries, index (and the test itself) I see similar performance for both, the query being slightly faster than the function. (The aggregate function a bit slower than the rest.) Extensive test suite (based off Abelisto's fiddle):

dbfiddle for pg 9.6 here

dbfiddle for pg 10 here

Question 11

Well, this is ugly, but it works without adding any new function or aggregate:

SELECT *, 
 CASE
 WHEN numvalue > 0 
 THEN sum( greatest(numvalue,0) ) OVER (PARTITION BY groupid ORDER BY pkid)
 ELSE 0 
 END AS result
FROM foo;

Question 12

I upvoted this, but this solution doesn't work. Change pkid 4's numvalue to -2 in the solution above. 5-2 should be three, you catch numvalue being -2 and set the result to 0.

Question 13

Indeed. My bad! Anyway, nice problem :) The aggregate solution is pretty neat.

dnoeth dnoeth 4,20613 silver badges14 bronze badges · Accepted Answer · 2017-07-09 19:33:20Z

11

This is how I solved a similar problem on Teradata using nested OLAP-functions:

SELECT dt.*,
 -- find the lowest previous CumSum < 0 
 -- and adjust the current CumSum to zero
 Max(CASE WHEN CumSum < 0 THEN -CumSum ELSE 0 end)
 Over (PARTITION BY groupid
 ORDER BY pkid
 ROWS Unbounded Preceding)
 + CumSum AS AdjustedSum
FROM 
 ( 
 SELECT pkid, numvalue, groupid,
 -- calculate a standard cumulative sum
 Sum(numvalue)
 Over (PARTITION BY groupid
 ORDER BY pkid
 ROWS Unbounded Preceding) AS CumSum
 FROM foo
 ) AS dt

Share

Improve this answer

edited Jul 9, 2017 at 19:53

answered Jul 9, 2017 at 19:33

dnoeth's user avatar

dnoeth dnoeth

4,20613 silver badges14 bronze badges

12

Clever! (but probably not efficient)

joanolo
– joanolo

2017年07月09日 20:12:22 +00:00
Commented Jul 9, 2017 at 20:12
1

I'm just now looking at this, and I find myself having to experiment with it a bit, you should have explained a little. Cool method anyway. =)

Evan Carroll
– Evan Carroll

2017年07月09日 20:16:54 +00:00
Commented Jul 9, 2017 at 20:16
2

@joanolo: Probably more efficient than the recursive solutions for SQL Server (at least way more efficient on Teradata). A custom Windowed Aggregate function would be more efficient on Teradata, too, but must be coded in C :-)

dnoeth
– dnoeth

2017年07月09日 20:54:08 +00:00
Commented Jul 9, 2017 at 20:54
1

This is about 5X faster on my machine than recursion in SQL Server.

Joe Obbish
– Joe Obbish

2017年07月10日 03:31:17 +00:00
Commented Jul 10, 2017 at 3:31
With PKID being unique, is there a need to do ROWS Unbounded Preceding?

Evan Carroll
– Evan Carroll

2017年07月10日 14:30:26 +00:00
Commented Jul 10, 2017 at 14:30

| Show 7 more comments

Stack Exchange Network

Set non-negative floor for rolling sum, in PostgreSQL

4 Answers 4

Using a Custom Aggregate function

Query sith window functions

PL/pgSQL functions

Index

Tests

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Set non-negative floor for rolling sum, in PostgreSQL

4 Answers 4

Using a Custom Aggregate function

Query sith window functions

PL/pgSQL functions

Index

Tests

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions