This is a really fun question (asked for SQL Server) and I wanted to try it out to see how it was done in PostgreSQL. Let's see if anyone else can do it better. Taking this data,
CREATE TABLE foo
AS
SELECT pkid::int, numvalue::int, groupid::int
FROM ( VALUES
( 1, -1 , 1 ),
( 2, -2 , 1 ),
( 3, 5 , 1 ),
( 4, -7 , 1 ),
( 5, 1 , 2 )
) AS t(pkid, numvalue, groupid);
We're trying to generate this:
PKID RollingSum GroupID
----------------------------- ## Explanation:
1 0 1 ## 0 - 1 < 0 => 0
2 0 1 ## 0 - 2 < 0 => 0
3 5 1 ## 0 + 5 > 0 => 5
4 0 1 ## 5 - 7 < 0 => 0
The problem is described as,
When adding a negative number will cause the sum to be negative, the limit will be activated to set the result as zero. Subsequent addition should be based on this adjusted value, instead of the original rolling sum.
The expected result should be achieved using addition. If the fourth number changes from -7 to -3, the fourth result should be 2 instead of 0
If a single sum can be provided rather than a few rolling numbers, it would also be acceptable. I can use stored procedures to implement a non-negative addition, but that would be too low-level.
The real life problem for this is that we record order placed as positive amount and cancelled as negative. Due to connectivity issues customers may click the
cancel
button more than once, which will result in multiple negative values being recorded. When calculating our revenue, "zero" need to be a boundary for sales.
Their solutions are all using recursion.
-
1Some tests: dbfiddle.uk/…Abelisto– Abelisto2017年07月10日 22:48:15 +00:00Commented Jul 10, 2017 at 22:48
4 Answers 4
This is how I solved a similar problem on Teradata using nested OLAP-functions:
SELECT dt.*,
-- find the lowest previous CumSum < 0
-- and adjust the current CumSum to zero
Max(CASE WHEN CumSum < 0 THEN -CumSum ELSE 0 end)
Over (PARTITION BY groupid
ORDER BY pkid
ROWS Unbounded Preceding)
+ CumSum AS AdjustedSum
FROM
(
SELECT pkid, numvalue, groupid,
-- calculate a standard cumulative sum
Sum(numvalue)
Over (PARTITION BY groupid
ORDER BY pkid
ROWS Unbounded Preceding) AS CumSum
FROM foo
) AS dt
-
Clever! (but probably not efficient)joanolo– joanolo2017年07月09日 20:12:22 +00:00Commented Jul 9, 2017 at 20:12
-
1I'm just now looking at this, and I find myself having to experiment with it a bit, you should have explained a little. Cool method anyway. =)Evan Carroll– Evan Carroll2017年07月09日 20:16:54 +00:00Commented Jul 9, 2017 at 20:16
-
2@joanolo: Probably more efficient than the recursive solutions for SQL Server (at least way more efficient on Teradata). A custom Windowed Aggregate function would be more efficient on Teradata, too, but must be coded in C :-)dnoeth– dnoeth2017年07月09日 20:54:08 +00:00Commented Jul 9, 2017 at 20:54
-
1This is about 5X faster on my machine than recursion in SQL Server.Joe Obbish– Joe Obbish2017年07月10日 03:31:17 +00:00Commented Jul 10, 2017 at 3:31
-
With PKID being unique, is there a need to do
ROWS Unbounded Preceding
?Evan Carroll– Evan Carroll2017年07月10日 14:30:26 +00:00Commented Jul 10, 2017 at 14:30
Using a Custom Aggregate function
We use CREATE FUNCTION
to create a function int_add_pos_or_zero
that adds numbers, but if they're less than 0, returns 0.
CREATE FUNCTION int_add_pos_or_zero(int, int)
RETURNS int
AS $$
BEGIN
RETURN greatest(1ドル + 2,ドル 0);
END;
$$
LANGUAGE plpgsql
IMMUTABLE;
Now we CREATE AGGREGATE
on that so we can run it in a window function. We set INITCOND
to be =0
.
CREATE AGGREGATE add_pos_or_zero(int) (
SFUNC = int_add_pos_or_zero,
STYPE = int,
INITCOND = 0
);
Now we query it like any other Window Function.
SELECT pkid,
groupid,
numvalue,
add_pos_or_zero(numvalue) OVER (PARTITION BY groupid ORDER BY pkid)
FROM foo;
pkid | groupid | numvalue | add_pos_or_zero
------+---------+----------+-----------------
1 | 1 | -1 | 0
2 | 1 | -2 | 0
3 | 1 | 5 | 5
4 | 1 | -7 | 0
5 | 2 | 1 | 1
(5 rows)
Query sith window functions
This is much like dnoeth's smart query (same basic logic). Just slightly shorter and more efficient with a simpler expression in the outer query:
SELECT groupid, pkid
, simple_sum
- LEAST(MIN(simple_sum)
OVER (PARTITION BY groupid
ORDER BY pkid ROWS UNBOUNDED PRECEDING), 0) AS rolling_sum
FROM (
SELECT pkid, numvalue, groupid
, SUM(numvalue) OVER (PARTITION BY groupid
ORDER BY pkid ROWS UNBOUNDED PRECEDING) AS simple_sum
FROM foo
) sub;
How does it work?
To compute the special rolling sum as requested, for every row where the simple rolling sum would turn out negative, we add the same positive number to make it zero instead. That's precisely what the calculation in the outer SELECT
does, subtracting negative numbers adds the corresponding positive:
LEAST(MIN(simple_sum) OVER (PARTITION BY groupid
ORDER BY pkid ROWS UNBOUNDED PRECEDING), 0)
The surrounding LEAST
cancels any action for positive numbers (or 0).
The least negative (greatest absolute number) of the simple running sum is what we have to add so far in total. Every time our calculation would go below zero, we get a new absolute low in the simple running sum. It's all beautifully simple.
PL/pgSQL functions
Based off Abelisto's implementation, improved:
CREATE OR REPLACE FUNCTION f_special_rolling_sum()
RETURNS TABLE (groupid int, pkid int, numvalue int, rolling_sum int) AS
$func$
DECLARE
last_groupid int;
BEGIN
FOR groupid, pkid, numvalue IN
SELECT f.groupid, f.pkid, f.numvalue
FROM foo f
ORDER BY f.groupid, f.pkid
LOOP
IF last_groupid = groupid THEN -- same partition continues
rolling_sum := GREATEST(rolling_sum + numvalue, 0);
ELSE -- new partition
last_groupid := groupid;
rolling_sum := GREATEST(numvalue, 0);
END IF;
RETURN NEXT;
END LOOP;
END
$func$ LANGUAGE plpgsql;
Call:
SELECT * FROM f_special_rolling_sum();
Index
All solutions supplied so far can benefit from an index-only scan with a covering index:
CREATE INDEX idx_foo_covering ON foo(groupid, pkid, numvalue);
Related:
Tests
After optimizing function, queries, index (and the test itself) I see similar performance for both, the query being slightly faster than the function. (The aggregate function a bit slower than the rest.) Extensive test suite (based off Abelisto's fiddle):
dbfiddle for pg 9.6 here
dbfiddle for pg 10 here
Well, this is ugly, but it works without adding any new function or aggregate:
SELECT *,
CASE
WHEN numvalue > 0
THEN sum( greatest(numvalue,0) ) OVER (PARTITION BY groupid ORDER BY pkid)
ELSE 0
END AS result
FROM foo;
-
I upvoted this, but this solution doesn't work. Change pkid 4's numvalue to -2 in the solution above. 5-2 should be three, you catch numvalue being -2 and set the result to 0.Evan Carroll– Evan Carroll2017年07月14日 07:25:24 +00:00Commented Jul 14, 2017 at 7:25
-
Indeed. My bad! Anyway, nice problem :) The aggregate solution is pretty neat.Fulano da Silva Sauro– Fulano da Silva Sauro2017年07月15日 08:20:55 +00:00Commented Jul 15, 2017 at 8:20
Explore related questions
See similar questions with these tags.