Using PostgreSQL 12, I need to calculate the cumulative sum by day while filling in missing days. I also need to set a minimum value of 0
if any cumulative sum goes negative.
Background
There are two excellent answers for cumulative sum with missing days and non-negative floor values. I need to combine these two solutions.
Take this table and data as an example (note, this is also contained in this db fiddle)
CREATE TABLE inventory_test (
"id" serial,
"account_id" bigint NOT NULL,
"is_active" boolean NOT NULL DEFAULT 'true',
"object_timestamp" bigint NOT NULL,
"amount" numeric(12, 1) NOT NULL,
"object_id" bigint NOT NULL,
PRIMARY KEY ("id")
);
*Note that object_timestamp
is epoch time in milliseconds
SELECT id, to_timestamp((object_timestamp + 0 )/1000), amount
FROM inventory_test
ORDER BY object_timestamp;
id | to_timestamp | amount |
---|---|---|
1 | 2022年04月08日 15:13:30+01 | 10000.0 |
2 | 2022年04月09日 15:13:30+01 | -2000.0 |
3 | 2022年04月09日 15:13:31+01 | -1000.0 |
4 | 2022年04月10日 15:13:30+01 | -2000.0 |
5 | 2022年04月11日 15:13:30+01 | -3000.0 |
6 | 2022年04月12日 15:13:30+01 | -2500.0 |
7 | 2022年04月15日 15:13:30+01 | -2500.0 |
8 | 2022年04月16日 15:13:30+01 | -3000.0 |
9 | 2022年04月17日 15:13:30+01 | 1000.0 |
I can generate the daily totals accounting for missing days with:
WITH cte as (
SELECT
date_trunc('day', to_timestamp((object_timestamp)/1000)) as day,
sum(amount) as day_sum
FROM inventory_test
WHERE object_id = 1 AND is_active = true
GROUP by 1
)
SELECT
day,
sum(c.day_sum) OVER (ORDER BY day) AS running_sum
FROM
(SELECT min(day) AS min_day FROM cte) init,
generate_series(init.min_day, now(), interval '1 day') day
LEFT JOIN cte c USING (day)
ORDER BY day;
day | running_sum |
---|---|
2022年04月08日 00:00:00+00 | 10000 |
2022年04月09日 00:00:00+00 | 7000 |
2022年04月10日 00:00:00+00 | 5000 |
2022年04月11日 00:00:00+00 | 2000 |
2022年04月12日 00:00:00+00 | -500 |
2022年04月13日 00:00:00+00 | -500 |
2022年04月14日 00:00:00+00 | -500 |
2022年04月15日 00:00:00+00 | -3000 |
2022年04月16日 00:00:00+00 | -6000 |
2022年04月17日 00:00:00+00 | -5000 |
DB fiddle
You can see this in this DB fiddle
Desired results
What I want the results to look like are non-negative, where any positive changes are applied to the adjusted running total which should be MAX(0, running_sum)
. I'd prefer to solve this in a SQL statement rather than a custom function.
Desired output:
day | running_sum |
---|---|
2022年04月08日 00:00:00+00 | 10000 |
2022年04月09日 00:00:00+00 | 7000 |
2022年04月10日 00:00:00+00 | 5000 |
2022年04月11日 00:00:00+00 | 2000 |
2022年04月12日 00:00:00+00 | 0 |
2022年04月13日 00:00:00+00 | 0 |
2022年04月14日 00:00:00+00 | 0 |
2022年04月15日 00:00:00+00 | 0 |
2022年04月16日 00:00:00+00 | 0 |
2022年04月17日 00:00:00+00 | 1000 |
1 Answer 1
In order to answer this, I did the following (all of the code below can be found on the fiddle here):
The data can be found on the fiddle, so I'll just start on the SQL:
My first step was to create a calendar table - they really are very handy, take up little space and make the SQL much cleaner. Of course, the dates can be generated dynmically if that's a requirement.
So, my first pass at the SQL was this:
WITH c1 AS
(
SELECT
c.d,
COALESCE(SUM(it.amount), 0) AS s
FROM calendar c
LEFT JOIN invtest it
ON c.d = to_timestamp(it.object_timestamp/1000)::DATE
WHERE c.d BETWEEN '2022-04-08' AND '2022-04-17'
GROUP BY c.d
ORDER BY c.d
), c2 AS
(
SELECT
d,
CASE
WHEN SUM(s) OVER (ORDER BY d) <= 0 THEN 0
ELSE SUM(s) OVER (ORDER BY d)
END AS the_SUM
FROM c1
)
SELECT
d,
the_sum
FROM c2;
Result:
d the_sum
2022年04月08日 10000.0
2022年04月09日 7000.0
2022年04月10日 5000.0
2022年04月11日 2000.0
2022年04月12日 0
2022年04月13日 0
2022年04月14日 0
2022年04月15日 0
2022年04月16日 0
2022年04月17日 0
Not bad for a first pass - but the devil is in the detail - Pareto's Principle tells us that we get 80% of the result with 20% of the effort - it's getting to that 100% that costs us way more...(i.e. 80% more1) as was the case in this particular endeavour! There were extra requirements which arose in the chat which proved challenging to say the least.
My next effort was this:
SELECT
d,
the_sum,
CASE WHEN
(SELECT amount
FROM invtest WHERE TO_TIMESTAMP(object_timestamp/1000)::DATE = t2.d
AND t2.the_sum = 0)> 0
then
(SELECT amount
FROM invtest WHERE TO_TIMESTAMP(object_timestamp/1000)::DATE = t2.d
AND t2.the_sum = 0)
ELSE 0
END AS whatever
FROM
(
WITH c1 AS
(
SELECT
c.d,
COALESCE(SUM(it.amount), 0) AS s
FROM calendar c
LEFT JOIN invtest it
ON c.d = to_timestamp(it.object_timestamp/1000)::DATE
WHERE c.d BETWEEN '2022-04-08' AND '2022-04-17'
GROUP BY c.d
ORDER BY c.d
), c2 AS
(
SELECT
d,
CASE
WHEN SUM(s) OVER (ORDER BY d) <= 0 THEN 0
ELSE SUM(s) OVER (ORDER BY d)
END AS the_SUM
FROM c1
)
SELECT
d,
the_sum
FROM c2) AS t2;
Result:
d the_sum whatever
2022年04月08日 10000.0 0
2022年04月09日 7000.0 0
2022年04月10日 5000.0 0
2022年04月11日 2000.0 0
2022年04月12日 0 0
2022年04月13日 0 0
2022年04月14日 0 0
2022年04月15日 0 0
2022年04月16日 0 0
2022年04月17日 0 1000.0
The last individual being the only person with a negative the_sum
who received a positive amount! This is signifcant, because the requirement in this case was that if someone had a negative the_sum
, but received a positive amount, their sum went to that amount and wasn't just simply added to their "debt".
Anyway, to cut to the chase, here's the final SQL.
SELECT
d,
COALESCE(CASE
WHEN the_sum > 0 THEN the_sum
WHEN
(SELECT amount
FROM invtest
WHERE TO_TIMESTAMP(object_timestamp/1000)::DATE = t2.d
AND t2.the_sum = 0)> 0
THEN
(SELECT amount
FROM invtest
WHERE TO_TIMESTAMP(object_timestamp/1000)::DATE = t2.d
AND t2.the_sum = 0)
END, 0)::INT AS the_sum
FROM
(
WITH c1 AS
(
SELECT
c.d,
COALESCE(SUM(it.amount), 0) AS s
FROM calendar c
LEFT JOIN invtest it
ON c.d = to_timestamp(it.object_timestamp/1000)::DATE
WHERE c.d BETWEEN '2022-04-08' AND '2022-04-17'
GROUP BY c.d
ORDER BY c.d
), c2 AS
(
SELECT
d,
CASE
WHEN SUM(s) OVER (ORDER BY d) <= 0 THEN 0
ELSE SUM(s) OVER (ORDER BY d)
END AS the_SUM
FROM c1
)
SELECT
d,
the_sum
FROM c2) AS t2;
Result:
d the_sum
2022年04月08日 10000
2022年04月09日 7000
2022年04月10日 5000
2022年04月11日 2000
2022年04月12日 0
2022年04月13日 0
2022年04月14日 0
2022年04月15日 0
2022年04月16日 0
2022年04月17日 1000
Et voilà - as the OP desired! All the gory details are to be found in the fiddle here (the final answer is at the end - as one might expect! ;-) ).
-
This is very close. The issue is that I need to calculate the running total through
now()
. Changing theWHERE
clot toWHERE c.d BETWEEN '2022年04月08日' AND now()
results in the last rows return 0, where there are no inventory changes. In my scenario I need those to be the running total value, which is1000
in this example.brianz– brianz2022年04月21日 14:47:02 +00:00Commented Apr 21, 2022 at 14:47 -
If I've understood you correctly, you want to include the dates from 2022年04月17日 to NOW() - even though there have been no changes to amount - so the numbers remain the same, you just want to display them through to 2022年04月21日 9 (today's date)?Vérace– Vérace2022年04月21日 15:00:55 +00:00Commented Apr 21, 2022 at 15:00
-
Or have I got that wrong?Vérace– Vérace2022年04月21日 18:37:20 +00:00Commented Apr 21, 2022 at 18:37
-
Exactly. The use case is inventory. However there are times when inventory levels drop below
0
and in that case we need is merely assume0
, rather than negative. So the daily inventory level will be the last positive value calculated. If we have an inventory level of1000
and 7 days pass with no changes, the inventory level is still1000
for each of those 7 days with no activity.brianz– brianz2022年04月21日 22:00:05 +00:00Commented Apr 21, 2022 at 22:00 -
@brianz - important question - is there always an amount, even it it's
0
or can there beNULL
s ?Vérace– Vérace2022年04月22日 04:41:58 +00:00Commented Apr 22, 2022 at 4:41
Explore related questions
See similar questions with these tags.
INSERT INTO my_table VALUES (.....
This removes the possibility of error and eliminates duplicatoin of effort on the part of those trying to answer. Help us to help you! :-)1649513610000
and1649513611000
. These are merely 1s (1000ms) apart on2022年04月09日
. Sum up the values (-2000 + -1000) results in 10000-ひく3000 =わ 7000 on2022年04月09日