2

Using PostgreSQL 12, I need to calculate the cumulative sum by day while filling in missing days. I also need to set a minimum value of 0 if any cumulative sum goes negative.

Background

There are two excellent answers for cumulative sum with missing days and non-negative floor values. I need to combine these two solutions.

Take this table and data as an example (note, this is also contained in this db fiddle)

CREATE TABLE inventory_test (
 "id" serial,
 "account_id" bigint NOT NULL,
 "is_active" boolean NOT NULL DEFAULT 'true',
 "object_timestamp" bigint NOT NULL,
 "amount" numeric(12, 1) NOT NULL,
 "object_id" bigint NOT NULL,
 PRIMARY KEY ("id")
);

*Note that object_timestamp is epoch time in milliseconds

SELECT id, to_timestamp((object_timestamp + 0 )/1000), amount
FROM inventory_test
ORDER BY object_timestamp;
id to_timestamp amount
1 2022年04月08日 15:13:30+01 10000.0
2 2022年04月09日 15:13:30+01 -2000.0
3 2022年04月09日 15:13:31+01 -1000.0
4 2022年04月10日 15:13:30+01 -2000.0
5 2022年04月11日 15:13:30+01 -3000.0
6 2022年04月12日 15:13:30+01 -2500.0
7 2022年04月15日 15:13:30+01 -2500.0
8 2022年04月16日 15:13:30+01 -3000.0
9 2022年04月17日 15:13:30+01 1000.0

I can generate the daily totals accounting for missing days with:

WITH cte as (
 SELECT 
 date_trunc('day', to_timestamp((object_timestamp)/1000)) as day,
 sum(amount) as day_sum
 FROM inventory_test
 WHERE object_id = 1 AND is_active = true
 GROUP by 1
 )
SELECT 
 day, 
 sum(c.day_sum) OVER (ORDER BY day) AS running_sum
FROM 
 (SELECT min(day) AS min_day FROM cte) init,
 generate_series(init.min_day, now(), interval '1 day') day
LEFT JOIN cte c USING (day)
ORDER BY day;
day running_sum
2022年04月08日 00:00:00+00 10000
2022年04月09日 00:00:00+00 7000
2022年04月10日 00:00:00+00 5000
2022年04月11日 00:00:00+00 2000
2022年04月12日 00:00:00+00 -500
2022年04月13日 00:00:00+00 -500
2022年04月14日 00:00:00+00 -500
2022年04月15日 00:00:00+00 -3000
2022年04月16日 00:00:00+00 -6000
2022年04月17日 00:00:00+00 -5000

DB fiddle

You can see this in this DB fiddle

Desired results

What I want the results to look like are non-negative, where any positive changes are applied to the adjusted running total which should be MAX(0, running_sum). I'd prefer to solve this in a SQL statement rather than a custom function.

Desired output:

day running_sum
2022年04月08日 00:00:00+00 10000
2022年04月09日 00:00:00+00 7000
2022年04月10日 00:00:00+00 5000
2022年04月11日 00:00:00+00 2000
2022年04月12日 00:00:00+00 0
2022年04月13日 00:00:00+00 0
2022年04月14日 00:00:00+00 0
2022年04月15日 00:00:00+00 0
2022年04月16日 00:00:00+00 0
2022年04月17日 00:00:00+00 1000
asked Apr 18, 2022 at 3:34
9
  • 1
    Hi, and welcome to dba.se! Could you please put your data in the DDL format (i.e. INSERT INTO my_table VALUES (..... This removes the possibility of error and eliminates duplicatoin of effort on the part of those trying to answer. Help us to help you! :-) Commented Apr 18, 2022 at 8:12
  • Go to dbfiddle.uk and construct one therre! Commented Apr 18, 2022 at 11:14
  • Thank you! DB fiddle added. Commented Apr 18, 2022 at 15:01
  • I'm a bit puzzled. You have an initial 10,000 - then -2,000 but your running sum goes to 7,000. Why not 8,000? The first two dates? Commented Apr 18, 2022 at 15:43
  • There are two rows which represent the same day. The row with timestamps 1649513610000 and 1649513611000. These are merely 1s (1000ms) apart on 2022年04月09日. Sum up the values (-2000 + -1000) results in 10000-ひく3000 = 7000 on 2022年04月09日 Commented Apr 18, 2022 at 15:51

1 Answer 1

2

In order to answer this, I did the following (all of the code below can be found on the fiddle here):

The data can be found on the fiddle, so I'll just start on the SQL:

My first step was to create a calendar table - they really are very handy, take up little space and make the SQL much cleaner. Of course, the dates can be generated dynmically if that's a requirement.

So, my first pass at the SQL was this:

WITH c1 AS
(
 SELECT
 c.d,
 COALESCE(SUM(it.amount), 0) AS s
 FROM calendar c
 LEFT JOIN invtest it
 ON c.d = to_timestamp(it.object_timestamp/1000)::DATE
 WHERE c.d BETWEEN '2022-04-08' AND '2022-04-17'
 GROUP BY c.d 
 ORDER BY c.d 
), c2 AS
(
 SELECT
 d,
 CASE 
 WHEN SUM(s) OVER (ORDER BY d) <= 0 THEN 0
 ELSE SUM(s) OVER (ORDER BY d)
 END AS the_SUM
 FROM c1
)
SELECT 
 d,
 the_sum
FROM c2;

Result:

d the_sum
2022年04月08日 10000.0
2022年04月09日 7000.0
2022年04月10日 5000.0
2022年04月11日 2000.0
2022年04月12日 0
2022年04月13日 0
2022年04月14日 0
2022年04月15日 0
2022年04月16日 0
2022年04月17日 0

Not bad for a first pass - but the devil is in the detail - Pareto's Principle tells us that we get 80% of the result with 20% of the effort - it's getting to that 100% that costs us way more...(i.e. 80% more1) as was the case in this particular endeavour! There were extra requirements which arose in the chat which proved challenging to say the least.

My next effort was this:

SELECT 
 d, 
 the_sum,
 
 
 CASE WHEN
(SELECT amount 
 FROM invtest WHERE TO_TIMESTAMP(object_timestamp/1000)::DATE = t2.d 
 AND t2.the_sum = 0)> 0
 then
 (SELECT amount 
 FROM invtest WHERE TO_TIMESTAMP(object_timestamp/1000)::DATE = t2.d 
 AND t2.the_sum = 0)
 
 ELSE 0
 END AS whatever
FROM
(
WITH c1 AS
(
 SELECT
 c.d,
 COALESCE(SUM(it.amount), 0) AS s
 FROM calendar c
 LEFT JOIN invtest it
 ON c.d = to_timestamp(it.object_timestamp/1000)::DATE
 WHERE c.d BETWEEN '2022-04-08' AND '2022-04-17'
 GROUP BY c.d 
 ORDER BY c.d 
), c2 AS
(
 SELECT
 d,
 CASE 
 WHEN SUM(s) OVER (ORDER BY d) <= 0 THEN 0
 ELSE SUM(s) OVER (ORDER BY d)
 END AS the_SUM
 FROM c1
)
SELECT 
 d,
 the_sum
FROM c2) AS t2;

Result:

d the_sum whatever
2022年04月08日 10000.0 0
2022年04月09日 7000.0 0
2022年04月10日 5000.0 0
2022年04月11日 2000.0 0
2022年04月12日 0 0
2022年04月13日 0 0
2022年04月14日 0 0
2022年04月15日 0 0
2022年04月16日 0 0
2022年04月17日 0 1000.0

The last individual being the only person with a negative the_sum who received a positive amount! This is signifcant, because the requirement in this case was that if someone had a negative the_sum, but received a positive amount, their sum went to that amount and wasn't just simply added to their "debt".

Anyway, to cut to the chase, here's the final SQL.

SELECT 
 d, 
 COALESCE(CASE 
 WHEN the_sum > 0 THEN the_sum
 WHEN
 (SELECT amount 
 FROM invtest 
 WHERE TO_TIMESTAMP(object_timestamp/1000)::DATE = t2.d 
 AND t2.the_sum = 0)> 0
 THEN
 (SELECT amount 
 FROM invtest 
 WHERE TO_TIMESTAMP(object_timestamp/1000)::DATE = t2.d 
 AND t2.the_sum = 0)
 END, 0)::INT AS the_sum
FROM
(
WITH c1 AS
(
 SELECT
 c.d,
 COALESCE(SUM(it.amount), 0) AS s
 FROM calendar c
 LEFT JOIN invtest it
 ON c.d = to_timestamp(it.object_timestamp/1000)::DATE
 WHERE c.d BETWEEN '2022-04-08' AND '2022-04-17'
 GROUP BY c.d 
 ORDER BY c.d 
), c2 AS
(
 SELECT
 d,
 CASE 
 WHEN SUM(s) OVER (ORDER BY d) <= 0 THEN 0
 ELSE SUM(s) OVER (ORDER BY d)
 END AS the_SUM
 FROM c1
)
SELECT 
 d,
 the_sum
FROM c2) AS t2;

Result:

d the_sum
2022年04月08日 10000
2022年04月09日 7000
2022年04月10日 5000
2022年04月11日 2000
2022年04月12日 0
2022年04月13日 0
2022年04月14日 0
2022年04月15日 0
2022年04月16日 0
2022年04月17日 1000

Et voilà - as the OP desired! All the gory details are to be found in the fiddle here (the final answer is at the end - as one might expect! ;-) ).

answered Apr 19, 2022 at 14:02
7
  • This is very close. The issue is that I need to calculate the running total through now(). Changing the WHERE clot to WHERE c.d BETWEEN '2022年04月08日' AND now() results in the last rows return 0, where there are no inventory changes. In my scenario I need those to be the running total value, which is 1000 in this example. Commented Apr 21, 2022 at 14:47
  • If I've understood you correctly, you want to include the dates from 2022年04月17日 to NOW() - even though there have been no changes to amount - so the numbers remain the same, you just want to display them through to 2022年04月21日 9 (today's date)? Commented Apr 21, 2022 at 15:00
  • Or have I got that wrong? Commented Apr 21, 2022 at 18:37
  • Exactly. The use case is inventory. However there are times when inventory levels drop below 0 and in that case we need is merely assume 0, rather than negative. So the daily inventory level will be the last positive value calculated. If we have an inventory level of 1000 and 7 days pass with no changes, the inventory level is still 1000 for each of those 7 days with no activity. Commented Apr 21, 2022 at 22:00
  • @brianz - important question - is there always an amount, even it it's 0 or can there be NULLs ? Commented Apr 22, 2022 at 4:41

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.