13

I need to get the rolling sum over a period of 7 days for each row (1 row per day).

For example:

| Date | Count | 7-Day Rolling Sum |
------------------------------------------
| 2016年02月01日 | 1 | 1
| 2016年02月02日 | 1 | 2
| 2016年02月03日 | 2 | 4
| 2016年02月04日 | 2 | 6
| 2016年02月05日 | 2 | 8
| 2016年02月06日 | 2 | 10
| 2016年02月07日 | 2 | 12
| 2016年02月08日 | 2 | 13 --> here we start summing from 02-02
| 2016年02月09日 | 2 | 14 --> here we start summing from 02-03
| 2016年02月10日 | 5 | 17 --> here we start summing from 02-04

I need this in one query that returns the rows with 7-day rolling sum and the date of the last day of the range of the sum. For example, day=2016年02月10日, sum 17.

So far I have this but it's not fully working:

DO
$do$
DECLARE 
 curr_date date;
 num bigint;
BEGIN
FOR curr_date IN (SELECT date_trunc('day', d)::date FROM generate_series(CURRENT_DATE-31, CURRENT_DATE-1, '1 day'::interval) d)
LOOP 
 SELECT curr_date, SUM(count)
 FROM generate_series (curr_date-8, curr_date-1, '1 day'::interval) d
 LEFT JOIN m.ping AS p ON p.date = d
 LEFT JOIN m.ping_type AS pt ON pt.id = p.ping_type_id
 LEFT JOIN m.ping_frequency AS pf ON pf.id = p.ping_frequency_id
 WHERE
 pt.url_slug = 'active' AND
 pf.url_slug = 'weekly';
END LOOP;
END
$do$;

I am using PostgreSQL 9.4.5. There could be multiple rows with the same date. If there is a gap (a day is missing), the 7-consecutive-day range will still be followed.

Paul White
95.3k30 gold badges439 silver badges689 bronze badges
asked Mar 1, 2016 at 23:34
1

3 Answers 3

16

By far the cleanest solution is to use window function sum with rows between:

with days as (
 SELECT date_trunc('day', d)::date as day
 FROM generate_series(CURRENT_DATE-31, CURRENT_DATE-1, '1 day'::interval) d ),
 counts as (
 select 
 days.day,
 sum((random()*5)::integer) num
 FROM days
 -- left join other tables here to get counts, I'm using random
 group by days.day
 )
select
 day,
 num,
 sum(num) over (order by day ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
from counts
order by day;

The important part is to generate the timeframe in days CTE and join onto it in order not to miss any days for which there is no data.

Example

For example, if I create some test data with 20 records in last 14 days:

SELECT (current_date - ((random()*14)::integer::text || 'days')::interval)::date as day, (random()*7)::integer as num
into test_data from generate_series(1, 20);;

And also add a value before that:

insert into test_data values ((current_date - '25 days'::interval), 5);

Then use the query above:

with days as (
 SELECT date_trunc('day', d)::date as day
 FROM generate_series(CURRENT_DATE-31, CURRENT_DATE-1, '1 day'::interval) d ),
 counts as (
 select 
 days.day,
 sum(t.num) num
 FROM days
 left join test_data t on t.day = days.day
 group by days.day
 )
select
 day,
 num,
 sum(num) over (order by day rows between 6 preceding and current row)
from counts
order by day;

And get the results for whole month:

 day | num | sum 
------------+-----+-----
 2016年01月31日 | | 
 2016年02月01日 | | 
 2016年02月02日 | | 
 2016年02月03日 | | 
 2016年02月04日 | | 
 2016年02月05日 | | 
 2016年02月06日 | 5 | 5
 2016年02月07日 | | 5
 2016年02月08日 | | 5
 2016年02月09日 | | 5
 2016年02月10日 | | 5
 2016年02月11日 | | 5
 2016年02月12日 | | 5
 2016年02月13日 | | 
 2016年02月14日 | | 
 2016年02月15日 | | 
 2016年02月16日 | | 
 2016年02月17日 | | 
 2016年02月18日 | 2 | 2
 2016年02月19日 | 5 | 7
 2016年02月20日 | | 7
 2016年02月21日 | 4 | 11
 2016年02月22日 | 15 | 26
 2016年02月23日 | 1 | 27
 2016年02月24日 | 1 | 28
 2016年02月25日 | 2 | 28
 2016年02月26日 | 4 | 27
 2016年02月27日 | 9 | 36
 2016年02月28日 | 5 | 37
 2016年02月29日 | 11 | 33
 2016年03月01日 | 5 | 37
(31 rows)
answered Mar 2, 2016 at 17:39
3
  • This is perfectly good if there are no gaps and only 1 row per day. If there are gaps or multiple rows per day (as the OP describes) and we want the sum for the rows that fall in the last 7 days (from today up to 6 days ago), then this will not work. I'm not sure excatly what the OP is after though, even after their clarifications, so get a +1. Commented Mar 2, 2016 at 17:47
  • No, this is okay. That's why you have to join onto days and that's why there is group by in counts. Commented Mar 2, 2016 at 17:50
  • Ah, I thought you had that just to produce the random data. Perhaps you can make more clear which part is the data and what is the calculation. Seems correct, yes. Commented Mar 2, 2016 at 17:57
0

Ended up using a FOR LOOP, a TEMP table and SELECT on the temp table once the for loop is done:

DO
$do$
DECLARE 
 curr_date DATE;
BEGIN
-- Create temp table to hold results
DROP TABLE IF EXISTS rolling_7day_sum;
CREATE TEMP TABLE rolling_7day_sum (
 date DATE,
 count BIGINT
);
-- Iterate dates and get 7 day rolling sum for each
FOR curr_date IN (SELECT date_trunc('day', d)::date FROM generate_series(
 -- Get earliest date from table
 (
 SELECT date FROM m.ping AS p
 LEFT JOIN m.ping_type AS pt ON pt.id = p.ping_type_id
 LEFT JOIN m.ping_frequency AS pf ON pf.id = p.ping_frequency_id
 WHERE
 pt.url_slug = 'active' AND
 pf.url_slug = 'weekly'
 ORDER BY date ASC
 LIMIT 1
 ), CURRENT_DATE-1, '1 day'::interval) d)
LOOP
 INSERT INTO rolling_7day_sum 
 SELECT curr_date, SUM(count)
 FROM generate_series (curr_date-8, curr_date-1, '1 day'::interval) d
 LEFT JOIN m.ping AS p ON p.date = d
 LEFT JOIN m.ping_type AS pt ON pt.id = p.ping_type_id
 LEFT JOIN m.ping_frequency AS pf ON pf.id = p.ping_frequency_id
 WHERE
 pt.url_slug = 'active' AND
 pf.url_slug = 'weekly';
END LOOP;
END
$do$;
SELECT date, count FROM rolling_7day_sum ORDER BY date ASC;

But I imagine there is a cleaner way of doing a 7-consecutive-rolling-sum than this.

answered Mar 2, 2016 at 16:40
1
  • You could possibly do a recursive query that goes 7 levels deep. Commented Mar 2, 2016 at 17:26
0

A recursive SQL query that goes 7 in depth may work, but I don't know how efficient it would be.

WITH RECURSIVE totals(start_day, end_day, total, depth) AS (
 SELECT date, date, count, 1 FROM table
 UNION ALL
 SELECT
 t.start_day,
 t.start_day + INTERVAL '1 day',
 total + COALESCE((SELECT count FROM table WHERE date = t.start_day + INTERVAL '1 day'), 0),
 t.depth + 1
 FROM totals t
) SELECT
 *
FROM totals
WHERE end_day = '2016-03-01' AND depth = 7;

Not tested for syntax or anything.

answered Mar 2, 2016 at 17:32

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.