I'm trying to find an efficient way to aggregate data for reporting. Let's say I need to aggregate the following data in 5-second intervals:
CREATE TABLE RawData
(
Result FLOAT,
CaptureTime DATETIME
);
INSERT INTO RawData VALUES
(2.3, '2018-04-01 00:00:00'),
(2.5, '2018-04-01 00:00:01'),
(2.8, '2018-04-01 00:00:02'),
(2.8, '2018-04-01 00:00:03'),
(3.4, '2018-04-01 00:00:04'),
(5.1, '2018-04-01 00:00:05'),
(2.2, '2018-04-01 00:00:06'),
(4.1, '2018-04-01 00:00:07'),
(4.3, '2018-04-01 00:00:08'),
(5.9, '2018-04-01 00:00:09'),
...
The aggregated data will be stored in another table:
CREATE TABLE AggregateData
(
Result FLOAT,
StartCaptureTime DATETIME,
EndCaptureTime DATETIME
);
As an example: if my aggregate function is SUM
, the data in AggregateData
would be:
+--------+----------------------+---------------------+
| Result | StartCaptureTime | EndCaptureTime |
+--------+----------------------+---------------------+
| 13.8 | 2018年04月01日 00:00:00 | 2018年04月01日 00:00:05 |
| 21.6 | 2018年04月01日 00:00:05 | 2018年04月01日 00:00:10 |
+--------+----------------------+---------------------+
The best solution that I've come up with uses a loop:
CREATE PROCEDURE [dbo].[spPerformAggregateCalculation]
@UpdateIntervalSeconds INT,
@StartTime DATETIME,
@EndTime DATETIME
AS
BEGIN
SET NOCOUNT ON;
DECLARE @RunningStartTime DATETIME = @StartTime;
DECLARE @RunningEndTime DATETIME = DATEADD(SECOND, @UpdateIntervalSeconds, @RunningStartTime);
DECLARE @AggregateValue FLOAT;
WHILE @RunningStartTime < @EndTime
BEGIN
SELECT @AggregateValue =
(
SELECT SUM(Result)
FROM RawData
WHERE CaptureTime >= @RunningStartTime AND CaptureTime < @RunningEndTime
);
INSERT INTO AggregateData VALUES
(@AggregateValue, @RunningStartTime, @RunningEndTime);
SELECT
@RunningStartTime = @RunningEndTime,
@RunningEndTime = DATEADD(SECOND, @UpdateIntervalSeconds, @RunningEndTime);
END
END
EXEC spPerformAggregateCalculation
@UpdateIntervalSeconds = 5,
@StartTime = '2018-04-01 00:00:00',
@EndTime = '2018-04-01 00:00:10'
Is there a more efficient way to get this data? I'll be performing more complex aggregations on millions of rows on a daily basis, so every bit helps.
P.S. In case "increase your interval" is offered as a suggestion, my data will actually be aggregated in much larger intervals than 5 seconds; I just whittled the problem down for the example.
Update
Below was my final solution, based dnoeth's answer:
CREATE PROCEDURE [dbo].[spPerformAggregateCalculation]
@UpdateIntervalSeconds INT,
@StartTime DATETIME,
@EndTime DATETIME
AS
BEGIN
SET NOCOUNT ON;
WITH cte AS
(
SELECT
-- Adjust each CaptureTime to the beginning of the interval.
DATEADD(SECOND, -DATEDIFF(SECOND, @StartTime , CaptureTime) % @UpdateIntervalSeconds, CaptureTime) as StartCaptureTime,
Result
FROM RawData
WHERE CaptureTime >= @StartTime
)
INSERT INTO AggregateData
SELECT
SUM(Result),
StartCaptureTime,
DATEADD(SECOND, @UpdateIntervalSeconds, StartCaptureTime)
FROM cte
GROUP BY StartCaptureTime
ORDER BY StartCaptureTime;
END
2 Answers 2
There's no need for a cursor or recursion, both are not really performant for larger amounts of data.
Simply adjust the CaptureTime to the start of each interval.
declare @UpdateIntervalSeconds integer = 5;
declare @minTime datetime = (select min(CaptureTime) from RawData);
-- if you want to start at exactly `00` instead of `01`,`02` etc.
-- you can modify `@minTime` using another DATEADD/DATEDIFF
with cte as
( select
-- adjust each CaptureTime to the begin of an interval
DATEADD(SECOND, -DATEDIFF(SECOND, @minTime , CaptureTime) % @UpdateIntervalSeconds, CaptureTime) as StartCaptureTime
,Result
from RawData
)
select StartCaptureTime
,DATEADD(SECOND, @UpdateIntervalSeconds, StartCaptureTime) AS EndCaptureTime
,sum(Result)
from cte
group by StartCaptureTime
order by StartCaptureTime
-
\$\begingroup\$ Very clever! I just had to add a filter to the CTE to make it work with a passed-in Start Time instead processing the entire table at once. Thanks for your help! \$\endgroup\$Jake Reece– Jake Reece2018年04月30日 22:33:50 +00:00Commented Apr 30, 2018 at 22:33
I don't see any problems with the cursor.
I think a recursive cte is more efficient and cleaner here.
INSERT INTO @RawData VALUES
(2.3, '2018-04-01 00:00:00'),
(2.5, '2018-04-01 00:00:01'),
(2.8, '2018-04-01 00:00:02'),
(2.8, '2018-04-01 00:00:03'),
(3.4, '2018-04-01 00:00:04'),
(5.1, '2018-04-01 00:00:05'),
(2.2, '2018-04-01 00:00:06'),
(4.1, '2018-04-01 00:00:07'),
(4.3, '2018-04-01 00:00:08'),
(5.9, '2018-04-01 00:00:09');
declare @start datetime = (select min(CaptureTime) from @RawData);
declare @end datetime = (select max(CaptureTime) from @RawData);
with cte as
( select @start st, DATEADD(ss, 4, @start) as ed
union all
select DATEADD(ss, 5, st), DATEADD(ss, 5, ed)
from cte
where ed < @end
)
select sum(rd.Result), cte.st, cte.ed
from @RawData rd
join cte
on rd.CaptureTime between cte.st and cte.ed
group by cte.st, cte.ed;
00
or at the minimumCaptureTime
? And what are those much larger intervals than 5 seconds? \$\endgroup\$