I have a data set that appears like this:
next_generation_date | procedure | interval | frequency |
---|---|---|---|
2021年01月17日 00:00:00.000 | Clean Restroom | 1 | day |
2021年01月17日 00:00:00.000 | Vacuum | 2 | week |
2021年02月01日 00:00:00.000 | Inspect Fire Extinguisher | 3 | month |
2021年10月01日 00:00:00.000 | Test Generator | 4 | year |
My goal is to generate multiple date rows for each procedure by going off of the next_generation_date, interval and frequency columns until a specified end date. This specified end date would be the same date throughout the entire table. In this example, let's make the specified end date 2025年12月31日. My end goal is for the table to appear similar to the below. Please note since this is an example, I didn't include every row as to avoid having hundreds of rows listed below.
next_generation_date | procedure | interval | frequency |
---|---|---|---|
2021年01月17日 00:00:00.000 | Clean Restroom | 1 | day |
2021年01月18日 00:00:00.000 | Clean Restroom | 1 | day |
2021年01月19日 00:00:00.000 | Clean Restroom | 1 | day |
2021年01月20日 00:00:00.000 | Clean Restroom | 1 | day |
2021年01月17日 00:00:00.000 | Vacuum | 2 | week |
2021年01月31日 00:00:00.000 | Vacuum | 2 | week |
2021年02月14日 00:00:00.000 | Vacuum | 2 | week |
2021年02月28日 00:00:00.000 | Vacuum | 2 | week |
2021年02月01日 00:00:00.000 | Inspect Fire Extinguisher | 3 | month |
2021年05月01日 00:00:00.000 | Inspect Fire Extinguisher | 3 | month |
2021年08月01日 00:00:00.000 | Inspect Fire Extinguisher | 3 | month |
2021年11月01日 00:00:00.000 | Inspect Fire Extinguisher | 3 | month |
2021年10月01日 00:00:00.000 | Test Generator | 4 | year |
2025年10月01日 00:00:00.000 | Test Generator | 4 | year |
To summarize the above table, future "Clean Restroom" dates are each day, "Vacuum" dates are every two weeks, "Inspect Fire Extinguisher" dates are every three months, and "Test Generator" dates are every four years.
Below are two of my most recent attempts in Microsoft SQL Server Management Studio. The first attempt creates the loop but my procedures only increase by an interval of 1, not the unique interval found within the interval column. My second attempt generates the next next_generation_date with the correct interval but doesn't loop.
CREATE TABLE ##many_integers (idx INT);
WITH units(units) AS (
SELECT 0 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION
SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9
)
,tens AS(SELECT units * 10 AS tens FROM units )
,hundreds AS(SELECT tens * 10 AS hundreds FROM tens )
,thousands AS(SELECT hundreds * 10 AS thousands FROM hundreds )
,tenthousands AS(SELECT thousands * 10 AS tenthousands FROM thousands )
,hundredthousands AS(SELECT tenthousands * 10 AS hundredthousands FROM tenthousands)
INSERT
INTO ##many_integers
SELECT hundredthousands +tenthousands +thousands +hundreds +tens +units
FROM units
CROSS JOIN tens
CROSS JOIN hundreds
CROSS JOIN thousands
CROSS JOIN tenthousands
CROSS JOIN hundredthousands ;
SELECT [procedure], [frequency], [interval],
CASE [frequency]
WHEN 'day' THEN DATEADD( day , idx , [next_generation_date])
WHEN 'week' THEN DATEADD( week , idx , [next_generation_date])
WHEN 'month' THEN DATEADD( month , idx , [next_generation_date])
WHEN 'year' THEN DATEADD( year , idx , [next_generation_date])
ELSE NULL
END AS [next_generation_date]
FROM [data].[fact] CROSS JOIN ##many_integers
WHERE idx < 5000
AND
CASE [frequency]
WHEN 'day' THEN DATEADD( day , idx , [next_generation_date])
WHEN 'week' THEN DATEADD( week , idx , [next_generation_date])
WHEN 'month' THEN DATEADD( month , idx , [next_generation_date])
WHEN 'year' THEN DATEADD( year , idx , [next_generation_date])
ELSE NULL
END <= '2023-12-31 00:00:00'
ORDER BY 1
;
SELECT [procedure], [frequency], [interval], [next_generation_date] FROM [data].[fact]
UNION ALL
SELECT [procedure], [frequency], [interval],
CASE
WHEN [frequency] = 'day' THEN DATEADD(day, [interval], [next_generation_date])
WHEN [frequency] = 'week' THEN DATEADD(week, [interval], [next_generation_date])
WHEN [frequency] = 'month' THEN DATEADD(month, [interval], [next_generation_date])
WHEN [frequency] = 'year' THEN DATEADD(year, [interval], [next_generation_date])
ELSE NULL
END AS [next_generation_date]
FROM [data].[fact]
ORDER BY 1;
Any and all suggestions are greatly appreciated as I'm new to SQL. Thank you.
-
What a great first question. Well done!Zikato– Zikato2022年05月26日 12:16:27 +00:00Commented May 26, 2022 at 12:16
1 Answer 1
I recommend using a Calendar table. Aaron Bertand has a blog series about them
For this code sample, I'll be using my calendar table. It's quite a bit of code so this is a link to my GitHub
For the solution, I'll narrow the range from start to end. To calculate the interval I'll use the helper offset columns.
Current_offset - starting_offset % interval = 0
That's a logic shared for all frequency types. But some frequencies require additional conditions. For example with the WEEK frequency - the offset applies to a whole week, but we want to limit it to the same weekday (e.g. Sunday with Sunday). Similar for the other frequencies.
Complete solution (relies on the existence of the Calendar table)
DROP TABLE IF EXISTS #RegularTask
CREATE TABLE #RegularTask
(
StartDate date
, EndDate date
, RegularProcedure varchar(50)
, interval tinyint
, frequency varchar(10)
)
INSERT INTO #RegularTask (StartDate, EndDate, RegularProcedure, interval, frequency)
VALUES
(DATEFROMPARTS(2021,10,01), DATEFROMPARTS(2025,12,31), 'Test Generator', 4, 'YEAR')
, (DATEFROMPARTS(2021,02,01), DATEFROMPARTS(2025,12,31), 'Inspect Fire Extinguisher', 3, 'MONTH')
, (DATEFROMPARTS(2021,01,17), DATEFROMPARTS(2025,12,31), 'Vacuum', 2, 'WEEK')
, (DATEFROMPARTS(2021,01,17), DATEFROMPARTS(2025,12,31), 'Clean Restroom', 1, 'DAY')
;WITH startPoint
AS
(
SELECT
rt.StartDate
, rt.EndDate
, rt.RegularProcedure
, rt.interval
, rt.frequency
, c.YearNum
, c.MonthNum
, c.DayOfMonthNum
, c.DayOfWeekNum
, c.DayOffset
, c.WeekOffset
, c.MonthOffset
FROM #RegularTask AS rt
JOIN dbo.Calendar AS c
ON rt.StartDate = c.DateVal
)
SELECT
c.DateVal
, sp.RegularProcedure
, sp.interval
, sp.frequency
FROM dbo.Calendar AS c
JOIN startPoint AS sp
ON c.DateVal >= sp.StartDate
AND c.DateVal <= sp.EndDate
AND
(
(
sp.frequency = 'DAY'
AND (c.DayOffset - sp.DayOffset) % sp.interval = 0
)
OR
(
sp.frequency = 'WEEK'
AND (c.WeekOffset - sp.WeekOffset) % sp.interval = 0
AND c.DayOfWeekNum = sp.DayOfWeekNum
)
OR
(
sp.frequency = 'MONTH'
AND (c.MonthOffset - sp.MonthOffset) % sp.interval = 0
AND c.DayOfMonthNum = sp.DayOfMonthNum
)
OR
(
sp.frequency = 'YEAR'
AND (c.YearNum - sp.YearNum) % sp.interval = 0
AND c.MonthNum = sp.MonthNum
AND c.DayOfMonthNum = sp.DayOfMonthNum
)
)
Explore related questions
See similar questions with these tags.