I am working on creating a Datawarehouse. I have created a Time Dimension (Dim_Time), at 5-minute intervals. Hour Aggregations will have [Minutes] = NULL. For the purpose of this example:
CREATE TABLE [dbo].[Dim_Time](
[TimeID] [int] IDENTITY(1,1) NOT NULL,
[StartDateTime] [datetime] NULL,
[Hour] [int] NULL,
[Minute] [int] NULL,
CONSTRAINT [PK_Dim_Time] PRIMARY KEY CLUSTERED
([TimeID] ASC)
) ON [PRIMARY]
GO
Then I have an Incoming Table, which is updated every 5 minutes from the OLTP Database.
CREATE TABLE [dbo].[Stg_IncomingQueue](
[IncomingID] [int] IDENTITY(1,1) NOT NULL,
[CustomerID] [int] NOT NULL,
[TimeID] [int] NULL,
[InsertTime] [datetime] NULL,
CONSTRAINT [PK_IncomingQueueMonitor] PRIMARY KEY CLUSTERED
([IncomingID] ASC)
) ON [PRIMARY]
GO
I then have the following While loop. It's purpose is to get the correct 5-Minute time slot (TimeID) that relates to a particular incoming row:
WHILE 0 < (SELECT COUNT(*) FROM [dba_local].[dbo].[Stg_IncomingQueue] WHERE TimeID IS NULL)
BEGIN
SELECT TOP 1 @IncomingID = IncomingID, @RowInserTime = InsertTime
FROM [dba_local].[dbo].[Stg_IncomingQueue] WHERE TimeID IS NULL
;WITH DimTime
AS (
SELECT MAX(TimeID) AS MaxTimeID FROM [dba_local].[dbo].[Dim_Time]
WHERE StartDateTime < @RowInserTime AND [Minute] IS NOT NULL
)
UPDATE [dba_local].[dbo].[Stg_IncomingQueue]
SET TimeID = (SELECT MaxTimeID FROM DimTime)
WHERE IncomingID = @IncomingID
END
It's such a simple process, and yet I cannot figure out a simpler way to update the TimeID. As per the CTE SELECT in the loop, I need to get the MAX(TimeID) where the StartDateTime is less then the rows InsertTime. Because time is the only relationship, I am struggling with all options to do this in 1 query without the loop, but I feel it is possible
Please can someone help me out here either with a better option or confirming that this is the simplest way.
Thank you very much for your time and assistance. Wade
1 Answer 1
I created the following minimally complete and verifiable example, based on the two tables in your original question. It uses the LEAD T-SQL statement to obtain a time range from the dbo.Dim_Time table, which can be compared to the incoming rows quite easily.
IF OBJECT_ID(N'dbo.Stg_IncomingQueue', N'U') IS NOT NULL
DROP TABLE dbo.Stg_IncomingQueue;
IF OBJECT_ID(N'dbo.Dim_Time', N'U') IS NOT NULL
DROP TABLE dbo.Dim_time;
CREATE TABLE dbo.Dim_Time(
TimeID int IDENTITY(1,1) NOT NULL,
StartDateTime time(0) NULL,
CONSTRAINT PK_Dim_Time PRIMARY KEY CLUSTERED
(TimeID ASC)
) ON [PRIMARY]
GO
;WITH src AS
(
SELECT TOP (10) sv.number
FROM master.dbo.spt_values sv
WHERE sv.type = N'P'
ORDER BY sv.number
)
INSERT INTO dbo.Dim_Time (StartDateTime)
SELECT TOP(289) CONVERT(time(0), DATEADD(minute, (s3.number * 100 + s2.number * 10 + s1.number) * 5, CONVERT(time(0), '00:00:00')))
FROM src s1
CROSS JOIN src s2
CROSS JOIN src s3
ORDER BY s3.number * 100 + s2.number * 10 + s1.number
CREATE TABLE dbo.Stg_IncomingQueue(
IncomingID int IDENTITY(1,1) NOT NULL,
CustomerID int NOT NULL,
TimeID int NULL,
InsertTime datetime NULL,
CONSTRAINT PK_IncomingQueueMonitor PRIMARY KEY CLUSTERED
(IncomingID ASC)
) ON [PRIMARY]
GO
INSERT INTO dbo.Stg_IncomingQueue (CustomerID, InsertTime)
VALUES (1, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'))
, (2, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'))
, (3, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'))
, (4, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'));
This piece replaces your entire WHILE
loop, with a single UPDATE
statement, which is both more efficient, and easier to understand.
UPDATE dbo.Stg_IncomingQueue
SET TimeID = t.TimeID
FROM dbo.Stg_IncomingQueue iq
INNER JOIN (
SELECT dt.TimeID
, dt.StartDateTime
, EndDateTime = LEAD(dt.StartDateTime, 1) OVER (ORDER BY dt.StartDateTime)
FROM dbo.Dim_Time dt
) t ON CONVERT(time(0), iq.InsertTime) >= t.StartDateTime AND CONVERT(time(0), iq.InsertTime) < t.EndDateTime;
The results, compared side-by-side with the Dim_Time table:
SELECT *
FROM dbo.Stg_IncomingQueue iq
INNER JOIN dbo.Dim_Time dt ON iq.TimeID = dt.TimeID;
The output looks like:
╔════════════╦════════════╦════════╦═════════════════════════╦════════╦═══════════════╗ ║ IncomingID ║ CustomerID ║ TimeID ║ InsertTime ║ TimeID ║ StartDateTime ║ ╠════════════╬════════════╬════════╬═════════════════════════╬════════╬═══════════════╣ ║ 1 ║ 1 ║ 271 ║ 1875年06月30日 22:31:49.000 ║ 271 ║ 22:30:00 ║ ║ 2 ║ 2 ║ 116 ║ 1857年07月01日 09:38:59.000 ║ 116 ║ 09:35:00 ║ ║ 3 ║ 3 ║ 218 ║ 1854年09月18日 18:08:39.000 ║ 218 ║ 18:05:00 ║ ║ 4 ║ 4 ║ 221 ║ 1860年05月31日 18:22:25.000 ║ 221 ║ 18:20:00 ║ ╚════════════╩════════════╩════════╩═════════════════════════╩════════╩═══════════════╝
Assuming there isn't a massive amount of incoming rows, this may work fairly well. Be aware, I'm using CONVERT()
to convert the incoming datetime
column into a time(0)
value, which comes at a cost of the query optimizer not being able to use available statistics to help create a great plan. The "actual" query plan for the insert statement shows this warning:
Type conversion in expression (CONVERT(time(0),[iq].[InsertTime],0)>=[dt].[StartDateTime]) may affect "SeekPlan" in query plan choice, Type conversion in expression (CONVERT(time(0),[iq].[InsertTime],0)<[Expr1002]) may affect "SeekPlan" in query plan choice.
If you need to avoid the type-conversion during the update, you can move that workload to the insert operation by updating the definition of dbo.Stg_IncomingQueue
to include a persisted computed column, as in:
CREATE TABLE dbo.Stg_IncomingQueue(
IncomingID int IDENTITY(1,1) NOT NULL,
CustomerID int NOT NULL,
TimeID int NULL,
InsertTime datetime NULL,
InsertTime0 AS CONVERT(TIME(0), InsertTime) PERSISTED
CONSTRAINT PK_IncomingQueueMonitor PRIMARY KEY CLUSTERED
(IncomingID ASC)
) ON [PRIMARY]
GO
The update statement then becomes:
UPDATE dbo.Stg_IncomingQueue
SET TimeID = t.TimeID
FROM dbo.Stg_IncomingQueue iq
INNER JOIN (
SELECT dt.TimeID
, dt.StartDateTime
, EndDateTime = LEAD(dt.StartDateTime, 1) OVER (ORDER BY dt.StartDateTime)
FROM dbo.Dim_Time dt
) t ON iq.InsertTime0 >= t.StartDateTime AND iq.InsertTime0 < t.EndDateTime;