
I am working on creating a data warehouse. I have created a time dimension (Dim_Time) at 5-minute intervals. Hour aggregations will have [Minute] = NULL. For the purpose of this example:

CREATE TABLE [dbo].[Dim_Time](
 [TimeID] [int] IDENTITY(1,1) NOT NULL,
 [StartDateTime] [datetime] NULL,
 [Hour] [int] NULL,
 [Minute] [int] NULL,
 CONSTRAINT [PK_Dim_Time] PRIMARY KEY CLUSTERED 
 ([TimeID] ASC)
 ) ON [PRIMARY]
GO

Then I have an Incoming Table, which is updated every 5 minutes from the OLTP Database.

CREATE TABLE [dbo].[Stg_IncomingQueue](
 [IncomingID] [int] IDENTITY(1,1) NOT NULL,
 [CustomerID] [int] NOT NULL,
 [TimeID] [int] NULL,
 [InsertTime] [datetime] NULL,
 CONSTRAINT [PK_IncomingQueueMonitor] PRIMARY KEY CLUSTERED 
([IncomingID] ASC)
) ON [PRIMARY]
GO

I then have the following WHILE loop. Its purpose is to find the correct 5-minute time slot (TimeID) for each incoming row:

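-- Variables used by the loop (declarations were missing from the original snippet)
DECLARE @IncomingID int, @RowInserTime datetime;

WHILE 0 < (SELECT COUNT(*) FROM [dba_local].[dbo].[Stg_IncomingQueue] WHERE TimeID IS NULL)
BEGIN
 -- Pick one row that has not been assigned a time slot yet
 SELECT TOP 1 @IncomingID = IncomingID, @RowInserTime = InsertTime 
 FROM [dba_local].[dbo].[Stg_IncomingQueue] WHERE TimeID IS NULL
 -- Find the latest 5-minute slot that starts before the row was inserted
 ;WITH DimTime
 AS (
 SELECT MAX(TimeID) AS MaxTimeID FROM [dba_local].[dbo].[Dim_Time]
 WHERE StartDateTime < @RowInserTime AND [Minute] IS NOT NULL
 )
 UPDATE [dba_local].[dbo].[Stg_IncomingQueue]
 SET TimeID = (SELECT MaxTimeID FROM DimTime)
 WHERE IncomingID = @IncomingID
END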

It's such a simple process, and yet I cannot figure out a simpler way to update the TimeID. As per the CTE SELECT in the loop, I need to get the MAX(TimeID) where the StartDateTime is less than the row's InsertTime. Because time is the only relationship, I am struggling to do this in one query without the loop, but I feel it is possible.

Please can someone help me out here, either with a better option or by confirming that this is the simplest way?

Thank you very much for your time and assistance. Wade

Aaron Bertrand
asked Feb 27, 2019 at 18:35

1 Answer


I created the following minimal, complete, and verifiable example, based on the two tables in your original question. It uses the LEAD window function to derive a time range for each row of the dbo.Dim_Time table, which can then be compared to the incoming rows quite easily.

IF OBJECT_ID(N'dbo.Stg_IncomingQueue', N'U') IS NOT NULL
DROP TABLE dbo.Stg_IncomingQueue;
IF OBJECT_ID(N'dbo.Dim_Time', N'U') IS NOT NULL
DROP TABLE dbo.Dim_Time;
CREATE TABLE dbo.Dim_Time(
 TimeID int IDENTITY(1,1) NOT NULL,
 StartDateTime time(0) NULL,
 CONSTRAINT PK_Dim_Time PRIMARY KEY CLUSTERED 
 (TimeID ASC)
 ) ON [PRIMARY]
GO
;WITH src AS
(
 SELECT TOP (10) sv.number
 FROM master.dbo.spt_values sv
 WHERE sv.type = N'P'
 ORDER BY sv.number
)
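-- 288 five-minute slots cover one day (00:00:00 through 23:55:00).
-- A 289th row would wrap around to a duplicate 00:00:00, because DATEADD on a time value rolls past midnight.
INSERT INTO dbo.Dim_Time (StartDateTime)
SELECT TOP(288) CONVERT(time(0), DATEADD(minute, (s3.number * 100 + s2.number * 10 + s1.number) * 5, CONVERT(time(0), '00:00:00')))
FROM src s1
 CROSS JOIN src s2
 CROSS JOIN src s3
ORDER BY s3.number * 100 + s2.number * 10 + s1.number;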
CREATE TABLE dbo.Stg_IncomingQueue(
 IncomingID int IDENTITY(1,1) NOT NULL,
 CustomerID int NOT NULL,
 TimeID int NULL,
 InsertTime datetime NULL,
 CONSTRAINT PK_IncomingQueueMonitor PRIMARY KEY CLUSTERED 
(IncomingID ASC)
) ON [PRIMARY]
GO
INSERT INTO dbo.Stg_IncomingQueue (CustomerID, InsertTime)
VALUES (1, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'))
 , (2, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'))
 , (3, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'))
 , (4, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'));

This piece replaces your entire WHILE loop with a single UPDATE statement, which is both more efficient and easier to understand.

UPDATE iq
SET TimeID = t.TimeID
FROM dbo.Stg_IncomingQueue iq
 INNER JOIN (
 SELECT dt.TimeID
 , dt.StartDateTime
 , EndDateTime = LEAD(dt.StartDateTime, 1) OVER (ORDER BY dt.StartDateTime)
 FROM dbo.Dim_Time dt 
 ) t ON CONVERT(time(0), iq.InsertTime) >= t.StartDateTime
 -- The last slot of the day has no successor, so LEAD returns NULL; treat that as an open-ended range.
 AND (CONVERT(time(0), iq.InsertTime) < t.EndDateTime OR t.EndDateTime IS NULL);

The results, compared side-by-side with the Dim_Time table:

SELECT *
FROM dbo.Stg_IncomingQueue iq
 INNER JOIN dbo.Dim_Time dt ON iq.TimeID = dt.TimeID;

The output looks like:

╔════════════╦════════════╦════════╦═════════════════════════╦════════╦═══════════════╗
║ IncomingID ║ CustomerID ║ TimeID ║ InsertTime              ║ TimeID ║ StartDateTime ║
╠════════════╬════════════╬════════╬═════════════════════════╬════════╬═══════════════╣
║ 1          ║ 1          ║ 271    ║ 1875-06-30 22:31:49.000 ║ 271    ║ 22:30:00      ║
║ 2          ║ 2          ║ 116    ║ 1857-07-01 09:38:59.000 ║ 116    ║ 09:35:00      ║
║ 3          ║ 3          ║ 218    ║ 1854-09-18 18:08:39.000 ║ 218    ║ 18:05:00      ║
║ 4          ║ 4          ║ 221    ║ 1860-05-31 18:22:25.000 ║ 221    ║ 18:20:00      ║
╚════════════╩════════════╩════════╩═════════════════════════╩════════╩═══════════════╝
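
For reference, the derived table t in the UPDATE pairs each slot with the start of the next one. A minimal, self-contained sketch of that step, using three hard-coded slots instead of dbo.Dim_Time (the literal values and the alias v are only for illustration):

```sql
-- Illustration only: three hard-coded slots stand in for dbo.Dim_Time.
SELECT v.StartDateTime
     , EndDateTime = LEAD(v.StartDateTime, 1) OVER (ORDER BY v.StartDateTime)
FROM (VALUES (CONVERT(time(0), '00:00:00'))
           , (CONVERT(time(0), '00:05:00'))
           , (CONVERT(time(0), '00:10:00'))
     ) AS v (StartDateTime);
```

Each row gets a half-open range from StartDateTime up to, but not including, EndDateTime. The final row has no successor, so its EndDateTime is NULL.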

Assuming the number of incoming rows isn't massive, this UPDATE may work fairly well. Be aware that I'm using CONVERT() to turn the incoming datetime column into a time(0) value. Wrapping the column in a conversion means the query optimizer may not be able to use the available statistics, or seek on an index, to build a great plan. The "actual" query plan for the UPDATE statement shows this warning:

Type conversion in expression (CONVERT(time(0),[iq].[InsertTime],0)>=[dt].[StartDateTime]) may affect "SeekPlan" in query plan choice, Type conversion in expression (CONVERT(time(0),[iq].[InsertTime],0)<[Expr1002]) may affect "SeekPlan" in query plan choice.

If you need to avoid the type-conversion during the update, you can move that workload to the insert operation by updating the definition of dbo.Stg_IncomingQueue to include a persisted computed column, as in:

CREATE TABLE dbo.Stg_IncomingQueue(
 IncomingID int IDENTITY(1,1) NOT NULL,
 CustomerID int NOT NULL,
 TimeID int NULL,
 InsertTime datetime NULL,
 InsertTime0 AS CONVERT(TIME(0), InsertTime) PERSISTED,
 CONSTRAINT PK_IncomingQueueMonitor PRIMARY KEY CLUSTERED 
(IncomingID ASC)
) ON [PRIMARY]
GO
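
If the staging table already exists and holds data, recreating it may not be practical. A sketch of adding the same column in place, assuming the table from the question and the column name InsertTime0 used here:

```sql
-- Adds the persisted computed column to an existing table.
-- Assumes no column named InsertTime0 exists yet.
ALTER TABLE dbo.Stg_IncomingQueue
    ADD InsertTime0 AS CONVERT(time(0), InsertTime) PERSISTED;
```

Because the column is PERSISTED, the conversion is computed when this statement runs for existing rows, and then on every later insert or update of InsertTime.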

The update statement then becomes:

UPDATE iq
SET TimeID = t.TimeID
FROM dbo.Stg_IncomingQueue iq
 INNER JOIN (
 SELECT dt.TimeID
 , dt.StartDateTime
 , EndDateTime = LEAD(dt.StartDateTime, 1) OVER (ORDER BY dt.StartDateTime)
 FROM dbo.Dim_Time dt 
 ) t ON iq.InsertTime0 >= t.StartDateTime
 -- The last slot of the day has no successor, so LEAD returns NULL; treat that as an open-ended range.
 AND (iq.InsertTime0 < t.EndDateTime OR t.EndDateTime IS NULL);
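
With the conversion moved into a persisted column, that column can also be indexed. The following is only a sketch: the index name is made up for illustration, and whether the optimizer actually chooses it depends on row counts, so check the actual plan before keeping it.

```sql
-- Hypothetical index name; assumes the InsertTime0 computed column exists.
CREATE NONCLUSTERED INDEX IX_Stg_IncomingQueue_InsertTime0
    ON dbo.Stg_IncomingQueue (InsertTime0);
```

The trade-off is extra write cost on every load into the staging table, which may outweigh the benefit if the table is small or truncated frequently.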
answered Feb 27, 2019 at 19:06