I have a merge statement that takes around 10 minutes to process 5 million or more records.
The merge statement is part of a stored procedure that takes my newly bulk loaded staging table then runs data integrity checks, transforms the data into the proper format, and inserts that data where it should go.
Here are the tables:
CREATE TABLE [dbo].[OrderFact]
(
Id BIGINT IDENTITY PRIMARY KEY
,CustomerId INT NOT NULL REFERENCES Customer (Id)
,BillingProcessorId INT NOT NULL REFERENCES BillingProcessor (Id)
,Quantity INT NOT NULL
,OrderStatus TINYINT NULL
,Cost MONEY NOT NULL
,AdjustedCost MONEY NOT NULL
,CreatedDate DATE NOT NULL DEFAULT GETUTCDATE()
,UpdatedDate DATE NOT NULL DEFAULT GETUTCDATE()
,IsDelete BIT NOT NULL DEFAULT (0)
);
GO
CREATE TABLE [Staging].[VendorAOrder]
(
OrderId INT
,Quantity INT NULL
,Cost MONEY NULL
,OrderStatus TINYINT NULL
,SellDate DATETIME2 NULL
,CustomerId NVARCHAR(20) NULL
,VendorPrefix NVARCHAR(20) NULL
,DataSetId INT NULL
)
GO
CREATE TABLE [Maps].[VendorAOrder]
(
Id BIGINT IDENTITY PRIAMRY KEY
,OrderFactId BIGINT NOT NULL REFERENCES OrderFact (Id)
,CreatedDate DATE NOT NULL DEFAULT GETUTCDATE()
,UpdatedDate DATE NOT NULL DEFAULT GETUTCDATE()
,IsDelete BIT NOT NULL DEFAULT (0)
);
GO
CREATE TABLE [Maps].[VendorARelatedOrder]
(
Id INT IDENTITY PRIMARY KEY
,VendorPrefix NVARCHAR(20) NOT NULL
,DataSetId INT NOT NULL REFERENCES DataSet (Id)
);
GO
CREATE TABLE [Maps].[VendorACustomer]
(
Id INT IDENTITY PRIMARY KEY
,CustomerId INT NOT NULL REFERENCES Customer (Id)
,OtherCustomerId NVARCHAR(20) NOT NULL
,CreatedDate DATE NOT NULL DEFAULT GETUTCDATE()
,UpdatedDate DATE NOT NULL DEFAULT GETUTCDATE()
,IsDelete BIT NOT NULL DEFAULT (0)
);
GO
Here is the merge statement:
--In stored procedure
CREATE TABLE #OrderMaps
(
Id INT IDENTITY PRIMARY KEY
,OrderFactId BIGINT NOT NULL
,OrderId INT NOT NULL
,RelatedOrderId INT NOT NULL
)
-- Merge statement
-- Gets all non duplicate orders and inserts them into OrderFact
-- Outputs OrderId and new OrderFactId into temp table
-- Which is later inserted into [Maps].[VendorAOrder]
MERGE [OrderFact] AS [Target]
USING
(
SELECT
,mc.CustomerId
,b.Id AS BillingProcessorId
,o.OrderId
,ro.Id AS RelatedOrderId
,o.Quantity
,o.OrderStatus
,o.Cost
,AdjustedCost = CASE WHEN OrderStatus=1 THEN -Cost ELSE Cost END
,o.SellDate
FROM
(
SELECT *
FROM [Staging].[VendorAOrder] o
WHERE NOT EXISTS
(
SELECT 1
FROM [Maps].[VendorAOrder] orders
JOIN OrderFact of
ON order.OrderFactId = of.Id
AND orders.OrderId = o.OrderId
AND of.DataSetId = o.DataSetId
AND of.IsDelete = 0
)
) o
JOIN Maps.VendorARelatedOrder ro
ON or.VendorPrefix = o.VendorPrefix
AND or.DataSetId = o.DataSetId
JOIN BillingProcessor b
ON b.Id = o.BillingProcessorId
JOIN [Maps].VendorACustomer mc
ON mc.OtherCustomerId = o.CustomerId
AND mc.VendorId = o.VendorId
) AS [Source]
ON 1 = 0
WHEN NOT MATCHED BY TARGET THEN
INSERT
(
CustomerId
,BillingProcessorId
,Quantity
,OrderStatus
,Cost
,AdjustedCost
,SellDate
)
VALUES
(
[Source].CustomerId
,[Source].BillingProcessorId
,[Source].Quantity
,[Source].OrderStatus
,[Source].Cost
,[Source].AdjustedCost
,[Source].SellDate
)
OUTPUT (Inserted.Id, OrderId, RelatedOrderId)
INTO #OrderMaps(OrderFactId, OrderId, RelatedOrderId)
; -- End merge
Some things that I have tried that partially improved performance:
Create a temp table to pre filter orders
CREATE TABLE #DistinctOrders
(
Id INT IDENTITY PRIMARY KEY
,OrderId NOT INT
,Quantity INT NOT NULL
,Cost MONEY NOT NULL
,OrderStatus TINYINT NOT NULL
,SellDate DATETIME2 NOT NULL
,CustomerId NVARCHAR(20) NOT NULL
,VendorPrefix NVARCHAR(20) NOT NULL
,DataSetId INT NOT NULL
,RelatedOrderId INT NOT NULL
);
GO
I could insert the orders into this table before I used the Merge Statement, but I am not sure of its performance increase.
1 Answer 1
Your Staging
Table doesn't have a Primary Key in it, if it did that might speed things up a little bit, you gave your Temporary table a Primary key but not your Permanent Staging
Table.
Another thing that I noticed is that inside of your Using
you use
SELECT *
FROM [Staging].[VendorAOrder] o
WHERE NOT EXISTS
(
SELECT 1
FROM [Maps].[VendorAOrder] orders
JOIN OrderFact of
ON order.OrderFactId = of.Id
AND orders.OrderId = o.OrderId
AND of.DataSetId = o.DataSetId
AND of.IsDelete = 0
)
and then further down the line you use
ON 1 = 0
WHEN NOT MATCHED BY TARGET THEN
could you make these both Agreeable?
I am thinking that the NOT MATCHED
and the NOT EXIST
is causing some slowness in your Merge.
I know that using <>
or NOT IN
can slow down a Query, could you possibly change these both to Positives instead of Negatives?
ON 1 = 0
? It would help if you post index DDL as well as the execution plan for this query. And why do you say that you "tried" using a temp table but you're "not sure of its performance increase"? If you tried it, then presumably you know what the result was? \$\endgroup\$