1
\$\begingroup\$

I have a merge statement that takes around 10 minutes to process 5 million or more records.

The merge statement is part of a stored procedure that takes my newly bulk loaded staging table then runs data integrity checks, transforms the data into the proper format, and inserts that data where it should go.

Here are the tables:

CREATE TABLE [dbo].[OrderFact]
(
 Id BIGINT IDENTITY PRIMARY KEY
 ,CustomerId INT NOT NULL REFERENCES Customer (Id)
 ,BillingProcessorId INT NOT NULL REFERENCES BillingProcessor (Id)
 ,Quantity INT NOT NULL
 ,OrderStatus TINYINT NULL
 ,Cost MONEY NOT NULL
 ,AdjustedCost MONEY NOT NULL
 ,CreatedDate DATE NOT NULL DEFAULT GETUTCDATE()
 ,UpdatedDate DATE NOT NULL DEFAULT GETUTCDATE()
 ,IsDelete BIT NOT NULL DEFAULT (0)
);
GO
CREATE TABLE [Staging].[VendorAOrder]
(
 OrderId INT
 ,Quantity INT NULL
 ,Cost MONEY NULL
 ,OrderStatus TINYINT NULL
 ,SellDate DATETIME2 NULL
 ,CustomerId NVARCHAR(20) NULL
 ,VendorPrefix NVARCHAR(20) NULL
 ,DataSetId INT NULL
)
GO
CREATE TABLE [Maps].[VendorAOrder]
(
 Id BIGINT IDENTITY PRIAMRY KEY
 ,OrderFactId BIGINT NOT NULL REFERENCES OrderFact (Id)
 ,CreatedDate DATE NOT NULL DEFAULT GETUTCDATE()
 ,UpdatedDate DATE NOT NULL DEFAULT GETUTCDATE()
 ,IsDelete BIT NOT NULL DEFAULT (0)
);
GO
CREATE TABLE [Maps].[VendorARelatedOrder]
(
 Id INT IDENTITY PRIMARY KEY
 ,VendorPrefix NVARCHAR(20) NOT NULL
 ,DataSetId INT NOT NULL REFERENCES DataSet (Id)
);
GO
CREATE TABLE [Maps].[VendorACustomer]
(
 Id INT IDENTITY PRIMARY KEY
 ,CustomerId INT NOT NULL REFERENCES Customer (Id)
 ,OtherCustomerId NVARCHAR(20) NOT NULL
 ,CreatedDate DATE NOT NULL DEFAULT GETUTCDATE()
 ,UpdatedDate DATE NOT NULL DEFAULT GETUTCDATE()
 ,IsDelete BIT NOT NULL DEFAULT (0)
);
GO

Here is the merge statement:

--In stored procedure
CREATE TABLE #OrderMaps
(
 Id INT IDENTITY PRIMARY KEY
 ,OrderFactId BIGINT NOT NULL
 ,OrderId INT NOT NULL
 ,RelatedOrderId INT NOT NULL
)
-- Merge statement
-- Gets all non duplicate orders and inserts them into OrderFact
-- Outputs OrderId and new OrderFactId into temp table
-- Which is later inserted into [Maps].[VendorAOrder]
MERGE [OrderFact] AS [Target]
USING
(
 SELECT
 ,mc.CustomerId
 ,b.Id AS BillingProcessorId
 ,o.OrderId
 ,ro.Id AS RelatedOrderId
 ,o.Quantity
 ,o.OrderStatus
 ,o.Cost
 ,AdjustedCost = CASE WHEN OrderStatus=1 THEN -Cost ELSE Cost END
 ,o.SellDate
 FROM
 (
 SELECT *
 FROM [Staging].[VendorAOrder] o
 WHERE NOT EXISTS
 (
 SELECT 1
 FROM [Maps].[VendorAOrder] orders
 JOIN OrderFact of
 ON order.OrderFactId = of.Id
 AND orders.OrderId = o.OrderId
 AND of.DataSetId = o.DataSetId
 AND of.IsDelete = 0
 )
 ) o
 JOIN Maps.VendorARelatedOrder ro
 ON or.VendorPrefix = o.VendorPrefix
 AND or.DataSetId = o.DataSetId
 JOIN BillingProcessor b
 ON b.Id = o.BillingProcessorId
 JOIN [Maps].VendorACustomer mc
 ON mc.OtherCustomerId = o.CustomerId
 AND mc.VendorId = o.VendorId
) AS [Source]
ON 1 = 0
WHEN NOT MATCHED BY TARGET THEN
 INSERT
 (
 CustomerId
 ,BillingProcessorId
 ,Quantity
 ,OrderStatus
 ,Cost
 ,AdjustedCost
 ,SellDate
 )
 VALUES
 (
 [Source].CustomerId
 ,[Source].BillingProcessorId
 ,[Source].Quantity
 ,[Source].OrderStatus
 ,[Source].Cost
 ,[Source].AdjustedCost
 ,[Source].SellDate
 )
 OUTPUT (Inserted.Id, OrderId, RelatedOrderId)
 INTO #OrderMaps(OrderFactId, OrderId, RelatedOrderId)
; -- End merge

Some things that I have tried that partially improved performance:

Create a temp table to pre filter orders

CREATE TABLE #DistinctOrders
(
 Id INT IDENTITY PRIMARY KEY
 ,OrderId NOT INT
 ,Quantity INT NOT NULL
 ,Cost MONEY NOT NULL
 ,OrderStatus TINYINT NOT NULL
 ,SellDate DATETIME2 NOT NULL
 ,CustomerId NVARCHAR(20) NOT NULL
 ,VendorPrefix NVARCHAR(20) NOT NULL
 ,DataSetId INT NOT NULL
 ,RelatedOrderId INT NOT NULL
);
GO

I could insert the orders into this table before I used the Merge Statement, but I am not sure of its performance increase.

Jamal
35.2k13 gold badges134 silver badges238 bronze badges
asked Apr 9, 2013 at 15:53
\$\endgroup\$
1
  • 1
    \$\begingroup\$ Why is your merge condition ON 1 = 0? It would help if you post index DDL as well as the execution plan for this query. And why do you say that you "tried" using a temp table but you're "not sure of its performance increase"? If you tried it, then presumably you know what the result was? \$\endgroup\$ Commented Apr 17, 2013 at 21:34

1 Answer 1

2
\$\begingroup\$

Your Staging Table doesn't have a Primary Key in it, if it did that might speed things up a little bit, you gave your Temporary table a Primary key but not your Permanent Staging Table.


Another thing that I noticed is that inside of your Using you use

SELECT *
 FROM [Staging].[VendorAOrder] o
 WHERE NOT EXISTS
 (
 SELECT 1
 FROM [Maps].[VendorAOrder] orders
 JOIN OrderFact of
 ON order.OrderFactId = of.Id
 AND orders.OrderId = o.OrderId
 AND of.DataSetId = o.DataSetId
 AND of.IsDelete = 0
 )

and then further down the line you use

ON 1 = 0
WHEN NOT MATCHED BY TARGET THEN

could you make these both Agreeable?

I am thinking that the NOT MATCHED and the NOT EXIST is causing some slowness in your Merge.

I know that using <> or NOT IN can slow down a Query, could you possibly change these both to Positives instead of Negatives?

answered Nov 1, 2013 at 19:06
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.