Merge table stored procedure

Question 1

I wrote a stored procedure that will do a merge of two tables:

CREATE PROCEDURE [dbo].[merge_tables] 
-- Add the parameters for the stored procedure here
@SourceTable varchar(50), 
@DestinationTable varchar(50),
@PrimaryKey varchar(50)
AS
BEGIN
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;
DECLARE @update_query varchar(max) = 
 (select Concat('SET ', string_agg(cast(@DestinationTable + '.[' + name + '] = '+ @SourceTable +'.[' + name +']' as varchar(max)),','))
 from sys.columns 
 WHERE object_id = OBJECT_ID(@DestinationTable) and name != @PrimaryKey and generated_always_type = 0 and system_type_id != 189 and is_identity = 0);
DECLARE @insert_query varchar(max) = (select Concat('([', string_agg(cast(name as varchar(max)),'],['), '])', ' VALUES ([', string_agg(cast(name as varchar(max)),'],['), '])')
 from sys.columns 
 WHERE object_id = OBJECT_ID(@DestinationTable) and generated_always_type = 0 and is_identity = 0 and system_type_id != 189);
DECLARE @merge_query varchar(max) = 'MERGE ' + @DestinationTable +
' USING ' + @SourceTable +
' ON (' + @SourceTable + '.' + @PrimaryKey + ' = ' + @DestinationTable + '.' + @PrimaryKey + ')' +
' WHEN MATCHED THEN UPDATE ' + @update_query +
' WHEN NOT MATCHED BY TARGET THEN INSERT ' + @insert_query +
' WHEN NOT MATCHED BY SOURCE THEN DELETE;';
select @merge_query;
EXEC(@merge_query)
END
GO

It appears to work. Is there any way in which this code can be improved? Also, I am not sure how to make this work with compound keys, but that might be off topic for this site.

Question 2

I think you need more detail than just merge two tables.

Question 3

Is there any way in which this code can be improved?

The procedure parameters should reflect the underlying data types. Each component of an identifier is sysname, a synonym for nvarchar(128) NOT NULL. Using varchar(50) artificially limits the length, and characters that might be used. Likewise, Unicode should be used for the dynamic SQL itself.

The user-supplied table identifier strings should be split using PARSENAME into schema and object portions, then quoted using QUOTENAME. This is safer and more correct than manually surrounding the whole identifier with square brackets.

QUOTENAME returns nvarchar(258), so a safe type for the input is twice this, plus one for the . separator, giving nvarchar(517).

Add a @Debug bit parameter, or similar, so the user can view the generated dynamic SQL (and the associated execution plan) before deciding to execute it.

The procedure should have a basic error handling framework, for example as described by Erland Sommarskog in Error and Transaction Handling in SQL Server.

Add some basic checks for things like object existence, the presence of a primary key, and so on.

STRING_AGG should normally have a WITHIN GROUP clause to provide deterministic ordering.

Compute the distinct column lists once, then reuse this value as necessary.

Separate the code into more steps to make it clearer, and easier to debug. Try to avoid code that scrolls off to the right, and deeply nested function calls.

It is difficult to anticipate all the conditions that could cause the MERGE statement to fail, but consider adding extra checks for e.g. computed columns. Even if this is covered by another condition, being explicit makes the intent clearer.

Avoid hard-coding a numeric type id. Use the TYPE_ID function instead.

Using CONCAT_WS can result in cleaner code than CONCAT when using separators.

Also, I am not sure how to make this work with compound keys...

This can be done by consulting sys.key_constraints and sys.index_columns, see the example code below.

Example

This illustrates how the above points might be incorporated:

CREATE OR ALTER PROCEDURE dbo.MergeTables 
 @Source nvarchar(517), 
 @Target nvarchar(517),
 @Debug bit = 0
AS
BEGIN
 SET NOCOUNT, XACT_ABORT ON;
 BEGIN TRY
 -- Cannot specify a database or server in the input table names
 IF PARSENAME(@Source, 3) IS NOT NULL
 BEGIN
 RAISERROR(N'Three- or four-part names are not supported for the table identifer %s.', 16, 1, @Source);
 END;
 
 IF PARSENAME(@Target, 3) IS NOT NULL
 BEGIN
 RAISERROR(N'Three- or four-part names are not supported for the table identifer %s.', 16, 1, @Target);
 END;
 -- Break and quote input e.g. 'My Schema.My Object' - > [My Schema].[My Object]
 SET @Source = CONCAT_WS(N'.',
 QUOTENAME(PARSENAME(@Source, 2)),
 QUOTENAME(PARSENAME(@Source, 1)));
 SET @Target = CONCAT_WS(N'.',
 QUOTENAME(PARSENAME(@Target, 2)),
 QUOTENAME(PARSENAME(@Target, 1)));
 -- Check tables are accessible
 DECLARE @SourceID integer = OBJECT_ID(@Source, N'U');
 DECLARE @TargetID integer = OBJECT_ID(@Target, N'U');
 IF @SourceID IS NULL
 BEGIN
 RAISERROR(N'Table %s not found in the current database.', 16, 1, @Source);
 END;
 IF @TargetID IS NULL 
 BEGIN
 RAISERROR(N'Table %s not found in the current database.', 16, 1, @Target);
 END;
 IF CONVERT(integer, OBJECTPROPERTYEX(@SourceID, 'TableHasPrimaryKey')) = 0
 BEGIN
 RAISERROR(N'Table %s does not have a primary key.', 16, 1, @Source);
 END;
 IF CONVERT(integer, OBJECTPROPERTYEX(@TargetID, 'TableHasPrimaryKey')) = 0
 BEGIN
 RAISERROR(N'Table %s does not have a primary key.', 16, 1, @Target);
 END;
 /* TODO: Check source and target schemas are compatible*/
 DECLARE
 -- Find the primary key columns for the MERGE statement's ON clause
 @PrimaryKeyColumnList nvarchar(max) =
 (
 SELECT
 STRING_AGG(
 CONVERT(nvarchar(max), 
 CONCAT(N'S.', QUOTENAME(C.[name]), N'=T.', QUOTENAME(C.[name]))),
 N' AND ') 
 WITHIN GROUP (ORDER BY IC.key_ordinal)
 FROM sys.key_constraints AS KC
 JOIN sys.index_columns AS IC
 ON IC.[object_id] = KC.parent_object_id
 AND IC.index_id = KC.unique_index_id
 JOIN sys.columns AS C
 ON C.[object_id] = IC.[object_id]
 AND C.column_id = IC.column_id
 WHERE 
 KC.parent_object_id = @SourceID
 AND KC.[type_desc] = N'PRIMARY_KEY_CONSTRAINT'
 ),
 @InsertClause nvarchar(max) = N'INSERT (column_list) VALUES (column_list)',
 -- Find the insert columns list
 @InsertColumnList nvarchar(max) =
 (
 SELECT 
 STRING_AGG(
 CONVERT(nvarchar(max), 
 QUOTENAME(C.[name])), 
 N',')
 WITHIN GROUP (ORDER BY C.column_id)
 FROM sys.columns AS C
 WHERE
 C.[object_id] = @SourceID
 AND C.is_computed = 0
 AND C.is_identity = 0
 AND C.is_column_set = 0
 AND C.is_hidden = 0
 AND C.generated_always_type = 0
 AND C.system_type_id != TYPE_ID(N'timestamp')
 ),
 -- Generate the UPDATE SET clause
 @UpdateClause nvarchar(max) = N'UPDATE SET ' +
 (
 SELECT 
 STRING_AGG(
 CONVERT(nvarchar(max), 
 CONCAT(N'T.', QUOTENAME(C.[name]), N'=S.', QUOTENAME(C.[name]))),
 N',')
 WITHIN GROUP (ORDER BY C.column_id)
 FROM sys.columns AS C
 WHERE 
 C.[object_id] = @SourceID
 AND C.is_computed = 0
 AND C.is_identity = 0
 AND C.is_column_set = 0
 AND C.is_hidden = 0
 AND C.generated_always_type = 0
 AND C.system_type_id != TYPE_ID(N'timestamp')
 AND NOT EXISTS
 (
 -- Exclude column if it is part of the primary key
 SELECT 1
 FROM sys.key_constraints AS KC
 JOIN sys.index_columns AS IC
 ON IC.[object_id] = KC.parent_object_id
 AND IC.index_id = KC.unique_index_id
 WHERE
 KC.parent_object_id = C.[object_id]
 AND KC.[type_desc] = N'PRIMARY_KEY_CONSTRAINT'
 AND IC.column_id = C.column_id
 )
 ),
 -- Basic form of the merge statement
 @MergeStatement nvarchar(max) = 
 CONCAT_WS(
 N' ',
 N'MERGE @Target AS T',
 N'USING @Source AS S',
 N'ON primary_key_match_list',
 N'WHEN MATCHED THEN merge_matched',
 N'WHEN NOT MATCHED BY TARGET THEN merge_not_matched',
 N'WHEN NOT MATCHED BY SOURCE THEN DELETE;');
 -- Substitute the insert column list into the INSERT clause
 SET @InsertClause = REPLACE(@InsertClause, N'column_list', @InsertColumnList);
 -- Substitute other values into the MERGE template
 SET @MergeStatement = REPLACE(@MergeStatement, N'@Target', @Target);
 SET @MergeStatement = REPLACE(@MergeStatement, N'@Source', @Source);
 SET @MergeStatement = REPLACE(@MergeStatement, N'primary_key_match_list', @PrimaryKeyColumnList);
 SET @MergeStatement = REPLACE(@MergeStatement, N'merge_matched', @UpdateClause);
 SET @MergeStatement = REPLACE(@MergeStatement, N'merge_not_matched', @InsertClause);
 BEGIN TRANSACTION;
 IF @Debug = 0
 BEGIN
 -- Peform the MERGE
 EXECUTE (@MergeStatement);
 END;
 ELSE
 BEGIN
 PRINT @MergeStatement;
 END;
 COMMIT TRANSACTION;
 RETURN 0;
 END TRY
 BEGIN CATCH
 IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
 THROW;
 RETURN -1;
 END CATCH;
END;

Question 4

A table has no order. The sort will be non-deterministic.

Three things must line up:

Same number of columns
Same (or compatible) type for the column to column
Proper order of the columns

Seems like a lot could go wrong. Even if one table is a clone of the other you still need a sort.

Not getting the ' WHEN NOT MATCHED BY SOURCE THEN DELETE;';

Question 5

Isn't this what the ON clause of the merge statement protects against? docs.microsoft.com/en-us/sql/t-sql/statements/…. The When not matched bit says when the row is not in the source, but is in the destination (i.e. no matches for those rows) delete those rows from the destination

Question 6

Its taken as a precondition that the column metadata matches, otherwise you are right, there is no way to do it

user70021user70021 · Accepted Answer · 2018-08-08 15:10:41Z