I have a requirement to archive 5 huge tables from a PROD server to an ARCHIVE server without losing the integrity of the tables.

The query makes use of the Linked Server functionality, and the current idea is to host it on the PROD server. It is a distributed transaction that uses Microsoft's Distributed Transaction Coordinator service.

The key requirement is that the PROD server will be live all the time, and its performance should not be affected while this procedure is running.

Tables are as follows:

  • header0 - Header
  • detail0 - Detail
  • email0 - Emails
  • overboard0 - Overboard
  • references0 - References

Only the header and detail tables are included in the code below, for clarity. The other 3 tables reference the same header table.

1 header may have around 2-4 detail records and 8 message records, and may or may not have records in the other tables.

The tables don't have a primary/foreign key relationship defined in the database as such, but they have columns that are dependent on the header table.

Following is the procedure code currently implemented. It is working fine for 18k records and processes 180k+ records in about 3.5 minutes. Please suggest whether this can be improved, because going forward this table may receive 2.5-3 million records in a single day.

ALTER PROCEDURE [dbo].[usp_Compliane_Archive_And_Delete] 
@client nvarchar(8), -- CLIENT field value
@verbose bit, -- Set to 0/1 for BASIC/VERBOSE logging
@chunkSize int, -- Change as per requirement
@historyDays int -- History Days for deletion
AS
BEGIN
 SET XACT_ABORT ON
 -- DECLARE A TABLE TO HOLD THE KEY VALUES FOR EVERY CHUNK
 DECLARE @NextIDs TABLE(
 cClient nvarchar(8),
 iRunNo int,
 UNIQUE NONCLUSTERED (cClient, iRunNo)
 );
 -- Hold history days value 
 DECLARE @Xdaysago datetime
 SELECT @Xdaysago = DATEADD(DAY, -(@historyDays), GETDATE())
 DECLARE @chunkCount int
 SET @chunkCount = 0
 DECLARE @procRunDate datetime
 SET @procRunDate = GETDATE()
 DECLARE @MainEventID bigint
 DECLARE @ChunkEventID bigint
 DECLARE @IntermediateEventID bigint
 DECLARE @startTime datetime
 DECLARE @endTime datetime
 DECLARE @rowCount int
 DECLARE @procStartTime datetime
 SET @procStartTime = GETDATE()
 DECLARE @TotalRowInserted bigint
 SET @TotalRowInserted = 0
 DECLARE @TotalRowDeleted bigint
 SET @TotalRowDeleted = 0
 -- LOGGING MAIN PROCEDURE EVENT
 INSERT INTO [PS_902mssqldev_prod].[dbo].[ArchiveLog]
 ([RunDateTime], [Event], [EventDescription], [DBName], [ChunkNum], [StartTime], [EndTime], [TotalTime], [RowCount])
 VALUES
 (@procRunDate, 'START', 'COMPLIANCE_PROCEDURE','ARCHIVE', NULL, @procStartTime, NULL, NULL, NULL)
 SET @MainEventID = @@IDENTITY
 WHILE EXISTS(SELECT TOP 1 [ckhrunn]
 FROM [PS_902mssqldev_prod].[dbo].[header0]
 WHERE [client] = @client
 AND [ckhrdte] < @Xdaysago)
 BEGIN
 BEGIN TRY
 BEGIN DISTRIBUTED TRANSACTION
 -- GET ALL THE RECORDS TO BE ARCHIVED
 INSERT INTO @NextIDs (cClient, iRunNo)
 SELECT TOP (@chunkSize) [client], [ckhrunn] FROM [PS_902mssqldev_prod].[dbo].[header0] WHERE [PS_902mssqldev_prod].[dbo].[header0].[ckhrdte] < @Xdaysago
 print 'message '
 -- INCREMENT CHUNK COUNT
 SET @chunkCount = @chunkCount + 1
 -- LOGGING AT CHUNK LEVEL
 INSERT INTO [PS_902mssqldev_prod].[dbo].[ArchiveLog]
 ([RunDateTime], [Event], [EventDescription], [DBName], [ChunkNum], [StartTime], [EndTime], [TotalTime], [RowCount])
 VALUES
 (@procRunDate, 'START', 'CHUNK','ARCHIVE', @chunkCount, GETDATE(), NULL, NULL, NULL)
 SET @ChunkEventID = @@IDENTITY
 -- ################################## ARCHIVING STARTS HERE ######################################### --
 SET @IntermediateEventID = NULL
 SET @startTime = NULL
 SET @endTime = NULL
 SET @rowCount = NULL
 -- ##### COMPLIANCE HEADER TABLE PROCESSING #####
 -- LOGGING AT TRANSACTION LEVEL (THIS WILL BE OPTIONAL IF SET TO BASIC)
 IF @verbose = 1
 BEGIN
 SET @startTime = GETDATE()
 INSERT INTO [PS_902mssqldev_prod].[dbo].[ArchiveLog]
 ([RunDateTime], [Event], [EventDescription], [DBName], [ChunkNum], [StartTime], [EndTime], [TotalTime], [RowCount])
 VALUES
 (@procRunDate, 'INSERT', 'COMPLIANCE HEADER','ARCHIVE', @chunkCount, GETDATE(), NULL, NULL, NULL)
 SET @IntermediateEventID = @@IDENTITY
 END
 -- INSERT INTO HEADER table OF ARCHIVE SERVER's DB
 INSERT INTO [ARCHIVE].[PS_902mssqldev].[dbo].[header0]
 ([client] ,[ckhrunn] ,[ckhsrc] ,[ckhtype] ,[ckhstat] ,[ckhrclnt] ,[ckhrdte] ,[ckhrtme] ,[ckhrusr] ,[ckhcver] ,[ckhprun] ,[ckhcdt1] ,[ckhusr1] ,[ckhusr2] ,[ckhusr3] ,[ckhusr4] ,[ckhactv] ,[ckhuser] ,[ckhdate] ,[ckhtime] ,[ckhwsid] ,[ckhupid] ,[ckhpsl01] ,[ckhpsl02] ,[ckhpsl03] ,[ckhpsl04] ,[ckhpsl05] ,[ckhpslvnum] ,[ckhpslvdte])
 SELECT header.client, header.ckhrunn, header.ckhsrc, header.ckhtype, header.ckhstat, header.ckhrclnt, header.ckhrdte, header.ckhrtme, header.ckhrusr, header.ckhcver, header.ckhprun, header.ckhcdt1, header.ckhusr1, header.ckhusr2, header.ckhusr3, header.ckhusr4, header.ckhactv, header.ckhuser, header.ckhdate, header.ckhtime, header.ckhwsid, header.ckhupid, header.ckhpsl01, header.ckhpsl02, header.ckhpsl03, header.ckhpsl04, header.ckhpsl05, header.ckhpslvnum, header.ckhpslvdte
 FROM [PS_902mssqldev_prod].[dbo].[header0] AS header
 INNER JOIN @NextIDs AS IDs
 ON header.client = IDs.cClient
 AND header.ckhrunn = IDs.iRunNo
 WHERE
 NOT EXISTS (SELECT TOP 1 [client], [ckhrunn] FROM [ARCHIVE].[PS_902mssqldev].[dbo].[header0] AS archiveHeader
 WHERE archiveHeader.client = IDs.cClient
 AND archiveHeader.ckhrunn = IDs.iRunNo)
 SET @rowCount = @@ROWCOUNT
 SET @TotalRowInserted = @TotalRowInserted + @rowCount
 IF @verbose = 1
 BEGIN
 SET @endTime = GETDATE()
 UPDATE [PS_902mssqldev_prod].[dbo].[ArchiveLog] 
 SET
 [EndTime] = @endTime,
 [TotalTime] = DATEDIFF(ss, @startTime, @endTime),
 [RowCount] = @rowCount
 FROM [PS_902mssqldev_prod].[dbo].[ArchiveLog] archLog
 WHERE archLog.EventID = @IntermediateEventID
 END
 -- ##### COMPLIANCE DETAIL TABLE PROCESSING #####
 -- LOGGING AT TRANSACTION LEVEL (THIS WILL BE OPTIONAL IF SET TO BASIC)
 IF @verbose = 1
 BEGIN
 SET @startTime = GETDATE()
 INSERT INTO [PS_902mssqldev_prod].[dbo].[ArchiveLog]
 ([RunDateTime], [Event], [EventDescription], [DBName], [ChunkNum], [StartTime], [EndTime], [TotalTime], [RowCount])
 VALUES
 (@procRunDate, 'INSERT', 'COMPLIANCE DETAIL','ARCHIVE', @chunkCount, GETDATE(), NULL, NULL, NULL)
 SET @IntermediateEventID = @@IDENTITY
 END
 -- INSERT INTO DETAIL table OF ARCHIVE SERVER's DB
 INSERT INTO [ARCHIVE].[PS_902mssqldev].[dbo].[detail0]
 ([client] ,[ckdrunn] ,[ckdcat] ,[ckdkey1] ,[ckdkey2] ,[ckdkey3] ,[ckdkey4] ,[ckdseqn] ,[ckdstat] ,[ckdosts] ,[ckdoemp] ,[ckdodte] ,[ckdotme] ,[ckdixml] ,[ckdoxml] ,[ckdrmrk] ,[ckdrusr] ,[ckdrdte] ,[ckdrtme] ,[ckdvid1] ,[ckdvid2] ,[ckdcver] ,[ckdrunb] ,[ckdxdtl1] ,[ckdxdtl2] ,[ckdusr1] ,[ckdusr2] ,[ckdusr3] ,[ckdusr4] ,[ckdactv] ,[ckduser] ,[ckddate] ,[ckdtime] ,[ckdwsid] ,[ckdupid] ,[ckdpsl01] ,[ckdpsl02] ,[ckdpsl03] ,[ckdpsl04] ,[ckdpsl05] ,[ckdpslvnum] ,[ckdpslvdte])
 SELECT detail.client, detail.ckdrunn, detail.ckdcat, detail.ckdkey1, detail.ckdkey2, detail.ckdkey3, detail.ckdkey4, detail.ckdseqn, detail.ckdstat, detail.ckdosts, detail.ckdoemp, detail.ckdodte, detail.ckdotme, detail.ckdixml, detail.ckdoxml, detail.ckdrmrk, detail.ckdrusr, detail.ckdrdte, detail.ckdrtme, detail.ckdvid1, detail.ckdvid2, detail.ckdcver, detail.ckdrunb, detail.ckdxdtl1, detail.ckdxdtl2, detail.ckdusr1, detail.ckdusr2, detail.ckdusr3, detail.ckdusr4, detail.ckdactv, detail.ckduser, detail.ckddate, detail.ckdtime, detail.ckdwsid, detail.ckdupid, detail.ckdpsl01, detail.ckdpsl02, detail.ckdpsl03, detail.ckdpsl04, detail.ckdpsl05, detail.ckdpslvnum, detail.ckdpslvdte
 FROM [PS_902mssqldev_prod].[dbo].[detail0] AS detail
 INNER JOIN @NextIDs AS IDs
 ON detail.client = IDs.cClient
 AND detail.ckdrunn = IDs.iRunNo
 WHERE
 NOT EXISTS (SELECT TOP 1 [client], [ckdrunn] FROM [ARCHIVE].[PS_902mssqldev].[dbo].[detail0] AS archiveDetail
 WHERE archiveDetail.client = IDs.cClient
 AND archiveDetail.ckdrunn = IDs.iRunNo)
 SET @rowCount = @@ROWCOUNT
 SET @TotalRowInserted = @TotalRowInserted + @rowCount
 -- UPDATE THE LOG FOR ANALYSIS
 IF @verbose = 1
 BEGIN
 SET @endTime = GETDATE()
 UPDATE [PS_902mssqldev_prod].[dbo].[ArchiveLog] 
 SET
 [EndTime] = @endTime,
 [TotalTime] = DATEDIFF(ss, @startTime, @endTime),
 [RowCount] = @rowCount
 FROM [PS_902mssqldev_prod].[dbo].[ArchiveLog] archLog
 WHERE archLog.EventID = @IntermediateEventID
 END
 -- ################################## DELETION STARTS HERE ######################################### --
 -- ##### COMPLIANCE HEADER DELETE FROM PROD START #####
 -- LOGGING AT TRANSACTION LEVEL (THIS WILL BE OPTIONAL IF SET TO BASIC)
 IF @verbose = 1
 BEGIN
 SET @startTime = GETDATE()
 INSERT INTO [PS_902mssqldev_prod].[dbo].[ArchiveLog]
 ([RunDateTime], [Event], [EventDescription], [DBName], [ChunkNum], [StartTime], [EndTime], [TotalTime], [RowCount])
 VALUES
 (@procRunDate, 'DELETE', 'COMPLIANCE HEADER','PROD', @chunkCount, GETDATE(), NULL, NULL, NULL)
 SET @IntermediateEventID = @@IDENTITY
 END
 -- DELETE HEADER RECORDS FROM PROD SERVER's DB
 DELETE header 
 FROM [PS_902mssqldev_prod].[dbo].[header0] header
 INNER JOIN @NextIDs IDs
 ON header.client = IDs.cClient
 AND header.ckhrunn = IDs.iRunNo
 SET @rowCount = @@ROWCOUNT
 SET @TotalRowDeleted = @TotalRowDeleted + @rowCount
 -- UPDATE THE LOG FOR ANALYSIS
 IF @verbose = 1
 BEGIN
 SET @endTime = GETDATE()
 UPDATE [PS_902mssqldev_prod].[dbo].[ArchiveLog] 
 SET
 [EndTime] = @endTime,
 [TotalTime] = DATEDIFF(ss, @startTime, @endTime),
 [RowCount] = @rowCount
 FROM [PS_902mssqldev_prod].[dbo].[ArchiveLog] archLog
 WHERE archLog.EventID = @IntermediateEventID
 END
 -- ##### COMPLIANCE DETAIL DELETE FROM PROD START #####
 -- LOGGING AT TRANSACTION LEVEL (THIS WILL BE OPTIONAL IF SET TO BASIC)
 IF @verbose = 1
 BEGIN
 SET @startTime = GETDATE()
 INSERT INTO [PS_902mssqldev_prod].[dbo].[ArchiveLog]
 ([RunDateTime], [Event], [EventDescription], [DBName], [ChunkNum], [StartTime], [EndTime], [TotalTime], [RowCount])
 VALUES
 (@procRunDate, 'DELETE', 'COMPLIANCE DETAIL','PROD', @chunkCount, GETDATE(), NULL, NULL, NULL)
 SET @IntermediateEventID = @@IDENTITY
 END
 -- DELETE DETAIL RECORDS FROM PROD SERVER's DB
 DELETE detail
 FROM [PS_902mssqldev_prod].[dbo].[detail0] detail
 INNER JOIN @NextIDs IDs
 ON detail.client = IDs.cClient
 AND detail.ckdrunn = IDs.iRunNo
 SET @rowCount = @@ROWCOUNT
 SET @TotalRowDeleted = @TotalRowDeleted + @rowCount 
 -- UPDATE THE LOG FOR ANALYSIS
 IF @verbose = 1
 BEGIN
 SET @endTime = GETDATE()
 UPDATE [PS_902mssqldev_prod].[dbo].[ArchiveLog] 
 SET
 [EndTime] = @endTime,
 [TotalTime] = DATEDIFF(ss, @startTime, @endTime),
 [RowCount] = @rowCount
 FROM [PS_902mssqldev_prod].[dbo].[ArchiveLog] archLog
 WHERE archLog.EventID = @IntermediateEventID
 END
 -- UPDATE THE LOGGING FOR MAIN PROCEDURE EVENT
 SET @endTime = GETDATE()
 UPDATE [PS_902mssqldev_prod].[dbo].[ArchiveLog]
 SET
 [EndTime] = @endTime,
 [TotalTime] = DATEDIFF(ss, @procStartTime, @endTime),
 [RowsCreatedCount] = @TotalRowInserted,
 [RowsDeletedCount] = @TotalRowDeleted
 FROM [PS_902mssqldev_prod].[dbo].[ArchiveLog] archLog
 WHERE archLog.EventID = @MainEventID
 DELETE FROM @NextIDs
 COMMIT
 END TRY
 BEGIN CATCH
 print 'catch' 
 IF @@TRANCOUNT > 0
 ROLLBACK
 SET NOCOUNT ON 
 INSERT INTO [ARCHIVE].[PS_902mssqldev].[dbo].[ErrorLog]
 (ErrorNumber, ErrorDescription, ErrorProcedure, ErrorState, ErrorSeverity, ErrorLine, ErrorTime)
 VALUES
 (ERROR_NUMBER(), ERROR_MESSAGE(), ERROR_PROCEDURE(), ERROR_STATE(), ERROR_SEVERITY(), ERROR_LINE(), GETDATE());
 SET NOCOUNT OFF
 END CATCH;
 END /* IF / WHILE */
END;
GO
--RUN PROCEDURE
--exec usp_Compliane_Archive_And_Delete 'SYS', 1, 5000, 14

This runs in chunks of 5000 records, which can be controlled via the input parameter.

Some logging is also implemented in order to track time taken and other information.

Please suggest whether this is a good design for the requirement.

asked Mar 10, 2016 at 10:49

1 Answer


Nitpicks

You use separate statements for DECLARE and SET quite a bit. While there is nothing wrong with that, it's often just easier to read if you combine them:

DECLARE @Xdaysago datetime = DATEADD(DAY, -(@historyDays), GETDATE());
DECLARE @chunkCount int = 0;
DECLARE @procRunDate datetime = GETDATE();

You made this comment in your post:

This will run in a chunk of 5000 records which can be controlled from the input parameter.

If you know that your default value is 5000, consider setting it as the default in your procedure signature (hence making the parameter optional). You could also do this for @verbose if you expect to run it with one of the two settings more often than not. In both cases, callers would only need to supply the parameters when they want something other than the defaults.

ALTER PROCEDURE [dbo].[usp_Compliane_Archive_And_Delete] 
 @client nvarchar(8), -- CLIENT field value
 @historyDays int, -- History Days for deletion
 @chunkSize int = 5000, -- Change as per requirement
 @verbose bit = 0 -- Set to 0/1 for BASIC/VERBOSE logging
AS ...

Note I changed the order a bit to put the optional parameters last.


It's a good habit to terminate your statements with a semicolon ;, even though T-SQL does not enforce it at the moment. The terminator is part of the ANSI SQL standard, and as far as I am aware SQL Server is the only major RDBMS that does not yet enforce it.

According to MSDN (emphasis mine):

Transact-SQL statement terminator. Although the semicolon is not required for most statements in this version of SQL Server, it will be required in a future version.



Lines like these should be shortened to make reading easier (and avoid making mistakes like mixing up column counts):

INSERT INTO [ARCHIVE].[PS_902mssqldev].[dbo].[header0]
([client] ,[ckhrunn] ,[ckhsrc] ,[ckhtype] ,[ckhstat] ,[ckhrclnt] ,[ckhrdte] ,[ckhrtme] ,[ckhrusr] ,[ckhcver] ,[ckhprun] ,[ckhcdt1] ,[ckhusr1] ,[ckhusr2] ,[ckhusr3] ,[ckhusr4] ,[ckhactv] ,[ckhuser] ,[ckhdate] ,[ckhtime] ,[ckhwsid] ,[ckhupid] ,[ckhpsl01] ,[ckhpsl02] ,[ckhpsl03] ,[ckhpsl04] ,[ckhpsl05] ,[ckhpslvnum] ,[ckhpslvdte])
SELECT header.client, header.ckhrunn, header.ckhsrc, header.ckhtype, header.ckhstat, header.ckhrclnt, header.ckhrdte, header.ckhrtme, header.ckhrusr, header.ckhcver, header.ckhprun, header.ckhcdt1, header.ckhusr1, header.ckhusr2, header.ckhusr3, header.ckhusr4, header.ckhactv, header.ckhuser, header.ckhdate, header.ckhtime, header.ckhwsid, header.ckhupid, header.ckhpsl01, header.ckhpsl02, header.ckhpsl03, header.ckhpsl04, header.ckhpsl05, header.ckhpslvnum, header.ckhpslvdte
FROM [PS_902mssqldev_prod].[dbo].[header0] AS header

Consider adding a line break every few columns in both the INSERT INTO column list and the SELECT list, so the statement stays legible without scrolling horizontally.
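For illustration, the header insert could be wrapped like this (the column lists are truncated here with `...` for brevity; the full lists stay exactly the same):

```sql
INSERT INTO [ARCHIVE].[PS_902mssqldev].[dbo].[header0]
    ([client], [ckhrunn], [ckhsrc],  [ckhtype], [ckhstat],
     [ckhrclnt], [ckhrdte], [ckhrtme], [ckhrusr], [ckhcver],
     ...)
SELECT
    header.client,   header.ckhrunn, header.ckhsrc,  header.ckhtype, header.ckhstat,
    header.ckhrclnt, header.ckhrdte, header.ckhrtme, header.ckhrusr, header.ckhcver,
    ...
FROM [PS_902mssqldev_prod].[dbo].[header0] AS header
```

A fixed number of columns per line also makes it easy to spot a mismatch between the INSERT and SELECT lists at a glance.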


You can use the += and -= compound assignment operators in SET statements (for increments within loops, etc.), just as in many other languages:

SET @TotalRowInserted = @TotalRowInserted + @rowCount;
...
SET @chunkCount = @chunkCount + 1;

Become simply:

SET @TotalRowInserted += @rowCount;
...
SET @chunkCount += 1;

Performance

For the most part this is pretty straightforward, and there may not be huge performance gains to be made, but I spotted a few things that could potentially help.*

* Your mileage may vary.


Transaction isolation level

There is no declared isolation level in your procedure. How could that impact performance? Well, it depends. The database has a default isolation level for read operations; you should definitely check what it is.

You may or may not be worried about reading uncommitted data (a.k.a. dirty reads), and as far as I can tell this would primarily concern the INSERT INTO @NextIDs, but a less restrictive level could alleviate potential locks on the production database.

Reference: SET TRANSACTION ISOLATION LEVEL - MSDN

Another small improvement would be to SET NOCOUNT ON once at the start of the procedure, rather than toggling it inside your CATCH block.
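As a sketch, both session options would sit at the top of the procedure body. READ COMMITTED below is only a placeholder; pick the actual level after checking the database default and your tolerance for dirty reads:

```sql
ALTER PROCEDURE [dbo].[usp_Compliane_Archive_And_Delete]
    @client nvarchar(8),
    @historyDays int,
    @chunkSize int = 5000,
    @verbose bit = 0
AS
BEGIN
    SET NOCOUNT ON;       -- suppress row-count messages for every statement in the proc
    SET XACT_ABORT ON;
    SET TRANSACTION ISOLATION LEVEL READ COMMITTED;  -- placeholder; choose deliberately
    ...
END;
```

SET NOCOUNT ON applies for the duration of the procedure, so there is no need to turn it back off in the CATCH block.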


"Physical" temporary table

This:

DECLARE @NextIDs TABLE(
 cClient nvarchar(8),
 iRunNo int,
 UNIQUE NONCLUSTERED (cClient, iRunNo)
 );

... might be better off as a "physical" #NextIDs temporary table, instead of a table variable. This of course can vary depending on the memory and I/O capabilities of the server(s), which version of SQL Server you are using, and so on. I can only say that in my experience, #TempTables are almost always faster than @TableVariables for any non-trivial number of records. A temp table also lets you control indexing and the like for as long as it exists.
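A hypothetical sketch of that variant, keeping the original column names; the clustered primary key shown here mirrors the UNIQUE constraint the table variable already has, but the exact indexing choice is an assumption to be tested:

```sql
-- Sketch: #NextIDs as a temp table instead of a table variable.
CREATE TABLE #NextIDs
(
    cClient nvarchar(8) NOT NULL,
    iRunNo  int         NOT NULL,
    CONSTRAINT PK_NextIDs PRIMARY KEY CLUSTERED (cClient, iRunNo)
);

-- ... populate and join exactly as with @NextIDs ...

TRUNCATE TABLE #NextIDs;  -- between chunks, instead of DELETE FROM @NextIDs

DROP TABLE #NextIDs;      -- optional; it is dropped automatically when the proc ends
```

Between chunks, TRUNCATE is typically cheaper than the per-row DELETE the table variable requires.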



Ask a DBA!

Database Administrators have some unique and privileged insights into database performance, specifications and the like. In addition to consulting your internal DBA(s), there is also a Database Administrators Stack Exchange site where you could get more advanced advice (although this question as it stands would not be a good fit there; read their FAQ pages What topics can I ask about here? and What types of questions should I avoid asking? if you decide to ask a question there).

answered Mar 16, 2016 at 22:38
  • As an extra, temporary tables support indexes. We don't have enough info on the table definitions and execution plans to say anything about efficiency, but that could be an added bonus and could be emphasized more. Commented Mar 16, 2016 at 22:46
  • @PinCrash It was a great and thorough analysis of the procedure, and I can't thank you enough for all the points covered. Thanks for pointing out SET NOCOUNT ON, my bad. Just a couple of questions I had in mind: 1. What do you mean when you say "Your mileage may vary"? 2. In which database should these #Tables be created? 3. Any specific suggestion regarding deletion of the records? Also, while inserting into the archive, our SELECT statements now use the WITH (NOLOCK) hint; since we are dealing with history records, this shouldn't cause problematic dirty reads. Commented Mar 18, 2016 at 9:37
  • (1) What I meant by that is that some database setups may benefit from some of those things while others may not, depending on a whole range of factors that make each individual database server different. Commented Mar 18, 2016 at 10:53
  • (2) When I have used #TempTables in queries involving linked servers, I have created them on the source or "starting" server, but those were very different kinds of queries (running selects in a cursor over N linked servers to gather data from each into a central location). In your case, I think it might be better to create them on the archive server, as it doesn't seem as critical for that server to avoid performance slowdowns while archiving, although I would suggest testing both. Commented Mar 18, 2016 at 10:57
