I have a requirement to archive 5 huge tables from the PROD server to an ARCHIVE server without losing the integrity of the tables.
The query uses the Linked Server functionality, and the current idea is to host it on the PROD server. It runs as a distributed transaction via Microsoft's Distributed Transaction Coordinator (MSDTC) service.
The key requirement is that the PROD server will be live the whole time, and its performance should not be affected while this procedure executes.
Tables are as follows:
- header0 - Header
- detail0 - Detail
- email0 - Emails
- overboard0 - Overboard
- references0 - References
Only the header and detail tables are included in the code below, for better understanding; the other 3 tables reference the same header table.
1 header may have around 2-4 detail records, 8 message records, and may or may not have records in the other tables.
The tables don't have a primary/foreign key relationship in the database as such, but they have columns that depend on the header table.
The following is the procedure code currently implemented, which is working fine for 18k header records, processing 180k+ records in 3.5 minutes. Please suggest whether this can be improved, because going forward this table may gain 2.5-3 million records in a single day.
ALTER PROCEDURE [dbo].[usp_Compliane_Archive_And_Delete]
@client nvarchar(8), -- CLIENT field value
@verbose bit, -- Set to 0/1 for BASIC/VERBOSE logging
@chunkSize int, -- Change as per requirement
@historyDays int -- History Days for deletion
AS
BEGIN
SET XACT_ABORT ON
-- DECLARE A TABLE TO HOLD THE KEY VALUES FOR EVERY CHUNK
DECLARE @NextIDs TABLE(
cClient nvarchar(8),
iRunNo int,
UNIQUE NONCLUSTERED (cClient, iRunNo)
);
-- Hold history days value
DECLARE @Xdaysago datetime
SELECT @Xdaysago = DATEADD(DAY, -(@historyDays), GETDATE())
DECLARE @chunkCount int
SET @chunkCount = 0
DECLARE @procRunDate datetime
SET @procRunDate = GETDATE()
DECLARE @MainEventID bigint
DECLARE @ChunkEventID bigint
DECLARE @IntermediateEventID bigint
DECLARE @startTime datetime
DECLARE @endTime datetime
DECLARE @rowCount int
DECLARE @procStartTime datetime
SET @procStartTime = GETDATE()
DECLARE @TotalRowInserted bigint
SET @TotalRowInserted = 0
DECLARE @TotalRowDeleted bigint
SET @TotalRowDeleted = 0
-- LOGGING MAIN PROCEDURE EVENT
INSERT INTO [PS_902mssqldev_prod].[dbo].[ArchiveLog]
([RunDateTime], [Event], [EventDescription], [DBName], [ChunkNum], [StartTime], [EndTime], [TotalTime], [RowCount])
VALUES
(@procRunDate, 'START', 'COMPLIANCE_PROCEDURE','ARCHIVE', NULL, @procStartTime, NULL, NULL, NULL)
SET @MainEventID = @@IDENTITY
WHILE EXISTS(SELECT TOP 1 [ckhrunn]
FROM [PS_902mssqldev_prod].[dbo].[header0]
WHERE [client] = @client
AND [ckhrdte] < @Xdaysago)
BEGIN
BEGIN TRY
BEGIN DISTRIBUTED TRANSACTION
-- GET ALL THE RECORDS TO BE ARCHIVED
INSERT INTO @NextIDs (cClient, iRunNo)
SELECT TOP (@chunkSize) [client], [ckhrunn] FROM [PS_902mssqldev_prod].[dbo].[header0] WHERE [PS_902mssqldev_prod].[dbo].[header0].[ckhrdte] < @Xdaysago
print 'message '
-- INCREMENT CHUNK COUNT
SET @chunkCount = @chunkCount + 1
-- LOGGING AT CHUNK LEVEL
INSERT INTO [PS_902mssqldev_prod].[dbo].[ArchiveLog]
([RunDateTime], [Event], [EventDescription], [DBName], [ChunkNum], [StartTime], [EndTime], [TotalTime], [RowCount])
VALUES
(@procRunDate, 'START', 'CHUNK','ARCHIVE', @chunkCount, GETDATE(), NULL, NULL, NULL)
SET @ChunkEventID = @@IDENTITY
-- ################################## ARCHIVING STARTS HERE ######################################### --
SET @IntermediateEventID = NULL
SET @startTime = NULL
SET @endTime = NULL
SET @rowCount = NULL
-- ##### COMPLIANCE HEADER TABLE PROCESSING #####
-- LOGGING AT TRANSACTION LEVEL (THIS WILL BE OPTIONAL IF SET TO BASIC)
IF @verbose = 1
BEGIN
SET @startTime = GETDATE()
INSERT INTO [PS_902mssqldev_prod].[dbo].[ArchiveLog]
([RunDateTime], [Event], [EventDescription], [DBName], [ChunkNum], [StartTime], [EndTime], [TotalTime], [RowCount])
VALUES
(@procRunDate, 'INSERT', 'COMPLIANCE HEADER','ARCHIVE', @chunkCount, GETDATE(), NULL, NULL, NULL)
SET @IntermediateEventID = @@IDENTITY
END
-- INSERT INTO HEADER table OF ARCHIVE SERVER's DB
INSERT INTO [ARCHIVE].[PS_902mssqldev].[dbo].[header0]
([client] ,[ckhrunn] ,[ckhsrc] ,[ckhtype] ,[ckhstat] ,[ckhrclnt] ,[ckhrdte] ,[ckhrtme] ,[ckhrusr] ,[ckhcver] ,[ckhprun] ,[ckhcdt1] ,[ckhusr1] ,[ckhusr2] ,[ckhusr3] ,[ckhusr4] ,[ckhactv] ,[ckhuser] ,[ckhdate] ,[ckhtime] ,[ckhwsid] ,[ckhupid] ,[ckhpsl01] ,[ckhpsl02] ,[ckhpsl03] ,[ckhpsl04] ,[ckhpsl05] ,[ckhpslvnum] ,[ckhpslvdte])
SELECT header.client, header.ckhrunn, header.ckhsrc, header.ckhtype, header.ckhstat, header.ckhrclnt, header.ckhrdte, header.ckhrtme, header.ckhrusr, header.ckhcver, header.ckhprun, header.ckhcdt1, header.ckhusr1, header.ckhusr2, header.ckhusr3, header.ckhusr4, header.ckhactv, header.ckhuser, header.ckhdate, header.ckhtime, header.ckhwsid, header.ckhupid, header.ckhpsl01, header.ckhpsl02, header.ckhpsl03, header.ckhpsl04, header.ckhpsl05, header.ckhpslvnum, header.ckhpslvdte
FROM [PS_902mssqldev_prod].[dbo].[header0] AS header
INNER JOIN @NextIDs AS IDs
ON header.client = IDs.cClient
AND header.ckhrunn = IDs.iRunNo
WHERE
NOT EXISTS (SELECT TOP 1 [client], [ckhrunn] FROM [ARCHIVE].[PS_902mssqldev].[dbo].[header0] AS archiveHeader
WHERE archiveHeader.client = IDs.cClient
AND archiveHeader.ckhrunn = IDs.iRunNo)
SET @rowCount = @@ROWCOUNT
SET @TotalRowInserted = @TotalRowInserted + @rowCount
IF @verbose = 1
BEGIN
SET @endTime = GETDATE()
UPDATE [PS_902mssqldev_prod].[dbo].[ArchiveLog]
SET
[EndTime] = @endTime,
[TotalTime] = DATEDIFF(ss, @startTime, @endTime),
[RowCount] = @rowCount
FROM [PS_902mssqldev_prod].[dbo].[ArchiveLog] archLog
WHERE archLog.EventID = @IntermediateEventID
END
-- ##### COMPLIANCE DETAIL TABLE PROCESSING #####
-- LOGGING AT TRANSACTION LEVEL (THIS WILL BE OPTIONAL IF SET TO BASIC)
IF @verbose = 1
BEGIN
SET @startTime = GETDATE()
INSERT INTO [PS_902mssqldev_prod].[dbo].[ArchiveLog]
([RunDateTime], [Event], [EventDescription], [DBName], [ChunkNum], [StartTime], [EndTime], [TotalTime], [RowCount])
VALUES
(@procRunDate, 'INSERT', 'COMPLIANCE DETAIL','ARCHIVE', @chunkCount, GETDATE(), NULL, NULL, NULL)
SET @IntermediateEventID = @@IDENTITY
END
-- INSERT INTO DETAIL table OF ARCHIVE SERVER's DB
INSERT INTO [ARCHIVE].[PS_902mssqldev].[dbo].[detail0]
([client] ,[ckdrunn] ,[ckdcat] ,[ckdkey1] ,[ckdkey2] ,[ckdkey3] ,[ckdkey4] ,[ckdseqn] ,[ckdstat] ,[ckdosts] ,[ckdoemp] ,[ckdodte] ,[ckdotme] ,[ckdixml] ,[ckdoxml] ,[ckdrmrk] ,[ckdrusr] ,[ckdrdte] ,[ckdrtme] ,[ckdvid1] ,[ckdvid2] ,[ckdcver] ,[ckdrunb] ,[ckdxdtl1] ,[ckdxdtl2] ,[ckdusr1] ,[ckdusr2] ,[ckdusr3] ,[ckdusr4] ,[ckdactv] ,[ckduser] ,[ckddate] ,[ckdtime] ,[ckdwsid] ,[ckdupid] ,[ckdpsl01] ,[ckdpsl02] ,[ckdpsl03] ,[ckdpsl04] ,[ckdpsl05] ,[ckdpslvnum] ,[ckdpslvdte])
SELECT detail.client, detail.ckdrunn, detail.ckdcat, detail.ckdkey1, detail.ckdkey2, detail.ckdkey3, detail.ckdkey4, detail.ckdseqn, detail.ckdstat, detail.ckdosts, detail.ckdoemp, detail.ckdodte, detail.ckdotme, detail.ckdixml, detail.ckdoxml, detail.ckdrmrk, detail.ckdrusr, detail.ckdrdte, detail.ckdrtme, detail.ckdvid1, detail.ckdvid2, detail.ckdcver, detail.ckdrunb, detail.ckdxdtl1, detail.ckdxdtl2, detail.ckdusr1, detail.ckdusr2, detail.ckdusr3, detail.ckdusr4, detail.ckdactv, detail.ckduser, detail.ckddate, detail.ckdtime, detail.ckdwsid, detail.ckdupid, detail.ckdpsl01, detail.ckdpsl02, detail.ckdpsl03, detail.ckdpsl04, detail.ckdpsl05, detail.ckdpslvnum, detail.ckdpslvdte
FROM [PS_902mssqldev_prod].[dbo].[detail0] AS detail
INNER JOIN @NextIDs AS IDs
ON detail.client = IDs.cClient
AND detail.ckdrunn = IDs.iRunNo
WHERE
NOT EXISTS (SELECT TOP 1 [client], [ckdrunn] FROM [ARCHIVE].[PS_902mssqldev].[dbo].[detail0] AS archiveDetail
WHERE archiveDetail.client = IDs.cClient
AND archiveDetail.ckdrunn = IDs.iRunNo)
SET @rowCount = @@ROWCOUNT
SET @TotalRowInserted = @TotalRowInserted + @rowCount
-- UPDATE THE LOG FOR ANALYSIS
IF @verbose = 1
BEGIN
SET @endTime = GETDATE()
UPDATE [PS_902mssqldev_prod].[dbo].[ArchiveLog]
SET
[EndTime] = @endTime,
[TotalTime] = DATEDIFF(ss, @startTime, @endTime),
[RowCount] = @rowCount
FROM [PS_902mssqldev_prod].[dbo].[ArchiveLog] archLog
WHERE archLog.EventID = @IntermediateEventID
END
-- ################################## DELETION STARTS HERE ######################################### --
-- ##### COMPLIANCE HEADER DELETE FROM PROD START #####
-- LOGGING AT TRANSACTION LEVEL (THIS WILL BE OPTIONAL IF SET TO BASIC)
IF @verbose = 1
BEGIN
SET @startTime = GETDATE()
INSERT INTO [PS_902mssqldev_prod].[dbo].[ArchiveLog]
([RunDateTime], [Event], [EventDescription], [DBName], [ChunkNum], [StartTime], [EndTime], [TotalTime], [RowCount])
VALUES
(@procRunDate, 'DELETE', 'COMPLIANCE HEADER','PROD', @chunkCount, GETDATE(), NULL, NULL, NULL)
SET @IntermediateEventID = @@IDENTITY
END
-- DELETE HEADER RECORDS FROM PROD SERVER's DB
DELETE header
FROM [PS_902mssqldev_prod].[dbo].[header0] header
INNER JOIN @NextIDs IDs
ON header.client = IDs.cClient
AND header.ckhrunn = IDs.iRunNo
SET @rowCount = @@ROWCOUNT
SET @TotalRowDeleted = @TotalRowDeleted + @rowCount
-- UPDATE THE LOG FOR ANALYSIS
IF @verbose = 1
BEGIN
SET @endTime = GETDATE()
UPDATE [PS_902mssqldev_prod].[dbo].[ArchiveLog]
SET
[EndTime] = @endTime,
[TotalTime] = DATEDIFF(ss, @startTime, @endTime),
[RowCount] = @rowCount
FROM [PS_902mssqldev_prod].[dbo].[ArchiveLog] archLog
WHERE archLog.EventID = @IntermediateEventID
END
-- ##### COMPLIANCE DETAIL DELETE FROM PROD START #####
-- LOGGING AT TRANSACTION LEVEL (THIS WILL BE OPTIONAL IF SET TO BASIC)
IF @verbose = 1
BEGIN
SET @startTime = GETDATE()
INSERT INTO [PS_902mssqldev_prod].[dbo].[ArchiveLog]
([RunDateTime], [Event], [EventDescription], [DBName], [ChunkNum], [StartTime], [EndTime], [TotalTime], [RowCount])
VALUES
(@procRunDate, 'DELETE', 'COMPLIANCE DETAIL','PROD', @chunkCount, GETDATE(), NULL, NULL, NULL)
SET @IntermediateEventID = @@IDENTITY
END
-- DELETE DETAIL RECORDS FROM PROD SERVER's DB
DELETE detail
FROM [PS_902mssqldev_prod].[dbo].[detail0] detail
INNER JOIN @NextIDs IDs
ON detail.client = IDs.cClient
AND detail.ckdrunn = IDs.iRunNo
SET @rowCount = @@ROWCOUNT
SET @TotalRowDeleted = @TotalRowDeleted + @rowCount
-- UPDATE THE LOG FOR ANALYSIS
IF @verbose = 1
BEGIN
SET @endTime = GETDATE()
UPDATE [PS_902mssqldev_prod].[dbo].[ArchiveLog]
SET
[EndTime] = @endTime,
[TotalTime] = DATEDIFF(ss, @startTime, @endTime),
[RowCount] = @rowCount
FROM [PS_902mssqldev_prod].[dbo].[ArchiveLog] archLog
WHERE archLog.EventID = @IntermediateEventID
END
-- UPDATE THE LOGGING FOR MAIN PROCEDURE EVENT
SET @endTime = GETDATE()
UPDATE [PS_902mssqldev_prod].[dbo].[ArchiveLog]
SET
[EndTime] = @endTime,
[TotalTime] = DATEDIFF(ss, @procStartTime, @endTime),
[RowsCreatedCount] = @TotalRowInserted,
[RowsDeletedCount] = @TotalRowDeleted
FROM [PS_902mssqldev_prod].[dbo].[ArchiveLog] archLog
WHERE archLog.EventID = @MainEventID
DELETE FROM @NextIDs
COMMIT
END TRY
BEGIN CATCH
print 'catch'
IF @@TRANCOUNT > 0
ROLLBACK
SET NOCOUNT ON
INSERT INTO [ARCHIVE].[PS_902mssqldev].[dbo].[ErrorLog]
(ErrorNumber, ErrorDescription, ErrorProcedure, ErrorState, ErrorSeverity, ErrorLine, ErrorTime)
VALUES
(ERROR_NUMBER(), ERROR_MESSAGE(), ERROR_PROCEDURE(), ERROR_STATE(), ERROR_SEVERITY(), ERROR_LINE(), GETDATE());
SET NOCOUNT OFF
END CATCH;
END /* IF / WHILE */
END;
GO
--RUN PROCEDURE
--exec usp_Compliane_Archive_And_Delete 'SYS', 1, 5000, 14
This runs in chunks of 5000 records, which can be controlled via the input parameter.
Some logging is also implemented in order to track the time taken and other information.
Please suggest whether this is a good design given the requirement.
Nitpicks
You use separate statements for DECLARE and SET quite a bit. While there is nothing wrong with that, it's often easier to read if you combine them:
DECLARE @Xdaysago datetime = DATEADD(DAY, -(@historyDays), GETDATE());
DECLARE @chunkCount int = 0;
DECLARE @procRunDate datetime = GETDATE();
You made this comment in your post:
This will run in a chunk of 5000 records which can be controlled from the input parameter.
If you know that your default value is 5000, consider setting it as the default value in your procedure signature (hence making the parameter optional). You could also do this for @verbose
if you expect to run it with one of the two settings more often than not. In both cases, the parameters would only need to be provided if the caller decided not to use the defaults.
ALTER PROCEDURE [dbo].[usp_Compliane_Archive_And_Delete]
@client nvarchar(8), -- CLIENT field value
@historyDays int, -- History Days for deletion
@chunkSize int = 5000, -- Change as per requirement
@verbose bit = 0 -- Set to 0/1 for BASIC/VERBOSE logging
AS ...
Note I changed the order a bit to make the optional parameters last.
It's a good habit to terminate your statements with a semicolon (;) even though T-SQL is not strict about it at the moment. It is the ANSI standard, and as far as I am aware SQL Server is the only major RDBMS that does not enforce it yet.
According to MSDN (emphasis mine):
Transact-SQL statement terminator. Although the semicolon is not required for most statements in this version of SQL Server, it will be required in a future version.
References:
Always Use Semicolon Statement Terminators - Dan Guzman's Blog
Lines like these should be shortened to make reading easier (and avoid making mistakes like mixing up column counts):
INSERT INTO [ARCHIVE].[PS_902mssqldev].[dbo].[header0] ([client] ,[ckhrunn] ,[ckhsrc] ,[ckhtype] ,[ckhstat] ,[ckhrclnt] ,[ckhrdte] ,[ckhrtme] ,[ckhrusr] ,[ckhcver] ,[ckhprun] ,[ckhcdt1] ,[ckhusr1] ,[ckhusr2] ,[ckhusr3] ,[ckhusr4] ,[ckhactv] ,[ckhuser] ,[ckhdate] ,[ckhtime] ,[ckhwsid] ,[ckhupid] ,[ckhpsl01] ,[ckhpsl02] ,[ckhpsl03] ,[ckhpsl04] ,[ckhpsl05] ,[ckhpslvnum] ,[ckhpslvdte]) SELECT header.client, header.ckhrunn, header.ckhsrc, header.ckhtype, header.ckhstat, header.ckhrclnt, header.ckhrdte, header.ckhrtme, header.ckhrusr, header.ckhcver, header.ckhprun, header.ckhcdt1, header.ckhusr1, header.ckhusr2, header.ckhusr3, header.ckhusr4, header.ckhactv, header.ckhuser, header.ckhdate, header.ckhtime, header.ckhwsid, header.ckhupid, header.ckhpsl01, header.ckhpsl02, header.ckhpsl03, header.ckhpsl04, header.ckhpsl05, header.ckhpslvnum, header.ckhpslvdte FROM [PS_902mssqldev_prod].[dbo].[header0] AS header
Consider adding line breaks every N columns on both the INSERT INTO and SELECT statements to make them more legible and to avoid having to scroll horizontally to see the whole statement.
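For instance, wrapped five columns per line (a sketch; only the first ten columns of the original list are shown, and the rest follow the same pattern):

```sql
INSERT INTO [ARCHIVE].[PS_902mssqldev].[dbo].[header0] (
    [client],   [ckhrunn], [ckhsrc],  [ckhtype], [ckhstat],
    [ckhrclnt], [ckhrdte], [ckhrtme], [ckhrusr], [ckhcver]
    -- remaining columns wrapped the same way, five per line
)
SELECT
    header.client,   header.ckhrunn, header.ckhsrc,  header.ckhtype, header.ckhstat,
    header.ckhrclnt, header.ckhrdte, header.ckhrtme, header.ckhrusr, header.ckhcver
    -- remaining columns wrapped the same way
FROM [PS_902mssqldev_prod].[dbo].[header0] AS header;
```

The exact number of columns per line is a matter of taste; the point is that a column-count mismatch between the INSERT list and the SELECT list becomes much easier to spot.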
You can use += and -= operators in SET statements (like increments within loops, etc.) as you can in many other languages:
SET @TotalRowInserted = @TotalRowInserted + @rowCount;
...
SET @chunkCount = @chunkCount + 1;
Become simply:
SET @TotalRowInserted += @rowCount;
...
SET @chunkCount += 1;
Performance
For the most part, this is pretty straightforward and there may not be huge performance gains to be made, but I spotted a few things that potentially could help some.*
* Your mileage may vary.
Transaction isolation level
There is no declared isolation level in your procedure. How could it impact performance? Well, it depends. The database probably has a default isolation level for read operations; you should definitely check it.
You may or may not be worried about dirty reads (reads of uncommitted data), and this would primarily matter for the INSERT INTO @NextIDs
as far as I can tell, but declaring the level explicitly could alleviate potential locks on the production database.
Reference: SET TRANSACTION ISOLATION LEVEL - MSDN
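A sketch of making the level explicit near the top of the procedure (READ COMMITTED is shown as an assumed choice, not a recommendation specific to your workload):

```sql
-- Make the isolation level explicit rather than relying on the database default.
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- SNAPSHOT can let readers avoid blocking writers, but note two caveats:
-- it requires ALLOW_SNAPSHOT_ISOLATION to be ON for the database, and it is
-- not supported inside distributed transactions, so it would not combine
-- with your BEGIN DISTRIBUTED TRANSACTION as written.
-- SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
```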
Another small improvement might be to SET NOCOUNT ON
for the whole procedure, rather than only inside your CATCH
block.
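A minimal sketch of what that looks like at the start of the procedure body; it suppresses the "N rows affected" messages that every statement would otherwise send back:

```sql
AS
BEGIN
    SET NOCOUNT ON;    -- suppress per-statement "N rows affected" messages
    SET XACT_ABORT ON; -- as in the original procedure

    -- ... rest of the procedure body unchanged ...
END;
```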
"Physical" temporary table
This:
DECLARE @NextIDs TABLE(
    cClient nvarchar(8),
    iRunNo int,
    UNIQUE NONCLUSTERED (cClient, iRunNo)
);
... might be better off as a "physical" #NextIDs
temp table on disk, instead of a table variable. This of course can vary depending on the memory and I/O capabilities of the server(s), as well as which version of SQL Server you are using, and so on. I can only say that, in my own experience at least, #TempTables
are almost always faster than @TableVariables
when used for any non-trivial number of records. A temp table also lets you control indexing and the like while it exists.
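A sketch of the same key table as a local temp table, with an explicit clustered index (the index choice here is an assumption matching the join columns; test it against your workload):

```sql
CREATE TABLE #NextIDs (
    cClient nvarchar(8) NOT NULL,
    iRunNo  int         NOT NULL,
    PRIMARY KEY CLUSTERED (cClient, iRunNo) -- supports the joins on (client, runn)
);

-- ... same INSERT / JOIN / DELETE logic per chunk, referencing #NextIDs
--     instead of @NextIDs ...

TRUNCATE TABLE #NextIDs; -- per-chunk reset, cheaper than DELETE FROM @NextIDs

DROP TABLE #NextIDs;     -- at the end; also dropped automatically at scope exit
```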
Some references on the topic:
- Special Table Types - Microsoft TechNet
- Are Table Variables as Good as Temporary Tables in SQL 2014? - Kendra Little, BrentOzar.com (this one is really good)
- Optimizing Performance / Indexes on Temp Tables - Bill Richmond, SQLTeam.com
Ask a DBA!
Database Administrators have some unique and privileged insights into database performance, specifications, and the like. In addition to consulting your internal DBA(s), there is also a Database Administrators Stack Exchange site where you could get more advanced advice (although this question as it stands would not be a good fit there; read their FAQ entries "What topics can I ask about here?" and "What types of questions should I avoid asking?" if you decide to post a question there).
- As an extra, temporary tables support indexes. We don't have enough info on the table definitions and execution plans to say anything about efficiency, but that could be an added bonus and could be emphasized more. (Tom V, Mar 16, 2016)
- @PinCrash It was a great and thorough analysis of the procedure and I can't thank you enough for all the points you have covered. Thanks for pointing out SET NOCOUNT ON, my bad. Just a couple of questions I had in mind: 1. What do you mean when you say "Your mileage may vary"? 2. Which database should these #Tables be created in? 3. Any specific suggestion regarding deletion of the records? Also, while inserting into the archive, the SELECT statements now use the WITH (NOLOCK) hint; since we are dealing with history records, the dirty reads shouldn't cause a problem. (Aman S. Aneja, Mar 18, 2016)
- (1) What I meant by that is that some database setups may benefit from some of those things while others may not, depending on a whole range of factors that make each individual database server different from others. (Phrancis, Mar 18, 2016)
- (2) When I have used #TempTables in queries that involved linked servers, I have created them on the source or "starting" server, but those were a very different type of query (running selects in a cursor over N linked servers to gather some data from each into a central location). In your case, I think it might be better to create them on the archive server, as it doesn't seem as "critical" for it to avoid performance slowdowns while archiving, although I would suggest testing both. (Phrancis, Mar 18, 2016)