We have a SQL Server solution with a table, dsStaging.Audit, that stores audit records created by a third-party transactional database. We use these audits to synchronise CRUD operations from the third-party system into our SQL database.
CREATE TABLE [dsStaging].[Audit](
[SyncExecutionId] [bigint] NOT NULL,
[AuditDataGuid] [nvarchar](56) NOT NULL,
[AuditDate] [datetime] NOT NULL,
[AuditDateTimeZone] [datetimeoffset](3) NULL,
[AuditEventGroup] [nvarchar](56) NOT NULL,
[TransactionId] [bigint] NOT NULL,
[TransactionSequence] [int] NOT NULL,
.
...
.
CONSTRAINT [PK_Audit] PRIMARY KEY CLUSTERED
(
[SyncExecutionId] ASC,
[TransactionId] ASC,
[TransactionSequence] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
)
When the audits are processed, I want to move the audit records into a separate table, Processed.Audit, ready for deletion after x days.
CREATE TABLE [Processed].[Audit](
[SyncExecutionId] [bigint] NOT NULL,
[AuditDataGuid] [nvarchar](56) NOT NULL,
[AuditDate] [datetime] NOT NULL,
[AuditEventGroup] [nvarchar](56) NOT NULL,
[TransactionId] [bigint] NOT NULL,
[TransactionSequence] [int] NOT NULL,
.
...
.
CONSTRAINT [PK_Processed_Audit] PRIMARY KEY NONCLUSTERED
(
[AuditDate] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
)
My main aim in moving audits out of staging and into processed is performance. I need to make sure that the staging table is locked for as short a time as possible, so that any unprocessed audits can be processed as quickly as possible (fewer audits in the staging table = much faster processing).
We're looking at around 1.5m audit records going through this process every hour in batches of about 10k.
The process for moving audits will fire roughly every 20-30 seconds. The process for deleting Processed.Audit records will execute every hour and delete 1 hour's worth of audits from X days ago (typically around 7 days).
- Should I add a clustered index to the Processed.Audit table?
Minimum version to be supported: SQL Server 2012 Standard Edition.
2 Answers
The main reason I would want a clustered index in this scenario is this line:
The process for deleting Processed.Audit records will execute every hour and delete 1 hour's worth of audits from X days ago (typically around 7 days)
When you delete rows from a heap, the emptied data pages may not be deallocated unless the delete gets a table lock, or you provide a WITH (TABLOCK) hint to the delete query. You can probably imagine what that does to concurrency, though. Not good.
Note that the TABLOCK hint will not deallocate the pages if you're using RCSI or Snapshot Isolation.
Here's a quick example. Load up a small table:
USE tempdb;
SET NOCOUNT ON;
CREATE TABLE dbo.heap
(
id INT PRIMARY KEY NONCLUSTERED,
junk VARCHAR(1000)
);
INSERT dbo.heap (
id, junk )
SELECT TOP 1000 x.n, REPLICATE('A', x.n % 1000)
FROM (
SELECT ROW_NUMBER() OVER ( ORDER BY @@ROWCOUNT ) AS n
FROM sys.messages AS m ) AS x;
Run a sanity check query to figure out how many pages are assigned to the heap, and to the nonclustered PK:
SELECT OBJECT_NAME(i.object_id) AS table_name,
i.name AS index_name,
MAX(a.used_pages) AS leaf_me_alone
FROM sys.indexes AS i
JOIN sys.partitions AS p
ON p.object_id = i.object_id
AND p.index_id = i.index_id
JOIN sys.allocation_units AS a
ON a.container_id = p.partition_id
WHERE OBJECT_NAME(i.object_id) = 'heap'
GROUP BY i.object_id, i.index_id, i.name
ORDER BY OBJECT_NAME(i.object_id), i.index_id;
Results in this:
table_name index_name leaf_me_alone
heap NULL 74
heap PK__heap__ 7
So, 74 pages in the heap, 7 pages in the NC PK.
Do some singleton deletes to clear out nearly all of the table (ids 1 through 999):
DECLARE @i INT = 1;
WHILE @i < 1000
BEGIN
DELETE h
FROM dbo.heap AS h
WHERE h.id = @i;
SET @i += 1;
PRINT @i;
END;
If you re-run the sanity check query, you'll get the same result.
Worse, if you query the table now, SQL will read ALL OF THOSE BLANK PAGES!
SET STATISTICS TIME, IO ON
SELECT *
FROM dbo.heap AS h;
Table 'heap'. Scan count 1, logical reads 67
So now not only is our table artificially large, but SQL now has a bunch of blank pages on disk and in memory and in backups and in DBCC CHECKDB and... well, you get the point.
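One way to see the gap between allocated pages and live rows directly is the sys.dm_db_index_physical_stats function. This is only a sketch against the demo table above; it assumes the current database is still tempdb, and the numbers it returns will vary by system.

```sql
-- Sketch: page and row counts for the demo heap.
-- index_id 0 addresses the heap itself; DETAILED mode is needed
-- for record_count and avg_page_space_used_in_percent to be populated.
SELECT ips.index_type_desc,
       ips.page_count,
       ips.record_count,
       ips.avg_page_space_used_in_percent
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.heap'), 0, NULL, 'DETAILED') AS ips;
```

A high page_count alongside a very low record_count is the symptom described above.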
We're looking at around 1.5m audit records going through this process every hour
Heh heh heh! No fun.
Other options for getting pages deallocated from the heap are:
TRUNCATE TABLE dbo.heap
Which doesn't work for you, because you need to batch delete data.
ALTER TABLE dbo.heap REBUILD;
Which would be painful for you at that table size, because it will rebuild all nonclustered indexes on the table at the same time.
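For comparison, the TABLOCK route mentioned earlier could be sketched like this against the same demo table. It assumes the database is not using RCSI or Snapshot Isolation, and it takes a table-level lock for the duration of the delete.

```sql
-- Sketch: delete with a table lock so emptied heap pages can be deallocated.
-- The predicate is illustrative; it covers every id loaded into the demo table.
DELETE h
FROM dbo.heap AS h WITH (TABLOCK)
WHERE h.id <= 1000;
```

Re-running the sanity check query afterwards shows whether the empty pages were released.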
Will the table re-use pages? Sometimes maybe sorta kinda.
DECLARE @id_max INT = (SELECT MAX(id) FROM dbo.heap AS h);
INSERT dbo.heap (
id, junk )
SELECT TOP 5000 x.n + @id_max, REPLICATE('A', x.n % 1000)
FROM (
SELECT ROW_NUMBER() OVER ( ORDER BY @@ROWCOUNT ) AS n
FROM sys.messages AS m ) AS x;
Sanity check:
table_name index_name leaf_me_alone
heap NULL 400
heap PK__heap__ 20
SELECT * query:
Table 'heap'. Scan count 1, logical reads 392
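Applied to the question's table, the clustered index argued for at the top of this answer might look something like the sketch below. The index name, the choice of AuditDate as the key (because the hourly purge filters on it), the 10k batch size, and the use of GETDATE() for the cutoff are all assumptions, not something taken from the question.

```sql
-- Sketch: cluster Processed.Audit on the column the purge filters on.
-- CX_Processed_Audit_AuditDate is a hypothetical name.
CREATE CLUSTERED INDEX CX_Processed_Audit_AuditDate
    ON Processed.Audit (AuditDate);

-- Sketch: purge rows older than the retention period in small batches,
-- so each delete holds its locks only briefly.
DECLARE @RetentionDays INT = 7;   -- "typically around 7 days"
DECLARE @Cutoff DATETIME = DATEADD(DAY, -@RetentionDays, GETDATE());
DECLARE @Rows INT = 1;

WHILE @Rows > 0
BEGIN
    DELETE TOP (10000)
    FROM Processed.Audit
    WHERE AuditDate < @Cutoff;

    -- Stop once a batch deletes nothing.
    SET @Rows = @@ROWCOUNT;
END;
```

With a clustered index in place, pages emptied by the purge can be deallocated without needing a table lock.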
Hope this helps!
I would do it differently:
- Keep the dsStaging.Audit table without any constraints or indexes.
- Move the records to a middle table by partition switching or sp_rename.
- Create an index on the middle table to support the deleting process.
- Move the data from the middle table into the Processed.Audit table (DELETE ... OUTPUT INTO is recommended).
- Drop the middle table. The END.
I would also consider partitioning the Processed.Audit table and loading it from the middle table using the partition switch feature.
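The "delete into" step in the list above presumably means a DELETE with an OUTPUT ... INTO clause, which removes rows from the source and inserts them into the target in a single statement. A minimal sketch follows, written against the question's two tables rather than a middle table. The @SyncExecutionId filter and batch size are hypothetical, and only the columns shown in the question are listed; the elided columns would need to be added.

```sql
-- Sketch: move one batch of rows in a single atomic statement.
DECLARE @SyncExecutionId BIGINT = 1;   -- hypothetical: the execution to move
DECLARE @BatchSize INT = 10000;        -- roughly the batch size in the question

DELETE TOP (@BatchSize) s
OUTPUT deleted.SyncExecutionId,
       deleted.AuditDataGuid,
       deleted.AuditDate,
       deleted.AuditEventGroup,
       deleted.TransactionId,
       deleted.TransactionSequence
       -- remaining columns are elided in the question
INTO Processed.Audit
       (SyncExecutionId, AuditDataGuid, AuditDate,
        AuditEventGroup, TransactionId, TransactionSequence)
FROM dsStaging.Audit AS s
WHERE s.SyncExecutionId = @SyncExecutionId;
```

Note that OUTPUT ... INTO requires a target table without enabled triggers, foreign key relationships, or CHECK constraints.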
- Noting that partitioning is not supported by SQL Server 2012 Standard Edition (not included in Standard Edition until 2016 SP1). – mendosi, Sep 19, 2017 at 6:57
- Thanks, I know, but rename is still an option. – itzik Paz, Sep 19, 2017 at 7:08
- @itzikPaz thanks for the suggestion - can you give some reasoning for doing it this way over the methods discussed thus far? btw: I'd prefer not to remove any indices or constraints from dsStaging.Audit because there are two processes that rely on them: the load of records into dsStaging.Audit and the processing of audits in dsStaging.Audit. – Drammy, Sep 20, 2017 at 12:30
- The main reason is that any constraints or indexes on the dsStaging.Audit table will badly impact insert statements. – itzik Paz, Sep 22, 2017 at 7:12