I have an audit table `EpisodesHistory`. One of the fields that gets copied is `eDescription`, which I expect to regularly be a couple of kilobytes in length. To save database space and cut down on transmission to client applications, I wanted the trigger, upon recording a change, to detect when `eDescription` is identical to a previous iteration and record a reference to that earlier row instead of a second copy.

Let's say I have this `EpisodesHistory` table:
HistoryID (PK) | EpisodeID | eDescription            | ... other fields
1              | 10        | Watch my dog do a trick |
2              | 10        | Watch my cat sleep      |
3              | 11        | Watch my dog do a trick |
In the case of row 3, it was inserted as-is because the `EpisodeID` is different.
However, if I then added:

HistoryID (PK) | EpisodeID | eDescription            | ... other fields
4              | 10        | Watch my dog do a trick |
I wanted to auto-update an additional field, `EpisodesHistory.eDescRef`, to the ID of the row containing the same description: `1` in this case, because row 1 is the first row to hold this value.
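For context, with this scheme the full text of any history row can later be reconstructed by following `eDescRef` back to the referenced row. A rough sketch (untested, assuming the columns described above):

```sql
-- Resolve deduplicated descriptions: when eDescRef points at an earlier
-- row, read the text from that row instead of the (blanked) local value.
SELECT h.HistoryID,
       h.EpisodeID,
       CASE WHEN h.eDescRef > 0 THEN r.eDescription
            ELSE h.eDescription
       END AS eDescription
FROM EpisodesHistory h
LEFT JOIN EpisodesHistory r
    ON r.HistoryID = h.eDescRef
```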
It took me a few attempts to get something working, so I want to make sure this is the optimal method, as this will be one of the busier triggers in my database.
USE [Shows]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER TRIGGER [dbo].[tr-EpisodesHistory]
ON [dbo].[Episodes]
FOR UPDATE
AS
IF TRIGGER_NESTLEVEL(OBJECT_ID('dbo.EpisodesHistory')) > 1
BEGIN
    RETURN;
END;

DECLARE @CMP INTEGER
DECLARE @NEWID INTEGER = 0

IF NOT UPDATE(eActor) OR NOT EXISTS ((SELECT *, eActor = 0 FROM Inserted EXCEPT SELECT *, eActor = 0 FROM Deleted))
BEGIN
    -- Weed out updates to episode scoring, or where nothing that matters in the row was changed (eActor doesn't matter)
    -- eVersion is incremented code-side, so this has to be undone.
    -- Could be incremented here instead, but that's yet another query for successful updates
    UPDATE Episodes
    SET eVersion = eVersion - 1
    WHERE EpisodeID = (SELECT EpisodeID FROM Inserted)
    RETURN;
END

SET @CMP = (SELECT TOP 1 Past.HistoryID
            FROM Deleted Curr, EpisodesHistory Past
            WHERE Curr.EpisodeID = Past.EpisodeID
              AND Curr.eOwner = Past.eOwner
              AND Curr.eDescription = Past.eDescription)
SET @CMP = ISNULL(@CMP, 0)

INSERT INTO EpisodesHistory
SELECT *, eDescRef = @CMP, hWhen = GETDATE() FROM Deleted;
SET @NEWID = SCOPE_IDENTITY()

IF @CMP > 0
BEGIN
    UPDATE EpisodesHistory
    SET eDescRef = @CMP,
        eDescription = ''
    WHERE HistoryID = @NEWID
END
1 Answer
It will only work as intended for single-row updates
The fact that you use `TOP 1` and `SCOPE_IDENTITY()` tells me this was designed with one-row updates/inserts in mind. But what happens when you update multiple rows? (In case you don't know, the trigger only fires once for the whole changed set, rather than once for each row. The changed rows are added to the `inserted` and `deleted` tables.)
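To make that concrete, here's a hypothetical statement that would trip up the current trigger:

```sql
-- One UPDATE statement touching two episodes: the trigger fires once,
-- and the inserted/deleted pseudo-tables each contain two rows.
UPDATE Episodes
SET eDescription = 'Season finale'
WHERE EpisodeID IN (10, 11);

-- Inside the trigger, a scalar subquery such as
--   WHERE EpisodeID = (SELECT EpisodeID FROM Inserted)
-- now fails with "Subquery returned more than 1 value", and
-- TOP 1 / SCOPE_IDENTITY() only account for one of the two rows.
```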
To solve this, I'd recommend using a temp table or table variable instead of `@CMP`, and the `output` clause instead of `@NEWID`. For example:
-- Example table with identity
create table Things (KeyId int identity(1, 1), Value varchar(31))

-- Table variable to store the identity ints which get created
declare @NewIds table (NewKeyId int)

insert into Things (Value)
output inserted.KeyId into @NewIds (NewKeyId) -- this line captures the new identities
select 'Value text'
from OtherTable
Now you can access the created IDs using `@NewIds`. Of course you'd want to store the `EpisodeId` or something in the same table, to relate them.
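To illustrate that relating step (a sketch with made-up data): `output` can list any column of the `inserted` pseudo-table, so a correlating value can be captured next to each new identity:

```sql
create table Things (KeyId int identity(1, 1), Value varchar(31))

-- Capture the inserted value alongside its generated identity
declare @NewIds table (NewKeyId int, NewValue varchar(31))

insert into Things (Value)
output inserted.KeyId, inserted.Value into @NewIds (NewKeyId, NewValue)
values ('first'), ('second')

-- @NewIds now holds each new KeyId next to the value it belongs to,
-- ready to join back to the source rows
select * from @NewIds
```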
EDIT: Untested replacement
So I've just roughly written some SQL that I think should work instead of everything from the `SELECT TOP 1` statement down. The main issue I have with that part of your trigger is that it only works for one-row updates, and it performs an `INSERT` then an `UPDATE` on that inserted data, which can be optimised.
INSERT into EpisodesHistory
SELECT
    EpisodeID = d.EpisodeID,
    eOwner = d.eOwner,
    eDescription = iif(h.HistoryID is null, d.eDescription, ''),
    eDescRef = isnull(h.HistoryID, 0),
    hWhen = getDate()
from deleted d
left join (
    select
        HistoryID, EpisodeID, eOwner, eDescription,
        row_number() over (partition by EpisodeID, eOwner, eDescription order by HistoryID) as RowNumber
    from EpisodesHistory
    where eDescription <> ''
) h
    on d.EpisodeID = h.EpisodeID
    and d.eOwner = h.eOwner
    and d.eDescription = h.eDescription
    and h.RowNumber = 1
Rather than inserting into `EpisodesHistory` and then checking and updating the data we've just inserted, this code does the whole operation in one statement.
First I'll explain the left join. The point of the join is to find the `HistoryID` of the earliest matching `EpisodesHistory` row (since we want to put it in `eDescRef`, right?). So we do a normal select of the `EpisodesHistory` table, except we also add a column for `row_number()`, which, combined with the `RowNumber = 1` join condition, means it will only join to the first `HistoryID` that matches. This makes sure that if there are two rows with the same `EpisodeID`, `eOwner` and `eDescription`, it will only join to the one with the lowest `HistoryID`.
Now the actual columns that are inserted are fairly simple. If the left join fails (if no rows in `EpisodesHistory` match), selecting `h.HistoryID` will return `NULL`. Therefore, `eDescription = iif(h.HistoryID is null, d.eDescription, '')` means that if a `HistoryID` was not found in the join, it will just do a normal insert of `eDescription`; if one was found, it will blank that field. The next line is similar: it inserts the matching `HistoryID` if the join happened, otherwise it puts in 0.
Since the entire operation can be done in one statement like the above, you can ignore the bit I said earlier about temporary tables and an `output` clause.
This code above also has the added bonus of working for multiple-row updates.
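As an aside, the same "earliest matching row" lookup can be expressed with `OUTER APPLY` instead of the ranked subquery; whether it performs better depends on your indexes, so treat this as an untested alternative sketch:

```sql
from deleted d
outer apply (
    select top 1 h2.HistoryID
    from EpisodesHistory h2
    where h2.EpisodeID = d.EpisodeID
      and h2.eOwner = d.eOwner
      and h2.eDescription = d.eDescription
      and h2.eDescription <> ''
    order by h2.HistoryID   -- earliest match, made explicit
) h
```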
It's a trigger...
You may already know, but the SQL Server community is fairly divided when it comes to whether triggers should be avoided or not. They can be easy to forget, hard to debug, and can hinder performance as they become part of the write operation.
Since you mention performance I'll elaborate on that aspect, and I'll make some assumptions I hope you can forgive. I'm going to assume there's an application that sits in front of this database, and the application prompts the insert/update. Say the user is on a form and makes changes to an episode, and submits the change. Usually the application will wait for confirmation of the update/insert operation before proceeding (before loading the next page, for instance). With the trigger, the application now has to wait for the initial insert/update, as well as the trigger to complete its operation too.
Therefore, if you haven't already, consider separating out this process. In the example above, you could have the application still wait for the initial insert/update and load the next page normally, but then in the background (asynchronously) tell the database to perform the audit operations. Boom: reduced page load time, better UX. This also means you don't have to worry so much about performance-tuning the auditing operation.
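One way to sketch that separation (all names hypothetical, untested): the trigger records only a minimal "this changed" row, and a scheduled job does the heavy history/dedup work outside the user's write path:

```sql
-- Narrow queue table: cheap to write to synchronously
create table AuditQueue (
    QueueID   int identity(1, 1) primary key,
    EpisodeID int      not null,
    QueuedAt  datetime not null default getdate()
)

-- A background job (e.g. a SQL Agent job) then drains the queue in
-- batches and performs the full EpisodesHistory insert/dedup logic there.
```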
Checksum?
I'm going to assume, since performance is an issue, that you've considered indexes and have an index set up to optimise this `Episodes` and `EpisodesHistory` comparison. Have you considered the use of `CHECKSUM()`? I don't have any experience with it myself, but I'd imagine you could use it in combination with persisted computed columns to reduce the load when writing to (and maybe reading from) the indexes.
In this instance you'd have an extra column on both tables called something like `eDescriptionChecksum`, which would be a checksum of `eDescription`, and you'd put that column in the index instead of `eDescription`. Then when you perform the comparison to see if they're the same, you'd be comparing the relatively small and light `int` checksum column instead of the potentially huge `(n)varchar` column.
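A rough sketch of what that could look like (column and index names assumed; note that `CHECKSUM` can collide, so the full string comparison should stay as a final filter):

```sql
-- Persisted computed checksum of the big text column
alter table EpisodesHistory
    add eDescriptionChecksum as checksum(eDescription) persisted;

create index IX_EpisodesHistory_DescChecksum
    on EpisodesHistory (EpisodeID, eOwner, eDescriptionChecksum);

-- The comparison filters on the small int first, then confirms on the text:
-- where Curr.eDescriptionChecksum = Past.eDescriptionChecksum
--   and Curr.eDescription = Past.eDescription
```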
Minor issue
I see that when you set `@CMP`, you select from multiple tables. It's recommended to use explicit joins instead: the old comma-separated join syntax in the `from` clause is widely discouraged, and its outer-join variants have been deprecated.
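For reference, the same `@CMP` lookup with an explicit join (plus an `order by`, since `top 1` without one returns an arbitrary matching row rather than the earliest):

```sql
set @CMP = (
    select top 1 Past.HistoryID
    from deleted Curr
    inner join EpisodesHistory Past
        on  Curr.EpisodeID = Past.EpisodeID
        and Curr.eOwner = Past.eOwner
        and Curr.eDescription = Past.eDescription
    order by Past.HistoryID  -- earliest matching history row
)
```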
If anything's unclear, let me know. Hope you found this helpful.
-
\$\begingroup\$ Wow, this is a lot to review, thanks! I'm above novice with SQL but I'm hesitant to call myself intermediate. I use `TOP 1` hopefully to get the table to stop searching once it finds a match. // I felt like the alternative was to do a code-side select against `Episodes` and against `EpisodesHistory` (or a join operation, probably), as well as one of two different types of inserts based on whether there is a match. // I can't see how code-side interactions could be faster for the DBMS, but I could pass those to a thread and return the success notification to the user faster. \$\endgroup\$ – Regular Jo, Jan 22, 2018 at 20:16
-
\$\begingroup\$ I'm not sure if SQL can toss to a separate thread like that; however, I'm not worried about the per-user performance (a quarter second or a full second [not literal numbers] doesn't matter, it's a big operation). // I haven't looked into `CHECKSUM()`; I will, and thanks. I'll update the "minor issue" to a proper JOIN. Excellent answer, my friend. \$\endgroup\$ – Regular Jo, Jan 22, 2018 at 20:19
-
\$\begingroup\$ Glad you found it helpful. I've just added a section with some replacement code to address the issue of the trigger only working for one-row updates. What I mean by this is that if someone performs an update statement that affects more than one row, the trigger is not run once for each row updated; it is run once, and the updated rows are passed in all at once via the deleted and inserted tables. If you've got any questions about that code or you don't understand it, let me know. \$\endgroup\$ – Kris Lawton, Jan 23, 2018 at 12:54
-
\$\begingroup\$ So, for your own reference: according to my research, CHECKSUM and BINARY_CHECKSUM are rife with collisions in small datasets (as little as a few hundred rows); HASHBYTES appears superior there. // I can't thank you enough for the very detailed help. // `Episodes` and `EpisodesHistory` are identical except for the audit-related data (`HistoryID`, `hWhen`, `eDescRef`), so `SELECT *, hWhen... eDescRef...` should still be safe, right? Like you said, it's a trigger and easy to forget. Don't worry, I know not to use SELECT * in 99% of settings, but an audit trigger seems ideal. Again, thank you! \$\endgroup\$ – Regular Jo, Jan 23, 2018 at 19:41
-
\$\begingroup\$ Thanks for the heads up about CHECKSUM and HASHBYTES – I've never used either but thought they could come in handy for comparing huge text fields. These will definitely come in handy one day. // `SELECT *` will be safe, up until the point where you perhaps change/add a column to `Episodes` without reflecting those changes in `EpisodesHistory`, of course. \$\endgroup\$ – Kris Lawton, Jan 24, 2018 at 9:41