I have an audit table `EpisodesHistory`. One of the fields that gets copied is `eDescription`, which I expect to regularly be a couple of kilobytes in length. To save database space and cut down on transmission to client applications, I wanted the trigger, upon recording a change, to detect when `eDescription` is identical to a previous iteration and record a reference to that earlier row instead of a second copy.

Let's say I have this `EpisodesHistory` table:
HistoryID (PK) | EpisodeID | eDescription            | ... other fields
1              | 10        | Watch my dog do a trick |
2              | 10        | Watch my cat sleep      |
3              | 11        | Watch my dog do a trick |
In the case of row 3, it was inserted as-is because the `EpisodeID` is different.
However, if I then added:

HistoryID (PK) | EpisodeID | eDescription            | ... other fields
4              | 10        | Watch my dog do a trick |
I wanted to auto-update an additional field, `EpisodesHistory.eDescRef`, to the ID of the row containing the same description: `1` in this case, because row 1 is the first row to hold this value.
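For context, with this scheme the full text of any history row can later be reconstructed by following `eDescRef` back to the referenced row. A rough sketch (untested, assuming the columns described above):

```sql
-- Resolve deduplicated descriptions: when eDescRef points at an earlier
-- row, read the text from that row instead of the (blanked) local value.
SELECT h.HistoryID,
       h.EpisodeID,
       CASE WHEN h.eDescRef > 0 THEN r.eDescription
            ELSE h.eDescription
       END AS eDescription
FROM EpisodesHistory h
LEFT JOIN EpisodesHistory r
    ON r.HistoryID = h.eDescRef
```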
It took me a few attempts to get something working, so I want to make sure this is the optimal method, as this will be one of the busier triggers in my database.
USE [Shows]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER TRIGGER [dbo].[tr-EpisodesHistory]
ON [dbo].[Episodes]
FOR UPDATE
AS
IF TRIGGER_NESTLEVEL(OBJECT_ID('dbo.EpisodesHistory')) > 1
BEGIN
    RETURN;
END;

DECLARE @CMP INTEGER
DECLARE @NEWID INTEGER = 0

IF NOT UPDATE(eActor) OR NOT EXISTS ((SELECT *, eActor = 0 FROM Inserted EXCEPT SELECT *, eActor = 0 FROM Deleted))
BEGIN
    -- Weed out updates to episode scoring, or where nothing that matters in the row was changed (eActor doesn't matter)
    -- eVersion is incremented code-side, so this has to be undone.
    -- Could be incremented here instead, but that's yet another query for successful updates
    UPDATE Episodes
    SET eVersion = eVersion - 1
    WHERE EpisodeID = (SELECT EpisodeID FROM Inserted)
    RETURN;
END

SET @CMP = (SELECT TOP 1 Past.HistoryID
            FROM Deleted Curr, EpisodesHistory Past
            WHERE Curr.EpisodeID = Past.EpisodeID
              AND Curr.eOwner = Past.eOwner
              AND Curr.eDescription = Past.eDescription)
SET @CMP = ISNULL(@CMP, 0)

INSERT INTO EpisodesHistory
SELECT *, eDescRef = @CMP, hWhen = GETDATE() FROM Deleted;
SET @NEWID = SCOPE_IDENTITY()

IF @CMP > 0
BEGIN
    UPDATE EpisodesHistory
    SET eDescRef = @CMP,
        eDescription = ''
    WHERE HistoryID = @NEWID
END
1 Answer
It will only work as intended for single-row updates
The fact that you use `TOP 1` and `SCOPE_IDENTITY()` tells me this was designed with one-row updates/inserts in mind. But what happens when you update multiple rows? (In case you don't know, the trigger only fires once for the whole changed set, rather than once for each row. The changed rows are added to the `inserted` and `deleted` tables.)
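To make that concrete, here's a hypothetical statement that would trip up the current trigger:

```sql
-- One UPDATE statement touching two episodes: the trigger fires once,
-- and the inserted/deleted pseudo-tables each contain two rows.
UPDATE Episodes
SET eDescription = 'Season finale'
WHERE EpisodeID IN (10, 11);

-- Inside the trigger, a scalar subquery such as
--   WHERE EpisodeID = (SELECT EpisodeID FROM Inserted)
-- now fails with "Subquery returned more than 1 value", and
-- TOP 1 / SCOPE_IDENTITY() only account for one of the two rows.
```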
To solve this, I'd recommend using a temp table or table variable instead of `@CMP`, and the `output` clause instead of `@NEWID`. For example:
-- Example table with identity
create table Things (KeyId int identity(1, 1), Value varchar(31))

-- Table variable to store the identity ints which get created
declare @NewIds table (NewKeyId int)

insert into Things (Value)
output inserted.KeyId into @NewIds (NewKeyId) -- this line captures the new identities
select 'Value text'
from OtherTable
Now you can access the created IDs using `@NewIds`. Of course you'd want to store the `EpisodeId` or something in the same table, to relate them.
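To illustrate that relating step (a sketch with made-up data): `output` can list any column of the `inserted` pseudo-table, so a correlating value can be captured next to each new identity:

```sql
create table Things (KeyId int identity(1, 1), Value varchar(31))

-- Capture the inserted value alongside its generated identity
declare @NewIds table (NewKeyId int, NewValue varchar(31))

insert into Things (Value)
output inserted.KeyId, inserted.Value into @NewIds (NewKeyId, NewValue)
values ('first'), ('second')

-- @NewIds now holds each new KeyId next to the value it belongs to,
-- ready to join back to the source rows
select * from @NewIds
```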
EDIT: Untested replacement
So I've just roughly written some SQL that I think should work instead of everything from the `SELECT TOP 1` statement down. The main issue I have with that part of your trigger is that it only works for one-row updates, and it performs an `INSERT` then an `UPDATE` on that inserted data, which can be optimised.
INSERT into EpisodesHistory
SELECT
    EpisodeID = d.EpisodeID,
    eOwner = d.eOwner,
    eDescription = iif(h.HistoryID is null, d.eDescription, ''),
    eDescRef = isnull(h.HistoryID, 0),
    hWhen = getDate()
from deleted d
left join (
    select
        HistoryID, EpisodeID, eOwner, eDescription,
        row_number() over (partition by EpisodeID, eOwner, eDescription order by HistoryID) as RowNumber
    from EpisodesHistory
    where eDescription <> ''
) h
    on d.EpisodeID = h.EpisodeID
    and d.eOwner = h.eOwner
    and d.eDescription = h.eDescription
    and h.RowNumber = 1
Rather than inserting into `EpisodesHistory` and then checking and updating the data we've just inserted, this code does the whole operation in one statement.
First I'll explain the left join. The point of the join is to find the `HistoryID` of the earliest matching `EpisodesHistory` row (since we want to put it in `eDescRef`, right?). So we do a normal select of the `EpisodesHistory` table, except we also add a column for `row_number()`, which, combined with the `RowNumber = 1` join condition, means it will only join to the first `HistoryID` that matches. This makes sure that if there are two rows with the same `EpisodeID`, `eOwner` and `eDescription`, it will only join to the one with the lowest `HistoryID`.
Now the actual columns that are inserted are fairly simple. If the left join fails (if no rows in `EpisodesHistory` match), selecting `h.HistoryID` will return `NULL`. Therefore, `eDescription = iif(h.HistoryID is null, d.eDescription, '')` means that if a `HistoryID` was not found in the join, it will just do a normal insert of `eDescription`; if one was found, it will blank that field. The next line is similar: it inserts the matching `HistoryID` if the join happened, otherwise it puts in 0.
Since the entire operation can be done in one statement like the above, you can ignore the bit I said earlier about temporary tables and an `output` clause.
This code above also has the added bonus of working for multiple-row updates.
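As an aside, the same "earliest matching row" lookup can be expressed with `OUTER APPLY` instead of the ranked subquery; whether it performs better depends on your indexes, so treat this as an untested alternative sketch:

```sql
from deleted d
outer apply (
    select top 1 h2.HistoryID
    from EpisodesHistory h2
    where h2.EpisodeID = d.EpisodeID
      and h2.eOwner = d.eOwner
      and h2.eDescription = d.eDescription
      and h2.eDescription <> ''
    order by h2.HistoryID   -- earliest match, made explicit
) h
```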
It's a trigger...
You may already know, but the SQL Server community is fairly divided when it comes to whether triggers should be avoided or not. They can be easy to forget, hard to debug, and can hinder performance as they become part of the write operation.
Since you mention performance I'll elaborate on that aspect, and I'll make some assumptions I hope you can forgive. I'm going to assume there's an application that sits in front of this database, and the application prompts the insert/update. Say the user is on a form and makes changes to an episode, and submits the change. Usually the application will wait for confirmation of the update/insert operation before proceeding (before loading the next page, for instance). With the trigger, the application now has to wait for the initial insert/update, as well as the trigger to complete its operation too.
Therefore, if you haven't already, consider separating out this process. In the example above, you could have the application still wait for the initial insert/update and load the next page normally, but then in the background (asynchronously) tell the database to perform the audit operations. Boom: reduced page load time, better UX. This also means you don't have to worry so much about performance-tuning the auditing operation.
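One way to sketch that separation (all names hypothetical, untested): the trigger records only a minimal "this changed" row, and a scheduled job does the heavy history/dedup work outside the user's write path:

```sql
-- Narrow queue table: cheap to write to synchronously
create table AuditQueue (
    QueueID   int identity(1, 1) primary key,
    EpisodeID int      not null,
    QueuedAt  datetime not null default getdate()
)

-- A background job (e.g. a SQL Agent job) then drains the queue in
-- batches and performs the full EpisodesHistory insert/dedup logic there.
```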
Checksum?
I'm going to assume, since performance is an issue, that you've considered indexes and have an index set up to optimise this `Episodes` and `EpisodesHistory` comparison. Have you considered the use of `CHECKSUM()`? I don't have any experience with it myself, but I'd imagine you could use it in combination with persisted computed columns to reduce the load when writing to (and maybe reading from) the indexes.
In this instance you'd have an extra column on both tables called something like `eDescriptionChecksum`, which would be a checksum of `eDescription`, and you'd put that column in the index instead of `eDescription`. Then when you perform the comparison to see if they're the same, you'd be comparing the relatively small and light `int` checksum column instead of the potentially huge `(n)varchar` column.
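A rough sketch of what that could look like (column and index names assumed; note that `CHECKSUM` can collide, so the full string comparison should stay as a final filter):

```sql
-- Persisted computed checksum of the big text column
alter table EpisodesHistory
    add eDescriptionChecksum as checksum(eDescription) persisted;

create index IX_EpisodesHistory_DescChecksum
    on EpisodesHistory (EpisodeID, eOwner, eDescriptionChecksum);

-- The comparison filters on the small int first, then confirms on the text:
-- where Curr.eDescriptionChecksum = Past.eDescriptionChecksum
--   and Curr.eDescription = Past.eDescription
```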
Minor issue
I see that when you set `@CMP`, you select from multiple tables. It's recommended to use explicit joins instead: the old comma-separated join syntax in the `from` clause is widely discouraged, and its outer-join variants have been deprecated.
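For reference, the same `@CMP` lookup with an explicit join (plus an `order by`, since `top 1` without one returns an arbitrary matching row rather than the earliest):

```sql
set @CMP = (
    select top 1 Past.HistoryID
    from deleted Curr
    inner join EpisodesHistory Past
        on  Curr.EpisodeID = Past.EpisodeID
        and Curr.eOwner = Past.eOwner
        and Curr.eDescription = Past.eDescription
    order by Past.HistoryID  -- earliest matching history row
)
```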
If anything's unclear, let me know. Hope you found this helpful.
-
\$\begingroup\$ Wow, this is a lot to review, thanks! I'm above novice with SQL but I'm hesitant to call myself intermediate. I use `TOP 1` hopefully to get the table to stop searching once it finds a match. // I felt like the alternative was to do a code-side select against `Episodes` and against `EpisodesHistory` (or a join operation, probably), as well as one of two different types of inserts based on whether there is a match. // I can't see how code-side interactions could be faster for the DBMS, but I could pass those to a thread and return the success notification to the user faster. \$\endgroup\$ – Regular Jo, Jan 22, 2018 at 20:16
-
\$\begingroup\$ I'm not sure if SQL can toss to a separate thread like that; however, I'm not worried about the per-user performance (a quarter second or a full second [not literal numbers] doesn't matter, it's a big operation). // I haven't looked into `CHECKSUM()`; I will, and thanks. I'll update the "minor issue" to a proper JOIN. Excellent answer, my friend. \$\endgroup\$ – Regular Jo, Jan 22, 2018 at 20:19
-
\$\begingroup\$ Glad you found it helpful. I've just added a section with some replacement code to address the issue of the trigger only working for one-row updates. What I mean by this is that if someone performs an update statement that affects more than one row, the trigger is not run once for each row updated; it is run once, and the updated rows are passed in all at once via the deleted and inserted tables. If you've got any questions about that code or you don't understand it, let me know. \$\endgroup\$ – Kris Lawton, Jan 23, 2018 at 12:54
-
\$\begingroup\$ So, for your own reference: according to my research, CHECKSUM and BINARY_CHECKSUM are rife with collisions in small datasets (as little as a few hundred rows); HASHBYTES appears superior there. // I can't thank you enough for the very detailed help. // `Episodes` and `EpisodesHistory` are identical except for the audit-related data (`HistoryID`, `hWhen`, `eDescRef`), so `SELECT *, hWhen... eDescRef...` should still be safe, right? Like you said, it's a trigger and easy to forget. Don't worry, I know not to use SELECT * in 99% of settings, but an audit trigger seems ideal. Again, thank you! \$\endgroup\$ – Regular Jo, Jan 23, 2018 at 19:41
-
\$\begingroup\$ Thanks for the heads up about CHECKSUM and HASHBYTES – I've never used either but thought they could come in handy for comparing huge text fields. These will definitely come in handy one day. // `SELECT *` will be safe, up until the point where you perhaps change/add a column to `Episodes` without reflecting those changes in `EpisodesHistory`, of course. \$\endgroup\$ – Kris Lawton, Jan 24, 2018 at 9:41