How to break out of error loop in transactional replication?

Question 1

I was running some simple tests on a publisher and subscriber on the same instance:

I inserted 50,000 rows into a published article and the data was correctly pushed to the subscriber.
I inadvertently deleted the 49,985th record on the subscriber (but didn't realize it at the time)
I deleted 50,000 rows from the published article

As I was watching the size of the subscriber table, I noticed it would drop down in row count almost to zero and then go back to 50,000. Deletes would start running again, the row count would drop, and then it back to 50,000. This kept happening over and over.

I ran a standard trace and saw all the deletes running normally. After it was almost complete, this statement appeared:

IF @@TRANCOUNT > 0 ROLLBACK

Then I modified the trace to incude error messages and saw this:

The row was not found at the Subscriber when applying the replicated command.

So it looks like when the distribution agent is executing all the deletes one-by-one, the process is still included in a transaction (the original DELETE was one line). If there is an error, it performs a ROLLBACK and then starts over.

My questions:

Is there a standard way to break out of the loop?
Would it be acceptable to recreate the missing row so the process can complete?
What do you do if this happens on production? I guess that's not really a question. But this seems like a simple error could have some pretty serious complications unless replication is monitored very carefully.

Question 2

If you have the missing row, that seems like the easiest way out in this case.

Question 3

I agree. For the future, I've modified the replication procedures so the error is logged. But the agent's attempt to "keep trying" seems like the equivalent of a service broker poison message. I don't see what is gained from not stopping the service.

Question 4

Transactions not applied at subscribers can be removed from the distribution database to stop it from retrying (because then there's nothing to retry), but you would have to manually propagate the changes to the subscribers... in this case that's probably not advisable vs. the one-row fix.

Question 5

The error is caused by the code in the custom delete stored procedure for each article:

create procedure [dbo].[sp_MSdel_dboChild]
 @pkc1 int
as
begin 
 delete [dbo].[Child]
where [ChildID] = @pkc1
if @@rowcount = 0
 if @@microsoftversion>0x07320000
 exec sp_MSreplraiserror 20598
end

If no row is deleted, a special procedure is called and the transaction is rolled back.

In order to avoid this, I've created a post-Snapshot script that alters the all delete / update replication stored procedures and logs the error to a table instead.

CREATE procedure [dbo].[sp_MSdel_dboChild]
 @pkc1 int
as
begin 
 delete [dbo].[Child]
where [ChildID] = @pkc1 
if @@rowcount = 0
 if @@microsoftversion>0x07320000
 insert into repl_errors (database_name, table_name, pkey_col, pkey_val) 
 select dbo.fn_GetDB(APP_NAME()), 'Child', 'ChildID', @pkc1
end

8kb 8kb 2,6392 gold badges32 silver badges36 bronze badges · Answer 1 · 2012-08-02 06:48:45Z

The error is caused by the code in the custom delete stored procedure for each article:

create procedure [dbo].[sp_MSdel_dboChild]
 @pkc1 int
as
begin 
 delete [dbo].[Child]
where [ChildID] = @pkc1
if @@rowcount = 0
 if @@microsoftversion>0x07320000
 exec sp_MSreplraiserror 20598
end

If no row is deleted, a special procedure is called and the transaction is rolled back.

In order to avoid this, I've created a post-Snapshot script that alters the all delete / update replication stored procedures and logs the error to a table instead.

CREATE procedure [dbo].[sp_MSdel_dboChild]
 @pkc1 int
as
begin 
 delete [dbo].[Child]
where [ChildID] = @pkc1 
if @@rowcount = 0
 if @@microsoftversion>0x07320000
 insert into repl_errors (database_name, table_name, pkey_col, pkey_val) 
 select dbo.fn_GetDB(APP_NAME()), 'Child', 'ChildID', @pkc1
end

Stack Exchange Network

How to break out of error loop in transactional replication?

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

How to break out of error loop in transactional replication?

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions