4

I was running some simple tests on a publisher and subscriber on the same instance:

  1. I inserted 50,000 rows into a published article and the data was correctly pushed to the subscriber.
  2. I inadvertently deleted the 49,985th record on the subscriber (but didn't realize it at the time)
  3. I deleted 50,000 rows from the published article

As I was watching the size of the subscriber table, I noticed it would drop down in row count almost to zero and then go back to 50,000. Deletes would start running again, the row count would drop, and then it back to 50,000. This kept happening over and over.

I ran a standard trace and saw all the deletes running normally. After it was almost complete, this statement appeared:

IF @@TRANCOUNT > 0 ROLLBACK 

Then I modified the trace to incude error messages and saw this:

The row was not found at the Subscriber when applying the replicated command.

So it looks like when the distribution agent is executing all the deletes one-by-one, the process is still included in a transaction (the original DELETE was one line). If there is an error, it performs a ROLLBACK and then starts over.

My questions:

  1. Is there a standard way to break out of the loop?
  2. Would it be acceptable to recreate the missing row so the process can complete?
  3. What do you do if this happens on production? I guess that's not really a question. But this seems like a simple error could have some pretty serious complications unless replication is monitored very carefully.
asked Aug 1, 2012 at 5:04
3
  • If you have the missing row, that seems like the easiest way out in this case. Commented Aug 2, 2012 at 1:01
  • I agree. For the future, I've modified the replication procedures so the error is logged. But the agent's attempt to "keep trying" seems like the equivalent of a service broker poison message. I don't see what is gained from not stopping the service. Commented Aug 2, 2012 at 6:52
  • Transactions not applied at subscribers can be removed from the distribution database to stop it from retrying (because then there's nothing to retry), but you would have to manually propagate the changes to the subscribers... in this case that's probably not advisable vs. the one-row fix. Commented Aug 2, 2012 at 13:19

1 Answer 1

2

The error is caused by the code in the custom delete stored procedure for each article:

create procedure [dbo].[sp_MSdel_dboChild]
 @pkc1 int
as
begin 
 delete [dbo].[Child]
where [ChildID] = @pkc1
if @@rowcount = 0
 if @@microsoftversion>0x07320000
 exec sp_MSreplraiserror 20598
end 

If no row is deleted, a special procedure is called and the transaction is rolled back.

In order to avoid this, I've created a post-Snapshot script that alters the all delete / update replication stored procedures and logs the error to a table instead.

CREATE procedure [dbo].[sp_MSdel_dboChild]
 @pkc1 int
as
begin 
 delete [dbo].[Child]
where [ChildID] = @pkc1 
if @@rowcount = 0
 if @@microsoftversion>0x07320000
 insert into repl_errors (database_name, table_name, pkey_col, pkey_val) 
 select dbo.fn_GetDB(APP_NAME()), 'Child', 'ChildID', @pkc1
end 
answered Aug 2, 2012 at 6:48
0

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.