We have a master-master replication setup on MySQL 5.6. Only one master is actively used; the other is for backup and failover (we'll call that the slave). The binlog_format is set to ROW, and the auto-increment settings are configured to avoid conflicts.
The problem: the slave halts due to duplicate key errors.
We traced the cause to bulk deletes made by cron jobs on the master that did not run (completely?) on the slave. We are talking about tens of thousands of records that are NOT deleted from the slave. We did not find errors in the MySQL error log.
That leads to an unsynchronized replica and errors when inserts are made using a PK value that should have been deleted on the slave.
The tables are MyISAM.
Any idea why the bulk delete doesn't propagate properly to the replica?
3 Answers
In the absence of any obvious, logical reason why this might be happening, I'm going to shamelessly invoke the MyISAM boogey-man.
This question reminds me of one from a few years back. It's not a duplicate question, but the underlying mechanisms could be similar. The workaround I provided in that case was specifically based on the premise of duplicate "unique" values existing in a table.
I suspect you have latent, undetected defects in your MyISAM tables, where there are actually -- via an unknown (to me) mechanism -- duplicate primary (or unique) key values hiding below the surface.
When MyISAM selects or deletes a row by primary key, it won't see these duplicates, because as soon as it finds one row, it stops looking, because there "can't" be more to find... yet, if you SELECT * you'll get the duplicates... and deleting one of the rows would then unmask the other.
One way to test this might be:
mysql> CREATE TABLE test_table LIKE real_table;
mysql> INSERT INTO test_table SELECT * FROM real_table;
If this is indeed at the root of what's going on, you might get "lucky" and get a duplicate key error, which would prove the theory... because a duplicate key error should be impossible if the original table's data is intact.
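If the test table loads cleanly, another probe that might flush out hidden duplicates (a sketch; the question never names the primary-key column, so id here is an assumption) is to force a full scan past the suspect index and group on the key:

mysql> SELECT id, COUNT(*) AS n
    ->     FROM real_table IGNORE INDEX (PRIMARY)
    ->     GROUP BY id
    ->     HAVING n > 1;

Any row returned means two physical rows share a value the PRIMARY index claims is unique.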
You could, of course, review the binary logs using mysqlbinlog to confirm that the deletions were logged... but I suspect you will find that the deletes did get logged correctly... but after the rows were deleted, the phantom rows were then visible, and caused the subsequent replication error.
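A sketch of that check (file, database, and table names are placeholders; with binlog_format=ROW you need --verbose so the row events are decoded into readable pseudo-SQL comments):

$ mysqlbinlog --verbose mysql-bin.000123 | grep -c '### DELETE FROM `mydb`.`real_table`'

The count should be close to the number of rows the cron job deleted on the master; if it is, the deletes were logged and the problem is on the slave's side.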
It seems that this is not the case, as we found tens of thousands of non-deleted rows and we could not find any duplicates. – Marvin Saldinger, Mar 30, 2016 at 6:31
After some more digging we found the problem.
At some point this week, the master server was restarted. Because of this restart, some of the inserts/updates/deletes weren't written to the binlog. We don't know why yet.
So, some changes weren't replicated on the secondary server.
Duplicate key errors appeared on the slave when inserts were made on the master. Also, when the cron jobs (bulk delete) were executed, we got record-not-found errors. We decided to skip these errors.
When we skipped the error generated by the delete operation, it didn't skip just one record; it skipped all of them (see the note below on why). And this is why those tens of thousands of records weren't deleted from the slave.
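The post doesn't show the commands, but the skip was presumably done with the standard sequence (a sketch):

mysql> STOP SLAVE;
mysql> SET GLOBAL sql_slave_skip_counter = 1;
mysql> START SLAVE;

This explains the behaviour: with ROW-based replication the bulk DELETE is logged as a single event group, and sql_slave_skip_counter always skips to the end of the current event group, so even a counter of 1 discards every row-delete event of the cron job.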
An un-graceful shutdown (e.g., a power failure) leaves many things in a questionable state. Look at sync_binlog as a likely villain. – Rick James, Mar 30, 2016 at 23:55
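For context on that comment: with sync_binlog=0 (the default in 5.6), binlog flushing is left to the operating system, so a crash can lose events that were already applied to the tables on the master. The durable setting is a one-line my.cnf change, at the cost of some write throughput:

[mysqld]
sync_binlog = 1    # fsync the binlog after every write (default 0 = leave it to the OS)

Note that MyISAM tables are not crash-safe themselves, so an un-graceful restart can still leave table data and binlog out of step.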
To counter such issues in a production environment where data consistency is a priority and the slave servers are actively used for reads by applications, it is a best practice to deploy an automated discrepancy-check-and-sync script (as a daily cron) between the master and slave, using pt-table-sync (available in the Percona Toolkit).
I have tested and deployed such scripts in my production environment, which has large databases (more than 100 GB), so I don't need to worry in such failure and disaster cases: when run, the script checks for discrepancies between master and slave, automatically syncs the databases, and provides the necessary stats if needed.
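A sketch of what the daily cron might run (host names, credentials, and database name are placeholders; --print previews the fix statements, swap in --execute to apply them):

$ pt-table-checksum h=master_host,u=percona,p=secret --databases mydb
$ pt-table-sync --print --sync-to-master h=slave_host,u=percona,p=secret --databases mydb

With --sync-to-master the tool writes its fix statements on the master, so the corrections flow to the slave through replication instead of being applied behind its back.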
PARTITIONing for bulk delete: dropping a whole partition is far cheaper than DELETEing tens of thousands of rows.
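A minimal sketch, assuming rows are purged by date and the table has a created_at column that is part of the primary key (MySQL requires every unique key to include the partitioning column; names here are illustrative):

mysql> ALTER TABLE real_table
    ->     PARTITION BY RANGE (TO_DAYS(created_at)) (
    ->         PARTITION p201602 VALUES LESS THAN (TO_DAYS('2016-03-01')),
    ->         PARTITION p201603 VALUES LESS THAN (TO_DAYS('2016-04-01')),
    ->         PARTITION pmax    VALUES LESS THAN MAXVALUE
    ->     );

The cron job then drops the oldest partition instead of running a huge DELETE:

mysql> ALTER TABLE real_table DROP PARTITION p201602;

DROP PARTITION is near-instant and appears in the binlog as one DDL event, so there is no multi-row delete for the slave to lose or skip.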