2

We are using MySQL 5.7, and have a Master and 3 Slave servers replication setup. All the 4 servers are dedicated for MySQL use only, and belong to the same data-center, connected over Private Network. So I believe that the network issues for replication should be minimal.

So, I did SHOW SLAVE STATUS\G on the Slave 1 server:

mysql> show slave status\G
*************************** 1. row ***************************
 Slave_IO_State: Waiting for master to send event
 Master_Host: <removed for anonymity>
 Master_User: repl
 Master_Port: 3306
 Connect_Retry: 60
 Master_Log_File: mysql-bin.004803
 Read_Master_Log_Pos: 885020002
 Relay_Log_File: ubuntu-s-4vcpu-8gb-blr1-slave02-relay-bin.001056
 Relay_Log_Pos: 885020215
 Relay_Master_Log_File: mysql-bin.004803
 Slave_IO_Running: Yes
 Slave_SQL_Running: Yes
 Replicate_Do_DB: 
 Replicate_Ignore_DB: 
 Replicate_Do_Table: 
 Replicate_Ignore_Table: 
 Replicate_Wild_Do_Table: 
 Replicate_Wild_Ignore_Table: 
 Last_Errno: 0
 Last_Error: 
 Skip_Counter: 0
 Exec_Master_Log_Pos: 885020002
 Relay_Log_Space: 885020495
 Until_Condition: None
 Until_Log_File: 
 Until_Log_Pos: 0
 Master_SSL_Allowed: No
 Master_SSL_CA_File: 
 Master_SSL_CA_Path: 
 Master_SSL_Cert: 
 Master_SSL_Cipher: 
 Master_SSL_Key: 
 Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
 Last_IO_Errno: 0
 Last_IO_Error: 
 Last_SQL_Errno: 0
 Last_SQL_Error: 
 Replicate_Ignore_Server_Ids: 
 Master_Server_Id: 103
 Master_UUID: 78e93502-d3af-11e7-9af0-5aaa686987ef
 Master_Info_File: /var/lib/mysql/master.info
 SQL_Delay: 0
 SQL_Remaining_Delay: NULL
 Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
 Master_Retry_Count: 86400
 Master_Bind: 
 Last_IO_Error_Timestamp: 
 Last_SQL_Error_Timestamp: 
 Master_SSL_Crl: 
 Master_SSL_Crlpath: 
 Retrieved_Gtid_Set: 
 Executed_Gtid_Set: 
 Auto_Position: 0
 Replicate_Rewrite_DB: 
 Channel_Name: 
 Master_TLS_Version: 
1 row in set (0.00 sec)

It clearly states that everything is fine. We also have quite a bit of monitoring tools like Zabbix, Packetbeat (Kibana) setup; which also gives us alerts if there is lag in the replication in any of the three Slave servers. We did not get any alerts about possible replication issues as well.

Now, I ran a query on Slave 1 server:

mysql> select order_id, order_status_id, buyer_invoice_id 
 from oc_suborder where order_id = 62284;
+----------+-----------------+------------------+
| order_id | order_status_id | buyer_invoice_id |
+----------+-----------------+------------------+
| 62284 | 1 | NULL |
| 62284 | 15 | 76729 |
+----------+-----------------+------------------+
2 rows in set (0.01 sec)

It returns two rows in this case, which is wrong.

I ran the same query on the Master Server:

mysql> select order_id, order_status_id, buyer_invoice_id 
 from oc_suborder where order_id = 62284;
+----------+-----------------+------------------+
| order_id | order_status_id | buyer_invoice_id |
+----------+-----------------+------------------+
| 62284 | 15 | 76729 |
+----------+-----------------+------------------+
1 row in set (0.18 sec)

It returns one row only, which is correct; so the Slave is definitely having inconsistent data despite showing Slave Status as OK.

I also performed some DML operations on Master, and those changes did get replicated on the Slave server; so the replication is running. We are using third party DBA services, and every-time this happens; all they would suggest is to rebuild the Slave server(s). We are already sitting at 100 GB+ data, and it takes quite a few hours for this process (they have already done it 4 times in last few months); but the data inconsistency keeps on popping up time and again.

My questions are manifold:

  1. Is there any way to simply identify the data mismatch related tables/rows only, across multiple Schemas (Databases); and fix these specific rows instead of rebuilding the Slave again. My doubt is that I may have multiple tables with this data mismatch. We discover these mismatches only when some specific report contains suspicious data.

  2. What could be the possible reasons of the data mismatch appearing; even when Slave is 0 seconds behind Master. The example table in the queries above is using InnoDB. There are a few Transactions running on that table, and I am starting to suspect that a Transaction gets successfully committed on the Master, but not on the Slave sometimes.

  3. Using SHOW GLOBAL VARIABLES LIKE 'binlog_format';, I found that binlog_format on my Master server is set as MIXED, while on the Slave server it is set as ROW. Can this be source of this error or some other undiscovered errors ?

Please bear with me if the question seems Too Broad (as per Stack Overflow rules). I will be happy to provide as many details as needed, or Edit the question down to specific details. I am regular Stack Overflow user, and posting first time here. Any pointers would be really helpful.

Edit #1:

Are all the slaves configured with read-only=1? Is there any chance that your application could be writing directly to the slaves?

Yes. Also, in our application code, we do have an elaborate code for Read-Write separation. Infact, all the queries inside a Transaction (START TRANSACTION .. COMMIT/ROLLBACK) are always routed to the Master Server, even if there are SELECT queries within this block. Even, to cover for replication lag (sometimes), we route the next 2 SELECT queries after a DML operation to Master only.

Are any of the slaves configured with slave-skip-errors by any chance? If so, what is the value of this parameter?

Please find below:

mysql> select @@slave_skip_errors;
+---------------------+
| @@slave_skip_errors |
+---------------------+
| 1032,1062 |
+---------------------+
1 row in set (0.10 sec)

Edit #2

Please provide SHOW CREATE TABLE oc_suborder for both the Master and the Slave with the problem you detailed.

Master Server:

mysql> show create table oc_suborder\G;
*************************** 1. row ***************************
 Table: oc_suborder
Create Table: CREATE TABLE `oc_suborder` (
 `order_id` int(11) NOT NULL,
 `suborder_id` varchar(20) NOT NULL,
 `gst` tinyint(1) NOT NULL DEFAULT '1',
 `buyer_invoice_id` int(11) unsigned DEFAULT NULL,
 `invoice_no` int(11) NOT NULL DEFAULT '0',
 `invoice_date` datetime DEFAULT NULL,
 `invoice_prefix` varchar(26) NOT NULL,
 `shipping_method` varchar(128) NOT NULL,
 `shipping_code` varchar(128) NOT NULL,
 `total` decimal(15,4) DEFAULT NULL,
 `order_status_id` int(11) NOT NULL,
 `date_added` datetime NOT NULL,
 `date_modified` datetime NOT NULL,
 `cform_submit` enum('no_submit','will_submit','submitted','') NOT NULL DEFAULT 'no_submit',
 `cst_with_cform` decimal(15,4) DEFAULT NULL,
 `refundable_cform` decimal(15,4) DEFAULT NULL,
 `refund_status` enum('refunded','not_refunded','not_applicable','') NOT NULL DEFAULT 'not_applicable',
 `courier_partner` varchar(100) DEFAULT NULL,
 `tracking_no` varchar(64) DEFAULT NULL,
 `shipping_charge` decimal(15,2) DEFAULT NULL,
 `custom_totals` text,
 `no_wsb_tape` tinyint(1) NOT NULL DEFAULT '0',
 `no_invoice_with_shipment` tinyint(1) NOT NULL DEFAULT '0',
 `courier_partner_preference` varchar(255) DEFAULT NULL,
 PRIMARY KEY (`suborder_id`),
 UNIQUE KEY `order_id` (`order_id`,`suborder_id`),
 UNIQUE KEY `buyer_invoice_id` (`buyer_invoice_id`),
 KEY `invoice_date` (`invoice_date`),
 KEY `shipping_method` (`shipping_method`),
 KEY `order_status_id` (`order_status_id`),
 KEY `date_added` (`date_added`),
 KEY `tracking_no` (`tracking_no`),
 CONSTRAINT `oc_suborder_order_id_fk` FOREIGN KEY (`order_id`) REFERENCES `oc_order` (`order_id`) ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.08 sec)

Slave server:

mysql> show create table oc_suborder\G;
*************************** 1. row ***************************
 Table: oc_suborder
Create Table: CREATE TABLE `oc_suborder` (
 `order_id` int(11) NOT NULL,
 `suborder_id` varchar(20) NOT NULL,
 `gst` tinyint(1) NOT NULL DEFAULT '1',
 `buyer_invoice_id` int(11) unsigned DEFAULT NULL,
 `invoice_no` int(11) NOT NULL DEFAULT '0',
 `invoice_date` datetime DEFAULT NULL,
 `invoice_prefix` varchar(26) NOT NULL,
 `shipping_method` varchar(128) NOT NULL,
 `shipping_code` varchar(128) NOT NULL,
 `total` decimal(15,4) DEFAULT NULL,
 `order_status_id` int(11) NOT NULL,
 `date_added` datetime NOT NULL,
 `date_modified` datetime NOT NULL,
 `cform_submit` enum('no_submit','will_submit','submitted','') NOT NULL DEFAULT 'no_submit',
 `cst_with_cform` decimal(15,4) DEFAULT NULL,
 `refundable_cform` decimal(15,4) DEFAULT NULL,
 `refund_status` enum('refunded','not_refunded','not_applicable','') NOT NULL DEFAULT 'not_applicable',
 `courier_partner` varchar(100) DEFAULT NULL,
 `tracking_no` varchar(64) DEFAULT NULL,
 `shipping_charge` decimal(15,2) DEFAULT NULL,
 `custom_totals` text,
 `no_wsb_tape` tinyint(1) NOT NULL DEFAULT '0',
 `no_invoice_with_shipment` tinyint(1) NOT NULL DEFAULT '0',
 `courier_partner_preference` varchar(255) DEFAULT NULL,
 PRIMARY KEY (`suborder_id`),
 UNIQUE KEY `order_id` (`order_id`,`suborder_id`),
 UNIQUE KEY `buyer_invoice_id` (`buyer_invoice_id`),
 KEY `invoice_date` (`invoice_date`),
 KEY `shipping_method` (`shipping_method`),
 KEY `order_status_id` (`order_status_id`),
 KEY `date_added` (`date_added`),
 KEY `tracking_no` (`tracking_no`),
 CONSTRAINT `oc_suborder_order_id_fk` FOREIGN KEY (`order_id`) REFERENCES `oc_order` (`order_id`) ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.07 sec)
asked Nov 29, 2018 at 9:04
9
  • 2
    I think the slave-skip-errors value is fine. 1032 is for KEY_NOT_FOUND. E.g. when an UPDATE on the slave can't find the row(s) it's meant for. 1062 is for ER_DUP_ENTRY, e.g. when an INSERT collides with an already existing primary key value. Commented Nov 29, 2018 at 10:44
  • 1
    I'd suggest checking out this answer by RonaldoMySQLDBA. Commented Nov 29, 2018 at 10:45
  • 1
    As stated in Ronaldo's answer, there could be DML that is unsafe for replication or settings which lead to DML being replicated in the wrong order, both of which can lead to inconsistencies on the slave, which again will lead to such errors. Commented Nov 29, 2018 at 10:51
  • Let us continue this discussion in chat. Commented Nov 29, 2018 at 17:56
  • OK refers to the process of replicating, not to the data integrity. Commented Nov 29, 2018 at 21:55

1 Answer 1

4
  1. You can use pt-table-checksum and pt-table-sync to fix the data mismatch between the master and the slave. Here's a blog post for further reference: https://www.percona.com/blog/2015/08/12/mysql-replication-primer-with-pt-table-checksum-and-pt-table-sync/

  2. It is possible that an update went to the slave directly instead of the master. It can also be with the use of sql_log_bin=0 on the master. This can happen but you can prevent it by setting the slave with read_only=1 or super_read_only=1

Pang
2091 silver badge8 bronze badges
answered Nov 29, 2018 at 9:55
4
  • I checked that binlog_format on my Master server is set as MIXED, while on the Slave server it is set as ROW. Will it be causing any issues ? Commented Nov 29, 2018 at 12:03
  • I tried pt-table-checksum --replicate=percona.checksums --replicate-check-only --ignore-databases mysql h=localhost,u=checksum --recursion-method=processlist --no-check-binlog-format and it did report data mismatch, but in a different table. However it did not report data mismatch for the table which I have specified in my problem statement. Am I using the flags in the pt-table-checksum command correctly ? Commented Nov 29, 2018 at 16:38
  • Afterwards, I tried pt-table-sync and it threw following error: Called not_in_right in state 0 at /usr/bin/pt-table-sync line 5251. while doing... Commented Nov 29, 2018 at 16:39
  • (super_read_only was added in 5.6.23 and 5.7.8.) Commented Nov 30, 2018 at 20:00

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.