MySQL 5.7 (5.6): Approaches to Recovering a Slave in a GTID Environment (Repost)

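This article describes how to use Global Transaction Identifiers (GTIDs) to recover a slave in a MySQL 5.7 environment. It first explains GTID_EXECUTED and GTID_PURGED, then takes a backup with mysqldump, which sets the GTID_PURGED value. Next it creates a new slave, imports the backup, and sets up replication. For the case where a slave has missed transactions because the master purged its binlogs, it proposes two recovery strategies: syncing from another slave, or rebuilding the slave. Throughout, it shows the key role GTIDs play in master-slave recovery.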

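Before discussing how to recover a slave, we need to understand a few concepts:

GTID_EXECUTED: the set of transactions that have been executed on the server and recorded in its binary log, including any that have since been purged.

GTID_PURGED: the set of transactions that have already been purged from the binary log. It is always a subset of GTID_EXECUTED.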
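To make the relationship between the two sets concrete, here is a small Python sketch (not part of the original article) that parses GTID set strings and subtracts one from the other, similar in spirit to MySQL's GTID_SUBTRACT() function. The helper names parse_gtid_set, format_gtid_set and gtid_subtract are hypothetical, and expanding intervals into Python sets is only reasonable for small illustrative ranges. The sample values assume a master whose gtid_executed is 1-15 and whose gtid_purged is 1-13.

```python
def parse_gtid_set(text):
    """Parse 'uuid:1-15:20,uuid2:3' into {uuid: set of transaction numbers}."""
    result = {}
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *intervals = part.split(":")
        numbers = result.setdefault(uuid.lower(), set())
        for interval in intervals:
            first, _, last = interval.partition("-")
            numbers.update(range(int(first), int(last or first) + 1))
    return result


def format_gtid_set(gtids):
    """Render {uuid: set of numbers} back into 'uuid:1-13:15' notation."""
    parts = []
    for uuid in sorted(gtids):
        numbers = sorted(gtids[uuid])
        if not numbers:
            continue
        ranges = [[numbers[0], numbers[0]]]
        for n in numbers[1:]:
            if n == ranges[-1][1] + 1:
                ranges[-1][1] = n      # extend the current interval
            else:
                ranges.append([n, n])  # start a new interval
        text = ":".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)
        parts.append(f"{uuid}:{text}")
    return ",".join(parts)


def gtid_subtract(left, right):
    """GTIDs that are in `left` but not in `right`."""
    a, b = parse_gtid_set(left), parse_gtid_set(right)
    return format_gtid_set({u: nums - b.get(u, set()) for u, nums in a.items()})


UUID = "a97983fc-5a29-11e6-9d28-000c29d4dc3f"
executed = f"{UUID}:1-15"
purged = f"{UUID}:1-13"

# Transactions that are still available in the binary logs.
still_in_binlog = gtid_subtract(executed, purged)
print(still_in_binlog)
```

With these inputs, only transactions 14 and 15 remain in the binary logs; everything in gtid_purged can no longer be served to a slave that asks for it.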

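Before going further, let's first look at how to create a new GTID-based slave.

With the two variables above understood, all we need to do is:

1. When taking a backup from the master, record the value of gtid_executed at the time of the backup.

2. When restoring that backup on the new slave, set the slave's gtid_purged to the master's gtid_executed value from the time of the backup.

mysqldump can do both of these for us.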

Current state of the master (3301):

[zejin] 3301>show global variables like 'gtid_executed';
+---------------+-------------------------------------------+
| Variable_name | Value                                     |
+---------------+-------------------------------------------+
| gtid_executed | a97983fc-5a29-11e6-9d28-000c29d4dc3f:1-15 |
+---------------+-------------------------------------------+
1 row in set (0.00 sec)

[zejin] 3301>show global variables like 'gtid_purged';
+---------------+-------------------------------------------+
| Variable_name | Value                                     |
+---------------+-------------------------------------------+
| gtid_purged   | a97983fc-5a29-11e6-9d28-000c29d4dc3f:1-13 |
+---------------+-------------------------------------------+
1 row in set (0.00 sec)

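step1: Take a full backup with mysqldump

mysqldump --all-databases --single-transaction --triggers --routines --events --host=127.0.0.1 --port=3301 --user=root --password=123 > dump3301.sql

Opening dump3301.sql, we can see the following statement (mysqldump writes it because --set-gtid-purged defaults to AUTO, which emits the statement when GTIDs are enabled on the server):

SET @@GLOBAL.GTID_PURGED='a97983fc-5a29-11e6-9d28-000c29d4dc3f:1-15';

This value is the gtid_executed of master 3301 at the time of the backup.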
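If you want to confirm which GTID set a dump will install before importing it, the statement can be pulled out of the file programmatically. The following Python sketch is an illustration only: find_gtid_purged is a hypothetical helper, the regular expression matches only the plain form of the statement shown above, and sample_dump is a made-up, abridged stand-in for a real dump file.

```python
import re

# Matches the plain form: SET @@GLOBAL.GTID_PURGED='...';
GTID_PURGED_RE = re.compile(
    r"SET\s+@@GLOBAL\.GTID_PURGED\s*=\s*'([^']*)'", re.IGNORECASE
)


def find_gtid_purged(dump_text):
    """Return the GTID set assigned to GTID_PURGED in a dump, or None."""
    match = GTID_PURGED_RE.search(dump_text)
    if match is None:
        return None
    # A set with several UUIDs may be wrapped across lines; drop the whitespace.
    return "".join(match.group(1).split())


# Hypothetical, abridged stand-in for the contents of dump3301.sql.
sample_dump = """-- abridged sample, not a real dump
SET @@GLOBAL.GTID_PURGED='a97983fc-5a29-11e6-9d28-000c29d4dc3f:1-15';
-- table definitions and data would follow here
"""

print(find_gtid_purged(sample_dump))
```

For a real file you would read its text first, for example with open("dump3301.sql").read(), and compare the result with the gtid_executed value recorded on the master.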

step2: Start a brand-new instance, 3303. Make sure its configuration file sets enforce_gtid_consistency and gtid_mode=on.
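The article does not show my3303.cnf, so as a rough aid here is a Python sketch that checks a configuration text for the two settings named above. Everything in it is an assumption made for illustration: the helper names, the simplified parsing (no inline comments, no include handling), and the sample file contents.

```python
def read_mysqld_options(cnf_text):
    """Collect options from the [mysqld] section of a my.cnf-style text."""
    options, in_mysqld = {}, False
    for raw in cnf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";", "!")):
            continue  # skip blanks, comments and !include directives
        if line.startswith("["):
            in_mysqld = line.lower() == "[mysqld]"
            continue
        if in_mysqld:
            name, _, value = line.partition("=")
            # MySQL accepts dashes and underscores interchangeably in names.
            options[name.strip().lower().replace("-", "_")] = value.strip()
    return options


def missing_gtid_settings(cnf_text):
    """Names of the GTID settings that are absent or not switched on."""
    options = read_mysqld_options(cnf_text)
    missing = []
    if options.get("gtid_mode", "").lower() != "on":
        missing.append("gtid_mode")
    # A bare `enforce_gtid_consistency` line (empty value) is treated as enabled.
    if options.get("enforce_gtid_consistency", "off").lower() not in ("", "on", "1"):
        missing.append("enforce_gtid_consistency")
    return missing


# Hypothetical contents for my3303.cnf; the real file is not shown in the article.
sample_cnf = """
[mysqld]
port = 3303
server-id = 3303
gtid-mode = on
enforce_gtid_consistency = on
"""

print(missing_gtid_settings(sample_cnf))
```

An empty list means both settings were found; any names returned are the ones to add before starting the instance.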

mysqld_safe --defaults-file=/home/mysql/my3303.cnf &

At this point the new instance 3303 should look like this:

[(none)] 3303>show global variables like 'gtid_executed';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| gtid_executed |       |
+---------------+-------+
1 row in set (0.01 sec)

[(none)] 3303>show global variables like 'gtid_purged';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| gtid_purged   |       |
+---------------+-------+
1 row in set (0.00 sec)

step3: Import the backup file and check the status values:

mysql -uroot -h127.0.0.1 -p123 -P3303 < dump3301.sql

[(none)] 3303>show global variables like 'gtid_executed';
+---------------+-------------------------------------------+
| Variable_name | Value                                     |
+---------------+-------------------------------------------+
| gtid_executed | a97983fc-5a29-11e6-9d28-000c29d4dc3f:1-15 |
+---------------+-------------------------------------------+
1 row in set (0.02 sec)

[(none)] 3303>show global variables like 'gtid_purged';
+---------------+-------------------------------------------+
| Variable_name | Value                                     |
+---------------+-------------------------------------------+
| gtid_purged   | a97983fc-5a29-11e6-9d28-000c29d4dc3f:1-15 |
+---------------+-------------------------------------------+
1 row in set (0.00 sec)

step4: Run the CHANGE MASTER statement to set up replication

[zejin] 3303>change master to master_host='192.168.1.240',master_port=3301,master_user='repl',master_password='123',master_auto_position=1;

Query OK, 0 rows affected, 2 warnings (0.01 sec)

[zejin] 3303>start slave;

Query OK, 0 rows affected (0.00 sec)

[zejin] 3303>show slave status\G

*************************** 1. row ***************************

Slave_IO_State: Waiting for master to send event

Master_Host: 192.168.1.240

Master_User: repl

Master_Port: 3301

Connect_Retry: 60

Master_Log_File: binlog57.000014

Read_Master_Log_Pos: 194

Relay_Log_File: zejin240-relay-bin.000002

Relay_Log_Pos: 365

Relay_Master_Log_File: binlog57.000014

Slave_IO_Running: Yes

Slave_SQL_Running: Yes

Replicate_Do_DB:

Replicate_Ignore_DB:

Replicate_Do_Table:

Replicate_Ignore_Table:

Replicate_Wild_Do_Table:

Replicate_Wild_Ignore_Table:

Last_Errno: 0

Last_Error:

Skip_Counter: 0

Exec_Master_Log_Pos: 194

Relay_Log_Space: 575

Until_Condition: None

Until_Log_File:

Until_Log_Pos: 0

Master_SSL_Allowed: No

Master_SSL_CA_File:

Master_SSL_CA_Path:

Master_SSL_Cert:

Master_SSL_Cipher:

Master_SSL_Key:

Seconds_Behind_Master: 0

Master_SSL_Verify_Server_Cert: No

Last_IO_Errno: 0

Last_IO_Error:

Last_SQL_Errno: 0

Last_SQL_Error:

Replicate_Ignore_Server_Ids:

Master_Server_Id: 3301

Master_UUID: a97983fc-5a29-11e6-9d28-000c29d4dc3f

Master_Info_File: /home/mysql/I3303/master.info

SQL_Delay: 0

SQL_Remaining_Delay: NULL

Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates

Master_Retry_Count: 86400

Master_Bind:

Last_IO_Error_Timestamp:

Last_SQL_Error_Timestamp:

Master_SSL_Crl:

Master_SSL_Crlpath:

Retrieved_Gtid_Set:

Executed_Gtid_Set: a97983fc-5a29-11e6-9d28-000c29d4dc3f:1-15

Auto_Position: 1

Replicate_Rewrite_DB:

Channel_Name:

Master_TLS_Version:

1 row in set (0.00 sec)

This completes adding a new slave to a GTID-based replication environment.

Suppose we currently have one master with two slaves:

master(3301)

slave(3302)

slave(3303)

Consider the following failure case: for whatever reason, the master has purged some binlogs that a slave has not yet received (for example, the slave was stopped for a while and the master purged some binlogs in the meantime).

The master's current state is:

[zejin] 3301>show global variables like 'gtid_executed';
+---------------+-------------------------------------------+
| Variable_name | Value                                     |
+---------------+-------------------------------------------+
| gtid_executed | a97983fc-5a29-11e6-9d28-000c29d4dc3f:1-21 |
+---------------+-------------------------------------------+
1 row in set (0.00 sec)

[zejin] 3301>show global variables like 'gtid_purged';
+---------------+-------------------------------------------+
| Variable_name | Value                                     |
+---------------+-------------------------------------------+
| gtid_purged   | a97983fc-5a29-11e6-9d28-000c29d4dc3f:1-20 |
+---------------+-------------------------------------------+
1 row in set (0.00 sec)

[zejin] 3301>select * from t_users;
+----+------+
| id | name |
+----+------+
|  1 | chen |
|  2 | ok   |
|  3 | li   |
+----+------+
3 rows in set (0.00 sec)

On slave 3303 we see the following error:

Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.'

[zejin] 3303>show slave status\G

*************************** 1. row ***************************

Slave_IO_State:

Master_Host: 192.168.1.240

Master_User: repl

Master_Port: 3301

Connect_Retry: 60

Master_Log_File: binlog57.000014

Read_Master_Log_Pos: 457

Relay_Log_File: zejin240-relay-bin.000003

Relay_Log_Pos: 4

Relay_Master_Log_File: binlog57.000014

Slave_IO_Running: No

Slave_SQL_Running: Yes

Replicate_Do_DB:

Replicate_Ignore_DB:

Replicate_Do_Table:

Replicate_Ignore_Table:

Replicate_Wild_Do_Table:

Replicate_Wild_Ignore_Table:

Last_Errno: 0

Last_Error:

Skip_Counter: 0

Exec_Master_Log_Pos: 457

Relay_Log_Space: 194

Until_Condition: None

Until_Log_File:

Until_Log_Pos: 0

Master_SSL_Allowed: No

Master_SSL_CA_File:

Master_SSL_CA_Path:

Master_SSL_Cert:

Master_SSL_Cipher:

Master_SSL_Key:

Seconds_Behind_Master: NULL

Master_SSL_Verify_Server_Cert: No

Last_IO_Errno: 1236

Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.'

Last_SQL_Errno: 0

Last_SQL_Error:

Replicate_Ignore_Server_Ids:

Master_Server_Id: 3301

Master_UUID: a97983fc-5a29-11e6-9d28-000c29d4dc3f

Master_Info_File: /home/mysql/I3303/master.info

SQL_Delay: 0

SQL_Remaining_Delay: NULL

Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates

Master_Retry_Count: 86400

Master_Bind:

Last_IO_Error_Timestamp: 160809 17:25:39

Last_SQL_Error_Timestamp:

Master_SSL_Crl:

Master_SSL_Crlpath:

Retrieved_Gtid_Set: a97983fc-5a29-11e6-9d28-000c29d4dc3f:16

Executed_Gtid_Set: a97983fc-5a29-11e6-9d28-000c29d4dc3f:1-16

Auto_Position: 1

Replicate_Rewrite_DB:

Channel_Name:

Master_TLS_Version:

1 row in set (0.00 sec)

[zejin] 3303>select * from t_users;
+----+------+
| id | name |
+----+------+
|  1 | li   |
|  2 | zhou |
+----+------+
2 rows in set (0.00 sec)

Replication is broken, and the data is no longer consistent.
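The error can be predicted from the two GTID sets alone: with auto-positioning, the slave needs everything the master has that it has not yet executed, so replication breaks as soon as the master's gtid_purged contains a transaction missing from the slave's Executed_Gtid_Set. The Python sketch below works through that check with the values from the outputs above. The function names are hypothetical, and the set-based parsing is meant for small examples only.

```python
def parse_gtid_set(text):
    """Parse 'uuid:1-15:20,uuid2:3' into {uuid: set of transaction numbers}."""
    result = {}
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *intervals = part.split(":")
        numbers = result.setdefault(uuid.lower(), set())
        for interval in intervals:
            first, _, last = interval.partition("-")
            numbers.update(range(int(first), int(last or first) + 1))
    return result


def purged_but_needed(master_purged, slave_executed):
    """Transactions the master has purged that the slave never executed."""
    purged = parse_gtid_set(master_purged)
    executed = parse_gtid_set(slave_executed)
    gaps = {u: sorted(n - executed.get(u, set())) for u, n in purged.items()}
    return {u: n for u, n in gaps.items() if n}


def can_auto_position(master_purged, slave_executed):
    """True when nothing the slave still needs has been purged."""
    return not purged_but_needed(master_purged, slave_executed)


UUID = "a97983fc-5a29-11e6-9d28-000c29d4dc3f"

# Broken scenario: master gtid_purged is 1-20, slave 3303 has executed 1-16.
gaps = purged_but_needed(f"{UUID}:1-20", f"{UUID}:1-16")
print(gaps)
print(can_auto_position(f"{UUID}:1-20", f"{UUID}:1-16"))

# Earlier healthy setup: master gtid_purged is 1-13, new slave restored to 1-15.
print(can_auto_position(f"{UUID}:1-13", f"{UUID}:1-15"))
```

For the broken scenario the gap is transactions 17 through 20: the slave still needs them, but the master can no longer send them.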

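Now let's see how to recover:

Because GTIDs are globally unique and the transactions that 3303 is missing have already been replicated to slave 3302, we can point 3303 at 3302, let it catch up, and then point it back at master 3301. (This presupposes that 3302's binlogs have not been purged, that is, 3302 still holds the GTID transactions that 3303 never received from master 3301.)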
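The precondition in parentheses can be phrased as a check over candidate servers: a usable donor must not have purged anything the lagging slave still needs, and it must be ahead of that slave. The Python sketch below illustrates this. The function usable_donors is hypothetical, and the gtid_executed and gtid_purged values for 3302 are assumed, since the article does not show them. It also presumes that 3302 writes replicated transactions to its own binary log (log_slave_updates), which the article does not state.

```python
def parse_gtid_set(text):
    """Parse 'uuid:1-15:20,uuid2:3' into {uuid: set of transaction numbers}."""
    result = {}
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *intervals = part.split(":")
        numbers = result.setdefault(uuid.lower(), set())
        for interval in intervals:
            first, _, last = interval.partition("-")
            numbers.update(range(int(first), int(last or first) + 1))
    return result


def usable_donors(target_executed, candidates):
    """Names of servers whose binlogs still hold everything the target lacks."""
    have = parse_gtid_set(target_executed)
    names = []
    for name, (executed, purged) in candidates.items():
        donor_executed = parse_gtid_set(executed)
        donor_purged = parse_gtid_set(purged)
        # The donor must not have purged anything the target still needs ...
        purged_ok = all(n <= have.get(u, set()) for u, n in donor_purged.items())
        # ... and it must actually be ahead of the target.
        ahead = any(n - have.get(u, set()) for u, n in donor_executed.items())
        if purged_ok and ahead:
            names.append(name)
    return names


UUID = "a97983fc-5a29-11e6-9d28-000c29d4dc3f"
candidates = {
    # name: (gtid_executed, gtid_purged)
    "master3301": (f"{UUID}:1-21", f"{UUID}:1-20"),  # values from the article
    "slave3302": (f"{UUID}:1-21", ""),               # assumed: nothing purged yet
}

donors = usable_donors(f"{UUID}:1-16", candidates)
print(donors)
```

Under these assumptions only 3302 qualifies, which matches the approach taken here: the master has purged 17-20, so it cannot serve 3303 directly.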

The procedure is as follows (if the replication threads are still running on 3303, stop them with stop slave before running change master):

[zejin] 3303>change master to master_host='192.168.1.240',master_port=3302,master_user='repl',master_password='123',master_auto_position=1;

[zejin] 3303>start slave;

Query OK, 0 rows affected (0.03 sec)

[zejin] 3303>show slave status\G

*************************** 1. row ***************************

Slave_IO_State: Waiting for master to send event

Master_Host: 192.168.1.240

Master_User: repl

Master_Port: 3302

Connect_Retry: 60

Master_Log_File: binlog57.000007

Read_Master_Log_Pos: 1723

Relay_Log_File: zejin240-relay-bin.000002

Relay_Log_Pos: 1687

Relay_Master_Log_File: binlog57.000007

Slave_IO_Running: Yes

Slave_SQL_Running: Yes

Replicate_Do_DB:

Replicate_Ignore_DB:

Replicate_Do_Table:

Replicate_Ignore_Table:

Replicate_Wild_Do_Table:

Replicate_Wild_Ignore_Table:

Last_Errno: 0

Last_Error:

Skip_Counter: 0

Exec_Master_Log_Pos: 1723

Relay_Log_Space: 1937

Until_Condition: None

Until_Log_File:

Until_Log_Pos: 0

Master_SSL_Allowed: No

Master_SSL_CA_File:

Master_SSL_CA_Path:

Master_SSL_Cert:

Master_SSL_Cipher:

Master_SSL_Key:

Seconds_Behind_Master: 0

Master_SSL_Verify_Server_Cert: No

Last_IO_Errno: 0

Last_IO_Error:

Last_SQL_Errno: 0

Last_SQL_Error:

Replicate_Ignore_Server_Ids:

Master_Server_Id: 3302

Master_UUID: 5cee6f9f-5ab8-11e6-a081-000c29d4dc3f

Master_Info_File: /home/mysql/I3303/master.info

SQL_Delay: 0

SQL_Remaining_Delay: NULL

Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates

Master_Retry_Count: 86400

Master_Bind:

Last_IO_Error_Timestamp:

Last_SQL_Error_Timestamp:

Master_SSL_Crl:

Master_SSL_Crlpath:

Retrieved_Gtid_Set: a97983fc-5a29-11e6-9d28-000c29d4dc3f:17-21

Executed_Gtid_Set: a97983fc-5a29-11e6-9d28-000c29d4dc3f:1-21

Auto_Position: 1

Replicate_Rewrite_DB:

Channel_Name:

Master_TLS_Version:

1 row in set (0.00 sec)

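[zejin] 3303>select * from t_users;
+----+------+
| id | name |
+----+------+
|  1 | chen |
|  2 | ok   |
|  3 | li   |
+----+------+
3 rows in set (0.00 sec)

The data now matches the master exactly. Once replication is healthy, switch 3303 back to master 3301 (again stopping the replication threads with stop slave first):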

[zejin] 3303>change master to master_host='192.168.1.240',master_port=3301,master_user='repl',master_password='123',master_auto_position=1;

Query OK, 0 rows affected, 2 warnings (0.01 sec)

[zejin] 3303>start slave;

Query OK, 0 rows affected (0.00 sec)

The recovery above relied on another slave having received all of the master's binlogs. What if the topology is just a single master and a single slave (M-S) and the same problem occurs? In that case there are two methods.

The master's current state is:

[zejin] 3301>show global variables like '%gtid%';
+----------------------------------+-------------------------------------------+
| Variable_name                    | Value                                     |
+----------------------------------+-------------------------------------------+
| gtid_executed                    | a97983fc-5a29-11e6-9d28-000c29d4dc3f:1-27 |
……
| gtid_purged                      | a97983fc-5a29-11e6-9d28-000c29d4dc3f:1-25 |
……
+----------------------------------+-------------------------------------------+
8 rows in set (0.00 sec)

The slave's state is:

[zejin] 3303>show slave status \G

*************************** 1. row ***************************

Slave_IO_State:

Master_Host: 192.168.1.240

Master_User: repl

Master_Port: 3301

Connect_Retry: 60

Master_Log_File: binlog57.000016

Read_Master_Log_Pos: 729

Relay_Log_File: zejin240-relay-bin.000003

Relay_Log_Pos: 4

Relay_Master_Log_File: binlog57.000016

Slave_IO_Running: No

Slave_SQL_Running: Yes

Replicate_Do_DB:

Replicate_Ignore_DB:

Replicate_Do_Table:

Replicate_Ignore_Table:

Replicate_Wild_Do_Table:

Replicate_Wild_Ignore_Table:

Last_Errno: 0

Last_Error:

Skip_Counter: 0

Exec_Master_Log_Pos: 729

Relay_Log_Space: 194

Until_Condition: None

Until_Log_File:

Until_Log_Pos: 0

Master_SSL_Allowed: No

Master_SSL_CA_File:

Master_SSL_CA_Path:

Master_SSL_Cert:

Master_SSL_Cipher:

Master_SSL_Key:

Seconds_Behind_Master: NULL

Master_SSL_Verify_Server_Cert: No

Last_IO_Errno: 1236

Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.'

Last_SQL_Errno: 0

Last_SQL_Error:

Replicate_Ignore_Server_Ids:

Master_Server_Id: 3301

Master_UUID: a97983fc-5a29-11e6-9d28-000c29d4dc3f

Master_Info_File: /home/mysql/I3303/master.info

SQL_Delay: 0

SQL_Remaining_Delay: NULL

Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates

Master_Retry_Count: 86400

Master_Bind:

Last_IO_Error_Timestamp: 160809 17:54:42

Last_SQL_Error_Timestamp:

Master_SSL_Crl:

Master_SSL_Crlpath:

Retrieved_Gtid_Set: a97983fc-5a29-11e6-9d28-000c29d4dc3f:22

Executed_Gtid_Set: a97983fc-5a29-11e6-9d28-000c29d4dc3f:1-22

Auto_Position: 1

Replicate_Rewrite_DB:

Channel_Name:

Master_TLS_Version:

1 row in set (0.00 sec)

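This is the same type of error as before. The recovery approach is:

Set gtid_purged on the slave so that it covers the transactions the master has already purged, which lets replication resume from a transaction the master still has; then use a third-party consistency tool to bring the data back in sync.

We first need to run reset master on the slave to clear its GTID state (note that this also deletes the slave's own binary logs). Setting gtid_purged directly fails with the following error: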

[zejin] 3303>set global GTID_PURGED="a97983fc-5a29-11e6-9d28-000c29d4dc3f:1-26";

ERROR 1840 (HY000): @@GLOBAL.GTID_PURGED can only be set when @@GLOBAL.GTID_EXECUTED is empty.

The correct steps are as follows (run on the slave):

[zejin] 3303>reset master;

Query OK, 0 rows affected (0.02 sec)

[zejin] 3303>set global GTID_PURGED="a97983fc-5a29-11e6-9d28-000c29d4dc3f:1-26";

Query OK, 0 rows affected (0.00 sec)

[zejin] 3303>start slave;

Query OK, 0 rows affected (0.00 sec)

[zejin] 3303>show slave status \G

*************************** 1. row ***************************

Slave_IO_State: Waiting for master to send event

Master_Host: 192.168.1.240

Master_User: repl

Master_Port: 3301

Connect_Retry: 60

Master_Log_File: binlog57.000018

Read_Master_Log_Pos: 728

Relay_Log_File: zejin240-relay-bin.000004

Relay_Log_Pos: 718

Relay_Master_Log_File: binlog57.000018

Slave_IO_Running: Yes

Slave_SQL_Running: Yes

Replicate_Do_DB:

Replicate_Ignore_DB:

Replicate_Do_Table:

Replicate_Ignore_Table:

Replicate_Wild_Do_Table:

Replicate_Wild_Ignore_Table:

Last_Errno: 0

Last_Error:

Skip_Counter: 0

Exec_Master_Log_Pos: 728

Relay_Log_Space: 968

Until_Condition: None

Until_Log_File:

Until_Log_Pos: 0

Master_SSL_Allowed: No

Master_SSL_CA_File:

Master_SSL_CA_Path:

Master_SSL_Cert:

Master_SSL_Cipher:

Master_SSL_Key:

Seconds_Behind_Master: 0

Master_SSL_Verify_Server_Cert: No

Last_IO_Errno: 0

Last_IO_Error:

Last_SQL_Errno: 0

Last_SQL_Error:

Replicate_Ignore_Server_Ids:

Master_Server_Id: 3301

Master_UUID: a97983fc-5a29-11e6-9d28-000c29d4dc3f

Master_Info_File: /home/mysql/I3303/master.info

SQL_Delay: 0

SQL_Remaining_Delay: NULL

Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates

Master_Retry_Count: 86400

Master_Bind:

Last_IO_Error_Timestamp:

Last_SQL_Error_Timestamp:

Master_SSL_Crl:

Master_SSL_Crlpath:

Retrieved_Gtid_Set: a97983fc-5a29-11e6-9d28-000c29d4dc3f:22:27

Executed_Gtid_Set: a97983fc-5a29-11e6-9d28-000c29d4dc3f:1-27

Auto_Position: 1

Replicate_Rewrite_DB:

Channel_Name:

Master_TLS_Version:

1 row in set (0.00 sec)

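Of course, the data is inconsistent after doing this: the slave had executed transactions 1-22 and gtid_purged was set to 1-26, so transactions 23-26 were never applied to it. pt-table-checksum and pt-table-sync can now be used to restore consistency.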
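To give a feel for what checksum-based repair means, here is a conceptual Python sketch. It is not how the Percona Toolkit tools are implemented and it calls none of their options; it only illustrates the idea of comparing table contents by checksum and then deriving the changes needed on the slave. The function names are hypothetical, and the rows are borrowed from the t_users outputs shown earlier in this article purely as sample data.

```python
import zlib


def table_checksum(rows):
    """CRC32 over the rows, taken in sorted (primary-key) order."""
    crc = 0
    for row in sorted(rows):
        crc = zlib.crc32(repr(row).encode("utf-8"), crc)
    return crc


def plan_sync(master_rows, slave_rows):
    """Return (rows to write on the slave, primary keys to delete there)."""
    master = {row[0]: row for row in master_rows}
    slave = {row[0]: row for row in slave_rows}
    to_write = [row for pk, row in sorted(master.items()) if slave.get(pk) != row]
    to_delete = sorted(pk for pk in slave if pk not in master)
    return to_write, to_delete


# Sample data: (id, name) rows as they appeared on the master and on the slave.
master_rows = [(1, "chen"), (2, "ok"), (3, "li")]
slave_rows = [(1, "li"), (2, "zhou")]

tables_match = table_checksum(master_rows) == table_checksum(slave_rows)
to_write, to_delete = plan_sync(master_rows, slave_rows)
print(tables_match, to_write, to_delete)
```

A checksum mismatch flags the table as different, and the plan lists the rows to rewrite on the slave. On real tables this comparison has to be done in chunks and without disturbing live traffic, which is the job the dedicated tools handle.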

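There is another method: rebuild the slave, creating it from a fresh backup of the master in the same way as the new slave at the beginning of this article (the backup file is again named dump3301.sql here). Because the slave already holds some GTID information, we have to run reset master on it before restoring. The steps are as follows.

On the slave: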

stop slave;

reset master;

source dump3301.sql;

change master to master_host='192.168.1.240',master_port=3301,master_user='repl',master_password='123',master_auto_position=1;

start slave;

show slave status\G

This completes the recovery from the slave replication failure.
