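Current situation
Replica set architecture: one primary with multiple secondaries (spread across two data centers)
Data center A originally ran a one-primary, two-secondary replica set. Because data center A was due to be decommissioned, the data had to be migrated to data center B. After the data was copied to data center B, the B nodes were temporarily added to the data center A replica set as special secondaries (hidden).
Data center A then went offline unexpectedly and all of its nodes became unavailable, leaving only the data center B nodes. Because those were special secondaries and there had been no time to switch the primary over, the replica set could not serve read/write traffic and had to be repaired.
Checking the replica set status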
monitors:SECONDARY> rs.status()
{
"set" : "monitors",
"date" : ISODate("2023-05-31T16:12:09.737Z"),
"myState" : 2,
"members" : [
{
"_id" : 0,
"name" : "10.13.30.68:2709",
"health" : 0,
"state" : 8,
"stateStr" : "(not reachable/healthy)", # unavailable
"uptime" : 0,
"optime" : Timestamp(0, 0),
"optimeDate" : ISODate("1970-01-01T00:00:00Z"),
"lastHeartbeat" : ISODate("2023-05-31T16:12:09.723Z"),
"lastHeartbeatRecv" : ISODate("2023-05-31T14:40:07.008Z"),
"pingMs" : 3,
"lastHeartbeatMessage" : "Failed attempt to connect to 10.13.30.68:2709; couldn't connect to server 10.13.30.68:2709 (10.13.30.68), connection attempt failed",
"configVersion" : -1
},
{
"_id" : 1,
"name" : "10.13.30.69:2709",
"health" : 0,
"state" : 8,
"stateStr" : "(not reachable/healthy)",
"uptime" : 0,
"optime" : Timestamp(0, 0),
"optimeDate" : ISODate("1970-01-01T00:00:00Z"),
"lastHeartbeat" : ISODate("2023-05-31T16:12:01.942Z"),
"lastHeartbeatRecv" : ISODate("2023-05-31T14:40:15.852Z"),
"pingMs" : 3,
"lastHeartbeatMessage" : "Failed attempt to connect to 10.13.30.69:2709; couldn't connect to server 10.13.30.69:2709 (10.13.30.69), connection attempt failed",
"configVersion" : -1
},
{
"_id" : 2,
"name" : "10.13.54.41:2709",
"health" : 0,
"state" : 8,
"stateStr" : "(not reachable/healthy)",
"uptime" : 0,
"optime" : Timestamp(0, 0),
"optimeDate" : ISODate("1970-01-01T00:00:00Z"),
"lastHeartbeat" : ISODate("2023-05-31T16:12:08.456Z"),
"lastHeartbeatRecv" : ISODate("2023-05-31T15:07:51.556Z"),
"pingMs" : 3,
"lastHeartbeatMessage" : "Failed attempt to connect to 10.13.54.41:2709; couldn't connect to server 10.13.54.41:2709 (10.13.54.41), connection attempt failed",
"configVersion" : -1
},
{
"_id" : 3,
"name" : "10.55.22.43:2709",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 1756572,
"optime" : Timestamp(1683793106, 1),
"optimeDate" : ISODate("2023-05-11T08:18:26Z"),
"configVersion" : 95114,
"self" : true
},
{
"_id" : 4,
"name" : "10.55.36.22:2709",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 699,
"optime" : Timestamp(1683793106, 1),
"optimeDate" : ISODate("2023-05-11T08:18:26Z"),
"lastHeartbeat" : ISODate("2023-05-31T16:12:08.556Z"),
"lastHeartbeatRecv" : ISODate("2023-05-31T16:12:09.423Z"),
"pingMs" : 0,
"configVersion" : 95114
},
{
"_id" : 5,
"name" : "10.55.22.35:2709",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 1756359,
"optime" : Timestamp(1683793106, 1),
"optimeDate" : ISODate("2023-05-11T08:18:26Z"),
"lastHeartbeat" : ISODate("2023-05-31T16:12:09.332Z"),
"lastHeartbeatRecv" : ISODate("2023-05-31T16:12:09.403Z"),
"pingMs" : 0,
"configVersion" : 95114
}
],
"ok" : 1
}
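With six members in the output, it can help to reduce the status document to the fields that matter here. The following is a minimal sketch, not part of the original procedure: `summarizeMembers` and `unreachableHosts` are hypothetical helper names, and the functions only read the `members`, `_id`, `name`, `health` and `stateStr` fields shown above.

```javascript
// Reduce an rs.status() document to id, host, health and state per member.
// Pure function: it works on a saved status document as well as on live output.
function summarizeMembers(status) {
  return status.members.map(function (m) {
    return {
      id: m._id,
      host: m.name,
      healthy: m.health === 1,
      state: m.stateStr
    };
  });
}

// Hosts that the current node reports as unreachable (health 0).
function unreachableHosts(status) {
  return summarizeMembers(status)
    .filter(function (m) { return !m.healthy; })
    .map(function (m) { return m.host; });
}

// In the mongo shell: unreachableHosts(rs.status())
```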
Checking the replica set configuration
monitors:SECONDARY> rs.conf()
{
"_id" : "monitors",
"version" : 95114,
"members" : [
{
"_id" : 0,
"host" : "10.13.30.68:2709",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 3,
"tags" : {
},
"slaveDelay" : 0,
"votes" : 1
},
{
"_id" : 1,
"host" : "10.13.30.69:2709",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 2,
"tags" : {
},
"slaveDelay" : 0,
"votes" : 1
},
{
"_id" : 2,
"host" : "10.13.54.41:2709",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 1,
"tags" : {
},
"slaveDelay" : 0,
"votes" : 1
},
{
"_id" : 3,
"host" : "10.55.22.43:2709",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : true,
"priority" : 3,
"tags" : {
},
"slaveDelay" : 0,
"votes" : 0
},
{
"_id" : 4,
"host" : "10.55.36.22:2709",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : true,
"priority" : 0,
"tags" : {
},
"slaveDelay" : 0,
"votes" : 0
},
{
"_id" : 5,
"host" : "10.55.22.35:2709",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : true,
"priority" : 0,
"tags" : {
},
"slaveDelay" : 0,
"votes" : 0
}
],
"settings" : {
"chainingAllowed" : true,
"heartbeatTimeoutSecs" : 10,
"getLastErrorModes" : {
},
"getLastErrorDefaults" : {
"w" : 1,
"wtimeout" : 0
}
}
}
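In the configuration above, only the three data center A members carry a vote (votes 1); the data center B members all have votes 0. A primary needs a majority of the configured votes, which is why no election can succeed once data center A is gone. The sketch below illustrates that arithmetic. `votingSummary` is a hypothetical helper name, and it assumes the config and status documents have the shape shown in this post.

```javascript
// Count configured votes and the votes held by reachable members.
// conf:   a document shaped like rs.conf()
// status: a document shaped like rs.status()
function votingSummary(conf, status) {
  var healthyById = {};
  status.members.forEach(function (m) {
    healthyById[m._id] = m.health === 1;
  });

  var totalVotes = 0;
  var reachableVotes = 0;
  conf.members.forEach(function (m) {
    // A member without an explicit votes field defaults to 1 vote.
    var votes = (m.votes === undefined) ? 1 : m.votes;
    totalVotes += votes;
    if (healthyById[m._id]) {
      reachableVotes += votes;
    }
  });

  var majority = Math.floor(totalVotes / 2) + 1;
  return {
    totalVotes: totalVotes,
    reachableVotes: reachableVotes,
    majority: majority,
    canElect: reachableVotes >= majority
  };
}

// In the mongo shell: votingSummary(rs.conf(), rs.status())
```

For the documents shown here this gives 3 configured votes, 0 reachable votes and a required majority of 2, so `canElect` is false.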
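Manual failover plan
- Set hidden to false on the data center B nodes
- Set priorities so that one node is higher than the others
- Configure votes
- Remove the data center A nodes
- Apply the configuration and watch the election
The commands involved are as follows: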
cfg = rs.conf()
for (var i = 0; i < cfg.members.length; i++) {
    cfg.members[i].hidden = false   // unhide every member
}
rs.reconfig(cfg)                    // does not work on a secondary
rs.reconfig(cfg, {force: true})     // force-apply the configuration
cfg.members[3].priority = 3
cfg.members[4].priority = 2
cfg.members[5].priority = 1
// votes settings omitted here...
rs.reconfig(cfg, {force: true})
cfg = rs.conf()
cfg.members.splice(0, 3)            // remove the offline data center A nodes
rs.reconfig(cfg, {force: true})
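Adjust the member ids (the array indexes used above) to match your own environment.
The force option is used with rs.reconfig() here because there is currently no primary, so the replica set configuration can only be changed by running the command on a secondary.
Note: using the force option can cause data consistency problems, because the secondary may not have fully replicated the primary's data. Use force with caution, and make sure the data on all nodes is in sync beforehand. Also, unless it is unavoidable, do not change the configuration from a secondary, to avoid potential data consistency problems.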
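The individual steps above could also be folded into a single config change that is reviewed before it is applied. The following is only a sketch under stated assumptions, not the procedure that was actually run: `buildRecoveryConfig` and `copyFields` are hypothetical helpers, the priorities are derived from the order of the host list, and setting votes to 1 on every surviving member is an assumption, since the original votes settings are omitted above.

```javascript
// Shallow-copy the own fields of an object so the input config is not modified.
function copyFields(obj) {
  var out = {};
  for (var key in obj) {
    if (Object.prototype.hasOwnProperty.call(obj, key)) {
      out[key] = obj[key];
    }
  }
  return out;
}

// Build a recovery config that keeps only survivingHosts.
// survivingHosts is ordered from highest to lowest desired priority.
function buildRecoveryConfig(conf, survivingHosts) {
  var cfg = copyFields(conf);
  cfg.members = [];
  survivingHosts.forEach(function (host, rank) {
    var found = conf.members.filter(function (m) {
      return m.host === host;
    })[0];
    if (!found) {
      throw new Error("host not in config: " + host);
    }
    var member = copyFields(found);
    member.hidden = false;                            // unhide the member
    member.votes = 1;                                 // ASSUMPTION: votes are omitted in the original
    member.priority = survivingHosts.length - rank;   // e.g. 3, 2, 1 for three hosts
    cfg.members.push(member);
  });
  return cfg;
}

// In the mongo shell, after inspecting the result:
// var cfg = buildRecoveryConfig(rs.conf(),
//     ["10.55.22.43:2709", "10.55.36.22:2709", "10.55.22.35:2709"]);
// rs.reconfig(cfg, {force: true});
```

Building the document first and applying it with one forced reconfig keeps the change easy to inspect. The caveats about force described above apply unchanged.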
Status after the fix
monitors:PRIMARY> rs.status()
{
"set" : "monitors",
"date" : ISODate("2023-05-31T16:15:34.858Z"),
"myState" : 1,
"members" : [
{
"_id" : 3,
"name" : "10.55.22.43:2709",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 1756777,
"optime" : Timestamp(1685549732, 1),
"optimeDate" : ISODate("2023-05-31T16:15:32Z"),
"electionTime" : Timestamp(1685549706, 1),
"electionDate" : ISODate("2023-05-31T16:15:06Z"),
"configVersion" : 165757,
"self" : true
},
{
"_id" : 4,
"name" : "10.55.36.22:2709",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 904,
"optime" : Timestamp(1683793106, 1),
"optimeDate" : ISODate("2023-05-11T08:18:26Z"),
"lastHeartbeat" : ISODate("2023-05-31T16:15:32.883Z"),
"lastHeartbeatRecv" : ISODate("2023-05-31T16:15:34.455Z"),
"pingMs" : 0,
"configVersion" : 165756
},
{
"_id" : 5,
"name" : "10.55.22.35:2709",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 1756564,
"optime" : Timestamp(1683793106, 1),
"optimeDate" : ISODate("2023-05-11T08:18:26Z"),
"lastHeartbeat" : ISODate("2023-05-31T16:15:32.883Z"),
"lastHeartbeatRecv" : ISODate("2023-05-31T16:15:33.450Z"),
"pingMs" : 0,
"configVersion" : 165756
}
],
"ok" : 1
}
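In the output above, the new primary's optimeDate is already current while the secondaries still report the older value, so it is worth checking that they catch up. A minimal sketch for that check follows. `secondaryLagSeconds` is a hypothetical helper name, and it assumes `optimeDate` is a Date, as ISODate values are in the mongo shell. The values come from heartbeats, so they can be briefly stale right after an election.

```javascript
// For each SECONDARY in an rs.status() document, compute how far its
// optimeDate is behind the PRIMARY's, in seconds.
// Returns null when the document contains no primary.
function secondaryLagSeconds(status) {
  var primary = status.members.filter(function (m) {
    return m.state === 1; // 1 = PRIMARY
  })[0];
  if (!primary) {
    return null;
  }
  return status.members
    .filter(function (m) { return m.state === 2; }) // 2 = SECONDARY
    .map(function (m) {
      return {
        host: m.name,
        // Subtracting two Dates yields milliseconds.
        lagSeconds: (primary.optimeDate - m.optimeDate) / 1000
      };
    });
}

// In the mongo shell: secondaryLagSeconds(rs.status())
```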