A MongoDB Replica Set is a primary/secondary cluster with automatic failover, consisting of one Primary node and one or more Secondary nodes. The primary is not fixed: when it fails, the remaining members elect a new primary to keep serving requests, ensuring high availability of the system.
A replica set has three types of nodes: Primary, Secondary, and Arbiter:
- Primary is responsible for handling client requests, reading and writing data.
- Secondary replicates data from the primary node and can accept read requests. In the event of a primary node failure, it can vote to elect a new primary node.
- Arbiter does not hold a data copy, only participates in the voting process to decide which node becomes the primary.
This article uses mongo 5.0.8 as an example.
1. docker-compose.yaml#
Here is the docker-compose.yaml; it defines all three mongo instances, one service per node.
version: '3.0'
services:
  mongo1:
    hostname: mongo1
    image: mongo:5.0.8
    restart: unless-stopped
    container_name: mongodb1
    command: --replSet app --bind_ip_all --keyFile /data/mongodb/keyFile
    environment:
      TZ: 'Asia/Shanghai'
      MONGO_INITDB_ROOT_USERNAME: 'admin'
      MONGO_INITDB_ROOT_PASSWORD: 'password'
    ports:
      - 27020:27017
    volumes:
      - ./data1:/data/db
      - ./keyFile:/data/mongodb/keyFile
    networks:
      - bridge_network
  mongo2:
    hostname: mongo2
    image: mongo:5.0.8
    restart: unless-stopped
    container_name: mongodb2
    command: --replSet app --bind_ip_all --keyFile /data/mongodb/keyFile
    environment:
      TZ: 'Asia/Shanghai'
      MONGO_INITDB_ROOT_USERNAME: 'admin'
      MONGO_INITDB_ROOT_PASSWORD: 'password'
    ports:
      - 27018:27017
    volumes:
      - ./data2:/data/db
      - ./keyFile:/data/mongodb/keyFile
    networks:
      - bridge_network
  mongo3:
    hostname: mongo3
    image: mongo:5.0.8
    restart: unless-stopped
    container_name: mongodb3
    command: --replSet app --bind_ip_all --keyFile /data/mongodb/keyFile
    environment:
      TZ: 'Asia/Shanghai'
      MONGO_INITDB_ROOT_USERNAME: 'admin'
      MONGO_INITDB_ROOT_PASSWORD: 'password'
    ports:
      - 27019:27017
    volumes:
      - ./data3:/data/db
      - ./keyFile:/data/mongodb/keyFile
    networks:
      - bridge_network
networks:
  bridge_network:
    driver: bridge
You can start the three containers on one machine or on different machines. In this example, the file above is used to start one container on each of three machines: 10.0.1.11 (Primary), 10.0.1.31 (Secondary), and 10.0.1.32 (Secondary).
Run docker compose up -d mongo1 on 10.0.1.11
Run docker compose up -d mongo2 on 10.0.1.31
Run docker compose up -d mongo3 on 10.0.1.32
Appending mongo1, mongo2, or mongo3 to docker compose up -d starts only that one service defined in the yaml file. If you instead split the yaml into separate files and start each on its corresponding machine, there is no need to add the service name at the end of the command.
Note:
The keyFile must be identical on every machine that makes up the replica set!!! A MongoDB instance can only join the replica set if it has the correct keyFile. For how to create the keyFile, see 【mongo transactions】Using docker-compose to start mongo, 'single replica mode' to support transactions.
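For reference, the usual way to generate a keyFile is with openssl, then locking down its permissions. A minimal sketch (the chown to uid 999 assumes the official mongo image's mongodb user; adjust if your setup differs):
# generate a random key and restrict its permissions
openssl rand -base64 756 > keyFile
chmod 400 keyFile
# the official mongo image runs mongod as uid 999 (mongodb), so give it ownership
sudo chown 999:999 keyFile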
2. After the containers start#
After the containers start, use docker exec to enter the container on the machine that will act as the primary node (10.0.1.11 here) and open the mongo shell.
mongo -u admin --authenticationDatabase admin
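On 10.0.1.11 the container is named mongodb1 (per the compose file above), so the full command looks roughly like this; you will be prompted for the password:
docker exec -it mongodb1 mongo -u admin --authenticationDatabase admin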
Initialize the mongo replica set.
rs.initiate({
  _id: "app",
  members: [
    { _id: 0, host: "10.0.1.11:27020" },
    { _id: 1, host: "10.0.1.31:27018" },
    { _id: 2, host: "10.0.1.32:27019" }
  ]
})
Note: when I first initialized, I mistakenly assumed I could call rs.initiate() with no arguments and then add the remaining nodes one by one with rs.add('host:port'). This does not work here: when rs.initiate() is called without arguments, mongo builds the member host from the hostname and port of the machine it is running on and writes that into the replica set configuration. Since mongo is running inside a docker container, it records the container's hostname and the in-container port.
Clearly, such a host is unreachable when replicas on different machines try to talk to each other.
Therefore, it is best to list the actual machines explicitly during initialization.
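If you have already run the no-argument initialization and ended up with a container hostname in the config, you can usually repair the member host with rs.reconfig() instead of rebuilding the set. A minimal sketch, assuming the bad entry is member 0:
cfg = rs.conf()
cfg.members[0].host = "10.0.1.11:27020"
rs.reconfig(cfg)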
Once the initialization is complete, you can use rs.status() to check the status of the replica set.
Here is a snippet of the output.
"members" : [
{
"_id" : 0,
"name" : "10.0.1.11:27020",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 2299,
"optime" : {
"ts" : Timestamp(1717560117, 1),
"t" : NumberLong(1)
},
"optimeDate" : ISODate("2024-06-05T04:01:57Z"),
"lastAppliedWallTime" : ISODate("2024-06-05T04:01:57.808Z"),
"lastDurableWallTime" : ISODate("2024-06-05T04:01:57.808Z"),
"syncSourceHost" : "",
"syncSourceId" : -1,
"infoMessage" : "",
"electionTime" : Timestamp(1717559887, 1),
"electionDate" : ISODate("2024-06-05T03:58:07Z"),
"configVersion" : 1,
"configTerm" : 1,
"self" : true,
"lastHeartbeatMessage" : ""
},
{
"_id" : 1,
"name" : "10.0.1.31:27018",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 249,
"optime" : {
"ts" : Timestamp(1717560117, 1),
"t" : NumberLong(1)
},
"optimeDurable" : {
"ts" : Timestamp(1717560117, 1),
"t" : NumberLong(1)
},
"optimeDate" : ISODate("2024-06-05T04:01:57Z"),
"optimeDurableDate" : ISODate("2024-06-05T04:01:57Z"),
"lastAppliedWallTime" : ISODate("2024-06-05T04:01:57.808Z"),
"lastDurableWallTime" : ISODate("2024-06-05T04:01:57.808Z"),
"lastHeartbeat" : ISODate("2024-06-05T04:02:05.988Z"),
"lastHeartbeatRecv" : ISODate("2024-06-05T04:02:05.581Z"),
"pingMs" : NumberLong(1),
"lastHeartbeatMessage" : "",
"syncSourceHost" : "10.0.1.11:27020",
"syncSourceId" : 0,
"infoMessage" : "",
"configVersion" : 1,
"configTerm" : 1
},
{
"_id" : 2,
"name" : "10.0.1.32:27019",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 249,
"optime" : {
"ts" : Timestamp(1717560117, 1),
"t" : NumberLong(1)
},
"optimeDurable" : {
"ts" : Timestamp(1717560117, 1),
"t" : NumberLong(1)
},
"optimeDate" : ISODate("2024-06-05T04:01:57Z"),
"optimeDurableDate" : ISODate("2024-06-05T04:01:57Z"),
"lastAppliedWallTime" : ISODate("2024-06-05T04:01:57.808Z"),
"lastDurableWallTime" : ISODate("2024-06-05T04:01:57.808Z"),
"lastHeartbeat" : ISODate("2024-06-05T04:02:05.988Z"),
"lastHeartbeatRecv" : ISODate("2024-06-05T04:02:05.629Z"),
"pingMs" : NumberLong(1),
"lastHeartbeatMessage" : "",
"syncSourceHost" : "10.0.1.11:27020",
"syncSourceId" : 0,
"infoMessage" : "",
"configVersion" : 1,
"configTerm" : 1
}
],
Here you can see that the machine executing the initialization is the primary node, while the others are secondary nodes.
Regarding the meanings of the stateStr field:
- STARTUP: The member is starting up.
- PRIMARY: The member is the primary node.
- SECONDARY: The member is a secondary node.
- RECOVERING: The member is recovering.
- ARBITER: The member is an arbiter node.
- DOWN: The member is unreachable as seen from the other members.
- UNKNOWN: The member's state is unknown.
- ROLLBACK: The member is performing a rollback operation.
- REMOVED: The member has been removed from the replica set.
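If the full rs.status() output is too verbose, a quick way to print just each member's name and state in the mongo shell:
rs.status().members.forEach(function (m) {
    print(m.name, m.stateStr)
})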
Finally, if new nodes need to be added later, you can add them with rs.add('host:port'). If you need to add an arbiter node (not recommended), use rs.add('host:port', true).
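For example (these hosts are hypothetical placeholders):
rs.add("10.0.1.33:27017")        // add a new data-bearing member
rs.add("10.0.1.34:27017", true)  // add an arbiter: it votes only, stores no data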
About the number of nodes
MongoDB officially recommends that, once a replica set has more than two members, it consist of an odd number of data-bearing members rather than relying on arbiter nodes. In practice, try to keep the number of nodes odd and avoid arbiters. If the total number of primary and secondary nodes really must be even, an arbiter can be added to the replica set to make the number of voting members odd (arbiters hold no data and serve no reads or writes, only vote, so their resource requirements are minimal and adding one costs little). The reason for wanting an odd number of voting members comes down to the election algorithm and the deployment topology (the split-brain problem): electing a primary requires a majority of voting members, and an odd count ensures that after a network partition at most one side can hold that majority.
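To check which members currently carry votes (and whether any are arbiters), something like the following works in the mongo shell; votes and arbiterOnly are standard fields of the replica set configuration returned by rs.conf():
rs.conf().members.forEach(function (m) {
    print(m.host, "votes:", m.votes, "arbiterOnly:", m.arbiterOnly)
})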