Patterns of Communication between Nodes
1 Single-Socket Channel
1.1 Problem
1.1.1 When using Leader and Followers, we need to ensure that messages between the leader and each follower are kept in order, with a retry mechanism for any lost messages.
1.2 Solution
1.2.1 Ensure that all communication between a follower and its leader goes through a single-socket channel.
1.2.2 Once a connection to a node is open, the node never closes it and continuously reads it for new requests.
1.2.3 Whenever a node establishes communication with another node, it opens a single-socket connection that is used for all requests to that other party.
1.2.4 It’s important to keep a timeout on the connection so it doesn’t block indefinitely in case of errors.
1.2.5 Use HeartBeat to periodically send requests over the socket channel to keep it alive.
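A minimal sketch of the idea in Java (class and method names are illustrative, not taken from any particular codebase): the node opens one socket per peer, sets a read timeout so calls cannot block indefinitely, reuses the same connection for every request, and a scheduled task sends periodic heartbeats to keep the channel alive.

```java
import java.io.*;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: one long-lived socket per peer, reused for all requests.
public class SingleSocketChannel implements Closeable {
    private final Socket socket;
    private final DataOutputStream out;
    private final DataInputStream in;
    private final ScheduledExecutorService heartbeatScheduler =
            Executors.newSingleThreadScheduledExecutor();

    public SingleSocketChannel(InetSocketAddress peer, int readTimeoutMs) throws IOException {
        socket = new Socket();
        socket.connect(peer, readTimeoutMs);     // same timeout reused for connect, for brevity
        socket.setSoTimeout(readTimeoutMs);      // avoid blocking forever on errors
        socket.setKeepAlive(true);
        out = new DataOutputStream(new BufferedOutputStream(socket.getOutputStream()));
        in = new DataInputStream(new BufferedInputStream(socket.getInputStream()));
        // Periodically send a request over the same channel to keep it alive.
        heartbeatScheduler.scheduleAtFixedRate(this::sendHeartbeat, 1, 1, TimeUnit.SECONDS);
    }

    // All requests to this peer go through the same socket, which keeps them in order.
    public synchronized byte[] sendRequest(byte[] request) throws IOException {
        out.writeInt(request.length);
        out.write(request);
        out.flush();
        int responseLength = in.readInt();       // blocks for at most readTimeoutMs
        byte[] response = new byte[responseLength];
        in.readFully(response);
        return response;
    }

    private void sendHeartbeat() {
        try {
            sendRequest(new byte[0]);            // empty message used as a heartbeat here
        } catch (IOException e) {
            // A real implementation would reconnect or mark the peer as unreachable.
        }
    }

    @Override
    public void close() throws IOException {
        heartbeatScheduler.shutdownNow();
        socket.close();
    }
}
```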
2 Request Batch
2.1 Problem
2.1.1 If many requests, each carrying a small amount of data, are sent to cluster nodes, network latency and request processing time (including serialization and deserialization of the request on the server side) can add significant overhead.
2.2 Solution
2.2.1 Combine multiple requests into a single request batch. The batch is sent to the cluster node for processing, and each request in it is processed in exactly the same manner as an individual request.
2.2.2 The time at which the request is enqueued is tracked and is later used to decide if the request can be sent as part of the batch.
2.2.3 Each request is assigned a unique request number, which is used to map the response back to the request and complete it.
2.2.4 Two checks decide when a batch is sent (see the sketch after this pattern):
Check if enough requests have accumulated to fill the batch to the maximum configured size.
Configure a small wait time and check whether the oldest request was added more than that wait time ago.
2.2.5 Technical Considerations
The batch size should be chosen based on the size of individual messages and available network bandwidth, as well as the observed latency and throughput improvements under real-life load.
For example, Apache Kafka has a default batch size of 16KB. It also has a configuration parameter called linger.ms with a default value of 0. However, if the messages are bigger, a higher batch size might work better.
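The batching decision can be sketched as follows in Java (RequestBatcher, maxBatchSizeBytes, and maxWaitMs are made-up names for illustration): each request is stamped with a unique request number and its enqueue time, and the batch is flushed either when the accumulated size reaches the configured maximum or when the oldest request has waited longer than the configured linger time, in the spirit of Kafka's batch.size and linger.ms settings.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of a request batcher on the sending side.
public class RequestBatcher {
    static class RequestEntry {
        final long requestNumber;    // used to map the response back to the request
        final byte[] payload;
        final long enqueuedAtMs;     // tracked to decide when the batch must be flushed

        RequestEntry(long requestNumber, byte[] payload, long enqueuedAtMs) {
            this.requestNumber = requestNumber;
            this.payload = payload;
            this.enqueuedAtMs = enqueuedAtMs;
        }
    }

    private final Queue<RequestEntry> pending = new ArrayDeque<>();
    private final AtomicLong requestCounter = new AtomicLong();
    private final int maxBatchSizeBytes;   // e.g. 16 * 1024, like Kafka's default batch.size
    private final long maxWaitMs;          // e.g. a few milliseconds, like Kafka's linger.ms
    private int pendingBytes = 0;

    public RequestBatcher(int maxBatchSizeBytes, long maxWaitMs) {
        this.maxBatchSizeBytes = maxBatchSizeBytes;
        this.maxWaitMs = maxWaitMs;
    }

    public synchronized long enqueue(byte[] payload) {
        long requestNumber = requestCounter.incrementAndGet();
        pending.add(new RequestEntry(requestNumber, payload, System.currentTimeMillis()));
        pendingBytes += payload.length;
        return requestNumber;
    }

    // The two checks: the batch is full, or the oldest request has waited longer than maxWaitMs.
    public synchronized boolean readyToSend() {
        if (pending.isEmpty()) {
            return false;
        }
        boolean batchFull = pendingBytes >= maxBatchSizeBytes;
        boolean waitedLongEnough =
                System.currentTimeMillis() - pending.peek().enqueuedAtMs >= maxWaitMs;
        return batchFull || waitedLongEnough;
    }

    // Drain the pending requests into a single batch to send to the cluster node.
    public synchronized List<RequestEntry> drainBatch() {
        List<RequestEntry> batch = new ArrayList<>(pending);
        pending.clear();
        pendingBytes = 0;
        return batch;
    }
}
```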
3 Request Pipeline
3.1 Problem
3.1.1 To achieve better throughput and latency, the request queue on the server should be kept full enough to make sure the server's capacity is fully utilized.
3.2 Solution
3.2.1 This is achieved by sending requests without waiting for responses, using two separate threads: one for sending requests over the network channel and the other for receiving responses from it (see the sketch after this list).
3.2.2 Two issues need to be handled:
If requests are continuously sent without waiting for responses, the node accepting them can be overwhelmed, so an upper limit is kept on the number of in-flight requests.
Servers need some mechanism to make sure out-of-order requests are rejected.
If the first request fails and is retried, the server might have processed the second request before the retried first request reaches it.
For example, Raft always sends the previous log index that is expected with every log entry. If the previous log index does not match, the server rejects the request.
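A condensed sketch of the client side of a pipeline in Java (all names are illustrative): a sender thread writes requests as soon as they are queued, a receiver thread reads responses on the same connection, a semaphore caps the number of in-flight requests so the receiving node is not overwhelmed, and each request carries a sequence number that the server can compare against the last one it processed in order to reject out-of-order requests, much as Raft rejects entries whose previous log index does not match.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: requests are pipelined over one connection without waiting
// for responses; a semaphore bounds how many requests can be in flight at once.
public class RequestPipeline {
    private final BlockingQueue<byte[]> outgoing = new LinkedBlockingQueue<>();
    private final Semaphore inflightLimit;
    private final AtomicLong sequence = new AtomicLong();
    private final DataOutputStream out;
    private final DataInputStream in;

    public RequestPipeline(DataOutputStream out, DataInputStream in, int maxInflight) {
        this.out = out;
        this.in = in;
        this.inflightLimit = new Semaphore(maxInflight);
        new Thread(this::sendLoop, "pipeline-sender").start();
        new Thread(this::receiveLoop, "pipeline-receiver").start();
    }

    public void submit(byte[] payload) throws InterruptedException {
        outgoing.put(payload);
    }

    // Sender thread: keeps the connection busy without exceeding the in-flight limit.
    private void sendLoop() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                byte[] payload = outgoing.take();
                inflightLimit.acquire();            // back-pressure on the sending side
                long seq = sequence.incrementAndGet();
                out.writeLong(seq);                 // lets the server detect ordering gaps
                out.writeInt(payload.length);
                out.write(payload);
                out.flush();
            }
        } catch (IOException | InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Receiver thread: every response releases one in-flight slot.
    private void receiveLoop() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                long seq = in.readLong();
                int length = in.readInt();
                byte[] response = new byte[length];
                in.readFully(response);
                inflightLimit.release();
                // A real implementation would complete the pending future registered for `seq`.
            }
        } catch (IOException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```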