About SWIM protocol
SWIM (Scalable Weakly-consistent Infection-style Process Group Membership Protocol) is a membership protocol used in distributed systems to answer one question: who are my peers? To do that, it must detect failures and update each node's peer list so that it contains only healthy nodes.
The "Scalable" in the name means that it can handle growth in the size of the system without degrading performance. Distributed systems are built for large environments precisely because scalability is needed, so a cluster could contain thousands of machines.
Gossip protocols work the way gossip spreads in a society: one person shares information with only a few others, those few tell a few more, and eventually the whole society knows. Nodes communicate the same way, sending messages to only a subset of their peers. The "Infection-Style" in the name implies that it's a gossip protocol.
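To make the gossip idea concrete, here is a minimal simulation sketch. It is not part of SWIM itself: the function name gossip_rounds and the parameters n, fanout and seed are made up for illustration, and it assumes a simple push model in which every informed node tells fanout randomly chosen nodes per round.

```python
import random

def gossip_rounds(n, fanout, seed=0):
    """Count the rounds push-gossip needs until all n nodes are informed.

    Assumption for this sketch: each informed node tells `fanout`
    randomly chosen nodes per round (it may pick itself or an
    already-informed node, which is simply wasted effort).
    """
    rng = random.Random(seed)   # seeded so runs are repeatable
    informed = {0}              # node 0 starts the rumour
    rounds = 0
    while len(informed) < n:
        newly_informed = set()
        for _ in informed:
            newly_informed.update(rng.sample(range(n), fanout))
        informed |= newly_informed
        rounds += 1
    return rounds
```

Calling gossip_rounds(1000, 3) should finish in far fewer rounds than there are nodes, because the informed set can grow by up to a factor of fanout + 1 every round.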
Weakly consistent means that after some amount of time, all replicas will agree on the same value (here, the membership list), but how long that takes is left undefined.
Detecting Failure
T is the protocol period.
k is the number of nodes in the failure detection group.
Node A sends a PING message to node B. If B replies with an ACK, no further action is needed. If B does NOT reply before the timeout (which is less than T), A marks B as suspicious, selects k arbitrary peers, and asks them to ping B on A's behalf. This is an indirect ping. If none of the k nodes receives an ACK, B is marked dead. Because each node probes only one peer per period (plus at most k helpers), the total number of messages per period stays at O(N), rather than the O(N^2) of all-to-all heartbeating.
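The probe sequence above can be sketched as a single function. This is a simplified model rather than a real implementation: there is no networking or timer, and the names probe and can_reach are hypothetical. The can_reach(src, dst) callback stands in for "src sent a PING to dst and got an ACK before the timeout".

```python
import random

def probe(a, b, peers, k, can_reach, rng=random):
    """Run one probe of node b by node a; return "alive" or "dead".

    can_reach(src, dst) is a hypothetical stand-in for a PING that
    was answered with an ACK before the timeout.
    """
    # Direct ping: an ACK means no further action is needed.
    if can_reach(a, b):
        return "alive"

    # No ACK in time: b is suspicious. Pick up to k other peers
    # and ask them to ping b on a's behalf (indirect ping).
    candidates = [p for p in peers if p not in (a, b)]
    helpers = rng.sample(candidates, min(k, len(candidates)))
    for helper in helpers:
        # The helper must be reachable from a and must reach b
        # for the relayed ACK to make it back to a.
        if can_reach(a, helper) and can_reach(helper, b):
            return "alive"

    # Neither the direct nor any indirect ping produced an ACK.
    return "dead"
```

The indirect step is what protects against a bad link between A and B only: if just that one link is broken, any helper that can still reach B clears the suspicion.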
Information dissemination
JOINED is sent by a node P to inform the network that it has joined. FAILED is sent to peers when a node failure is detected by the process above. These messages are sent along with the PING/ACK messages (piggybacked) rather than as separate traffic, which keeps communication efficient: an update spreads to the whole group in O(log(N)) protocol periods.
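Piggybacking can be illustrated with a small sketch. The Update, Message and Member types below are assumptions made for this example, not a wire format defined by SWIM; the point is only that membership updates ride inside PING/ACK messages instead of being sent on their own.

```python
from dataclasses import dataclass, field

@dataclass
class Update:
    kind: str   # "JOINED" or "FAILED"
    node: str   # the node the update is about

@dataclass
class Message:
    kind: str   # "PING" or "ACK"
    sender: str
    updates: list = field(default_factory=list)  # piggybacked updates

class Member:
    def __init__(self, name, peers=()):
        self.name = name
        self.peers = set(peers)
        self.pending = []   # updates waiting to be piggybacked

    def apply(self, update):
        """Apply an update; queue it for gossip only if it was news."""
        if (update.kind == "JOINED" and update.node != self.name
                and update.node not in self.peers):
            self.peers.add(update.node)
        elif update.kind == "FAILED" and update.node in self.peers:
            self.peers.remove(update.node)
        else:
            return  # already known, so do not gossip it again
        self.pending.append(update)

    def make_message(self, kind):
        # Every outgoing PING or ACK carries a copy of the pending updates.
        return Message(kind, self.name, list(self.pending))

    def receive(self, message):
        for update in message.updates:
            self.apply(update)
```

For example, if A detects that C failed and calls a.apply(Update("FAILED", "C")), the next a.make_message("PING") carries that update, and whoever receives it removes C and passes the news on in its own messages. This sketch never expires pending updates; a real implementation would limit how many times each one is resent.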