Demystifying Distributed Consistency: CAP Theorem and Raft Algorithm Explained
1. Everyday Analogy: “Agreement” in Team Collaboration
Imagine a group of friends planning a trip while living in different cities. Messages arrive with delays, and some friends occasionally drop offline, so their opinions are hard to keep unified. Distributed systems face a similar consistency challenge: getting multiple machines to “agree” over an unreliable network is the crucial problem.
2. Consistency Models and the CAP Theorem
1. Overview of Consistency Models
| Model | Description | Everyday Analogy |
| --- | --- | --- |
| Strong Consistency | All nodes see the latest data immediately after a write | Everyone receives the updated plan at the same moment |
| Eventual Consistency | Data eventually syncs but may be temporarily inconsistent | Some friends get the new plan earlier, others later |
| Weak Consistency | No synchronization guarantee; replica states may diverge for long periods | Everyone works from a different version of the travel plan |
2. CAP Theorem Trade-offs
The CAP theorem states that a distributed system cannot simultaneously guarantee Consistency (C), Availability (A), and Partition tolerance (P). Since network partitions cannot be ruled out in practice, a system must effectively choose between consistency and availability when a partition occurs.
CAP Theorem Diagram:

```
           Consistency (C)
             /        \
            /          \
Availability (A) ---- Partition tolerance (P)
```
| Trade-off | Representative Systems | Use Cases |
| --- | --- | --- |
| CA | Single-node databases | Stable network, no partitions |
| CP | ZooKeeper | Systems needing strong consistency |
| AP | Dynamo, Cassandra | Highly available, eventually consistent systems |
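To make the trade-off concrete, here is a small, purely illustrative Go sketch (not the API of any real database): a CP-style read refuses to answer when it cannot reach a quorum, while an AP-style read always answers from its local, possibly stale copy. The `Store` type and `quorumReachable` flag are assumptions invented for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// Store is a toy replica holding a local copy of one value.
// quorumReachable models whether this node can currently talk to a
// majority of replicas; on the minority side of a partition it is false.
type Store struct {
	local           string
	quorumReachable bool
}

// ReadCP prefers consistency: if a quorum cannot be reached,
// it refuses to answer rather than risk returning stale data.
func (s *Store) ReadCP() (string, error) {
	if !s.quorumReachable {
		return "", errors.New("partition: cannot reach quorum, refusing to serve read")
	}
	return s.local, nil
}

// ReadAP prefers availability: it always answers from the local copy,
// accepting that the value may be stale during a partition.
func (s *Store) ReadAP() string {
	return s.local
}

func main() {
	s := &Store{local: "plan-v1", quorumReachable: false} // partitioned minority node

	if _, err := s.ReadCP(); err != nil {
		fmt.Println("CP read:", err) // sacrifices availability
	}
	fmt.Println("AP read:", s.ReadAP()) // sacrifices consistency (may be stale)
}
```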
3. Replica Mechanisms and Data Consistency
Replication improves reliability and performance, but keeping replicas consistent is challenging. Common replication approaches:
- Primary-Backup: the primary handles all writes; backups synchronize asynchronously
- Multi-Master: multiple writable nodes; conflict resolution becomes complex
Consistency guarantees ultimately rely on consensus algorithms to keep replica logs and state in sync.
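To make the primary-backup idea above concrete, here is a hand-rolled sketch (not production replication code): the primary applies a write locally and then pushes it to each backup asynchronously, so backups briefly lag behind. The `node` type and the simulated delay are assumptions for illustration.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// node is a toy replica holding a single key-value map.
type node struct {
	mu   sync.Mutex
	data map[string]string
}

func (n *node) apply(k, v string) {
	n.mu.Lock()
	defer n.mu.Unlock()
	n.data[k] = v
}

func (n *node) read(k string) string {
	n.mu.Lock()
	defer n.mu.Unlock()
	return n.data[k]
}

// write applies the update on the primary, then replicates it to each
// backup asynchronously (primary-backup with asynchronous replication).
func write(primary *node, backups []*node, k, v string) {
	primary.apply(k, v)
	for _, b := range backups {
		go func(b *node) {
			time.Sleep(50 * time.Millisecond) // simulated network delay
			b.apply(k, v)
		}(b)
	}
}

func main() {
	primary := &node{data: map[string]string{}}
	backups := []*node{{data: map[string]string{}}, {data: map[string]string{}}}

	write(primary, backups, "plan", "v2")
	fmt.Println("primary:", primary.read("plan"))    // v2 immediately
	fmt.Println("backup0:", backups[0].read("plan")) // possibly empty until replication completes

	time.Sleep(100 * time.Millisecond)
	fmt.Println("backup0 after sync:", backups[0].read("plan")) // v2
}
```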
4. Core Consistency Protocol: Raft Algorithm Explained
Raft is known for its understandability. It divides nodes into three roles: Leader, Follower, and Candidate.

Raft role transitions:

```
Follower --(election timeout)--> Candidate --(wins majority of votes)--> Leader
Candidate / Leader --(sees a higher term)--> Follower
```
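The sketches in the rest of this section assume a minimal Go representation of this per-node state; the names below are illustrative and not taken from any particular Raft implementation.

```go
package raft

// Role is the state a Raft node is currently in.
type Role int

const (
	Follower  Role = iota // default state; responds to leader and candidate RPCs
	Candidate             // started an election after its timeout fired
	Leader                // won a majority of votes; drives log replication
)

// LogEntry is one replicated command together with the term in which
// the leader received it.
type LogEntry struct {
	Term    int
	Command interface{}
}

// node holds the minimal per-node state referenced in this article.
type node struct {
	role        Role
	currentTerm int // latest term this node has seen
	votedFor    int // candidate id voted for in currentTerm (-1 if none)
	log         []LogEntry
}
```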
1. Leader Election
- All nodes start as Followers
- When a Follower's election timeout fires without hearing from a Leader, it becomes a Candidate and requests votes from its peers
- A Candidate that gains a majority of votes becomes the Leader
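Continuing the illustrative package above, the following simplified sketch shows one election round: the timed-out follower becomes a candidate, bumps its term, votes for itself, asks its peers for votes, and steps up to leader only with a majority. The `requestVote` function type stands in for the real RPC layer.

```go
package raft

// requestVote is a stand-in for the real RequestVote RPC: the candidate
// calls it once per peer, and it reports whether that peer granted its vote.
type requestVote func(term, candidateID int) (granted bool)

// startElection runs one election round for the node with the given id.
// peers contains one vote function per other node in the cluster.
func startElection(n *node, id int, peers []requestVote) {
	// Election timeout fired: become a candidate in a new term and vote for ourselves.
	n.role = Candidate
	n.currentTerm++
	n.votedFor = id
	votes := 1

	for _, ask := range peers {
		if ask(n.currentTerm, id) {
			votes++
		}
	}

	// A majority of the full cluster (peers plus ourselves) is required to win.
	if votes > (len(peers)+1)/2 {
		n.role = Leader
		return
	}
	// No majority (for example a split vote): in this simplified sketch we simply
	// revert to follower; a real implementation retries after another randomized timeout.
	n.role = Follower
}
```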
2. Log Replication
- The Leader receives client commands and appends them to its log
- It replicates the new entries to the Followers in parallel
- Once a majority acknowledges an entry, the Leader commits it and applies it to the state machine
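On the leader side, this flow can be sketched as follows, again reusing the illustrative `node` and `LogEntry` types, with `appendEntries` standing in for the real RPC. Real Raft additionally sends prevLogIndex/prevLogTerm consistency information, which the follower-side handler shown a bit further down checks.

```go
package raft

// appendEntries is a stand-in for the AppendEntries RPC to one follower;
// it returns true if the follower accepted and stored the entries.
type appendEntries func(term int, entries []LogEntry) (success bool)

// propose appends a client command to the leader's log, replicates it to
// the followers, and reports whether it is committed, i.e. stored on a
// majority of the cluster (followers plus the leader itself).
func propose(leader *node, followers []appendEntries, command interface{}) bool {
	entry := LogEntry{Term: leader.currentTerm, Command: command}
	leader.log = append(leader.log, entry)

	acks := 1 // the leader's own copy counts toward the majority
	for _, send := range followers {
		if send(leader.currentTerm, []LogEntry{entry}) {
			acks++
		}
	}

	// Committed entries are durable on a majority and can now be
	// applied to the state machine.
	return acks > (len(followers)+1)/2
}
```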
3. Safety and Fault Tolerance
- Ensures log consistency and prevents split-brain
- Uses term numbers to prevent outdated leaders from committing logs
- Handles network partitions and node failures
```go
// Raft log append (AppendEntries RPC handler), simplified example
func (rf *Raft) AppendEntries(args *AppendEntriesArgs, reply *AppendEntriesReply) {
	rf.mu.Lock()
	defer rf.mu.Unlock()
	reply.Term = rf.currentTerm
	if args.Term < rf.currentTerm { // reject a stale leader from an older term
		reply.Success = false
		return
	}
	rf.currentTerm = args.Term // adopt the term of a valid (equal or newer) leader
	// Consistency check: our log must match the leader's entry at PrevLogIndex.
	if args.PrevLogIndex >= len(rf.log) || rf.log[args.PrevLogIndex].Term != args.PrevLogTerm {
		reply.Success = false
		return
	}
	// Drop any conflicting suffix and append the leader's new entries.
	rf.log = append(rf.log[:args.PrevLogIndex+1], args.Entries...)
	reply.Success = true
}
```
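The election side of safety can be sketched in the same simplified style (the argument and reply field names are assumptions mirroring the example above): term numbers reject stale candidates, each node grants at most one vote per term, and votes only go to candidates whose log is at least as up to date, which together prevent an outdated node from becoming leader.

```go
// RequestVote RPC handler, simplified example in the same style as above.
// Assumes rf.log always contains at least one entry (e.g. a dummy at index 0).
func (rf *Raft) RequestVote(args *RequestVoteArgs, reply *RequestVoteReply) {
	rf.mu.Lock()
	defer rf.mu.Unlock()
	reply.Term = rf.currentTerm
	if args.Term < rf.currentTerm { // stale candidate: refuse the vote
		reply.VoteGranted = false
		return
	}
	if args.Term > rf.currentTerm { // newer term: adopt it and forget any earlier vote
		rf.currentTerm = args.Term
		rf.votedFor = -1
	}
	// Grant at most one vote per term, and only to a candidate whose log is
	// at least as up to date as ours; this protects committed entries.
	last := rf.log[len(rf.log)-1]
	upToDate := args.LastLogTerm > last.Term ||
		(args.LastLogTerm == last.Term && args.LastLogIndex >= len(rf.log)-1)
	if (rf.votedFor == -1 || rf.votedFor == args.CandidateID) && upToDate {
		rf.votedFor = args.CandidateID
		reply.VoteGranted = true
	}
}
```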
5. Practical Tips for Observation and Debugging
- Observe leader elections and heartbeats through log output
- Test fault tolerance by simulating network partitions
- Use Go debugging tools such as Delve (`dlv`) to trace state changes
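For the first tip, one lightweight option (a suggestion, not something Raft itself requires) is to funnel every role change through a single logging helper so elections and step-downs are easy to grep:

```go
package raft

import "log"

// logTransition prints one grep-friendly line per role change so that
// elections and step-downs are easy to spot in the output.
// The exact format is only a suggestion.
func logTransition(nodeID int, from, to string, term int) {
	log.Printf("node %d: %s -> %s (term %d)", nodeID, from, to, term)
}
```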
6. Terminology Mapping Table
| Everyday Term | Technical Term | Explanation |
| --- | --- | --- |
| Meeting Host | Leader | Coordinates log replication and state updates |
| Attendees | Follower | Receive the leader's commands and stay in sync |
| Candidate | Candidate | Initiates an election to become leader |
| Voting | Vote | The mechanism by which the leader is elected |
7. Questions for Reflection and Exercises
- How does the CAP theorem guide real-world system design?
- How does Raft prevent split-brain scenarios?
- Implement a simplified Raft supporting election and log replication.
8. Conclusion: Safeguarding Distributed Data Consistency with Raft
Distributed consistency is the cornerstone of stable system operation. The CAP theorem frames the design trade-offs, and the Raft algorithm offers a clear, practical path to implementing consensus. Mastering both is a key step toward becoming a distributed systems expert.