# Distributed Systems Topics: Consistent Hashing and Cache Consistency Explained

Categories: Distributed Systems | Tags: MIT 6.824, Consistent Hashing, Distributed Cache, Cache Consistency, Distributed File Systems



## 1. Everyday Analogy: Parcel Sorting and Pickup Stores

Imagine a courier company handling tens of thousands of parcels. How does it assign them to different sorting centers? And how do pickup stores keep their inventory up to date so customers never receive “expired” goods? Consistent hashing and cache consistency solve the same two problems, “allocation” and “synchronization,” for data in distributed systems.


## 2. Consistent Hashing and Data Distribution

### 1. Concept of Consistent Hashing

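Consistent hashing hashes both data keys and node identifiers onto the same circular hash space (a “ring”, for example the 32-bit range 0 to 2³² − 1). Each key is stored on the first node found by moving clockwise from the key's position. When a node joins or leaves, only the keys on the arc between that node and its predecessor change owners, whereas plain `hash(key) % N` placement remaps most keys whenever N changes.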

Consistent hash ring illustration:

```text
[Node A]---[Node B]----[Node C]---[Node D]---(Ring Structure)
        ↑                  ↑
       Data X             Data Y
```

### 2. Advantages

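- Smooth scaling: nodes can be added or removed without rehashing the entire key space
- Minimal data movement: a node change relocates only the keys on the affected arc of the ring
- Well-suited to caching systems and distributed storage, where every remapped key costs a cache miss or a data copy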
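A rough count shows how large the difference is. Assume K keys with uniformly distributed hashes, N nodes, and one node being added (N → N + 1). Under modulo placement a key keeps its node only when h mod N = h mod (N + 1), which holds for 1 in N + 1 uniformly distributed hashes. Under consistent hashing only the keys on the arcs taken over by the new node move, and that node's expected share is 1/(N + 1) if load is spread evenly:

```latex
\begin{aligned}
\text{modulo placement:} &\quad \text{keys moved} \approx K \cdot \frac{N}{N+1} \\
\text{consistent hashing:} &\quad \text{keys moved} \approx K \cdot \frac{1}{N+1} \\
N = 4,\; K = 10\,000: &\quad 8\,000 \text{ versus roughly } 2\,000
\end{aligned}
```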

---

## 3. Distributed Cache and Cache Consistency

### 1. Introduction to Distributed Cache

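A distributed cache keeps frequently accessed (“hot”) data in memory across multiple nodes, reducing response times and offloading the backing database. Memcached and Redis Cluster are common examples.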

### 2. Challenges of Cache Consistency

- **Cache update latency**: Cache not refreshed promptly after data changes
- **Stale data risk**: Clients read expired cache entries
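- **Concurrent update conflicts**: Interleaved writes and cache refills can leave an older value in the cache after a newer one has reached the database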

### 3. Common Solutions

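| Strategy              | Description                                                                                                             | Suitable Scenario                                                              |
| --------------------- | ----------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------ |
| Cache Aside           | Reads check the cache and load from the database on a miss; writes update the database first, then delete the cache entry | Simple, widely used                                                            |
| Write Through         | Every write goes synchronously to both the cache and the database                                                       | Read-heavy, write-light workloads                                              |
| Write Back            | Writes go to the cache; changed entries are flushed to the database later                                               | Write-heavy workloads; faster writes, but unflushed data is lost if a cache node fails |
| TTL & Version Control | Entries expire after a time-to-live; version numbers stop older values from overwriting newer ones                      | Bounding staleness; randomized TTLs help avoid cache avalanches                |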
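The Cache Aside row can be sketched in a few lines of Go. This is a minimal illustration, not a production client: the `store` type, the package-level `db` and `cache` variables, and the `Read`/`Write` functions are hypothetical names, and both the database and the cache are stood in for by mutex-protected maps.

```go
package main

import (
	"fmt"
	"sync"
)

// store is a hypothetical stand-in for both the database and the cache:
// a mutex-protected map. Real systems would use a database and a cache client.
type store struct {
	mu   sync.Mutex
	data map[string]string
}

func newStore() *store { return &store{data: make(map[string]string)} }

func (s *store) get(k string) (string, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	v, ok := s.data[k]
	return v, ok
}

func (s *store) set(k, v string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data[k] = v
}

func (s *store) del(k string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.data, k)
}

var db, cache = newStore(), newStore()

// Read tries the cache first; on a miss it loads from the database and fills the cache.
func Read(k string) (string, bool) {
	if v, ok := cache.get(k); ok {
		return v, true
	}
	v, ok := db.get(k)
	if ok {
		cache.set(k, v)
	}
	return v, ok
}

// Write updates the database first, then deletes (rather than updates) the
// cached copy, so the next Read reloads the fresh value.
func Write(k, v string) {
	db.set(k, v)
	cache.del(k)
}

func main() {
	Write("user:1", "Alice")
	fmt.Println(Read("user:1")) // cache miss: loaded from db, then cached
	Write("user:1", "Bob")      // db updated, cache entry removed
	fmt.Println(Read("user:1")) // cache miss again: reloads "Bob"
}
```

Deleting after the database write narrows the stale window but does not close it: a reader that fetched the old value just before the write can still put it back into the cache afterwards, which is one reason TTLs are usually combined with this pattern.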

---

## 4. Distributed File Systems (DFS) Overview

### 1. Purpose

Store very large volumes of files across many machines and let clients share them through a single namespace, e.g., the Google File System (GFS) and the Hadoop Distributed File System (HDFS).

### 2. Key Design Points

- File chunking and replica management
- Metadata services (e.g., the GFS master or the HDFS NameNode, often with ZooKeeper for coordination and failover)
- Fault tolerance and load balancing
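The first two design points (chunking with replica placement, and a metadata service that tracks them) can be sketched as a small in-memory table. Everything here is a hypothetical simplification: `Metadata`, `ChunkInfo`, `CreateFile`, and `Lookup` are illustrative names, placement is plain round-robin, and the 64 MB chunk size with three replicas mirrors GFS's published defaults (HDFS defaults to 128 MB blocks) rather than anything prescribed here.

```go
package main

import "fmt"

// ChunkInfo records where one chunk of a file lives.
type ChunkInfo struct {
	Index    int      // position of the chunk within the file
	Size     int64    // bytes in this chunk (the last one may be short)
	Replicas []string // servers holding a copy
}

// Metadata is a hypothetical in-memory stand-in for a metadata service
// (the role played by the GFS master or the HDFS NameNode).
type Metadata struct {
	chunkSize int64 // must be > 0
	replicas  int   // assumed <= len(servers)
	servers   []string
	next      int                    // round-robin cursor for placement
	files     map[string][]ChunkInfo // file path -> its chunks
}

func NewMetadata(chunkSize int64, replicas int, servers []string) *Metadata {
	return &Metadata{chunkSize: chunkSize, replicas: replicas, servers: servers,
		files: make(map[string][]ChunkInfo)}
}

// CreateFile splits a file into fixed-size chunks and assigns each chunk to
// `replicas` distinct servers, starting one server further along each time.
func (m *Metadata) CreateFile(path string, size int64) []ChunkInfo {
	var chunks []ChunkInfo
	for off, i := int64(0), 0; off < size; off, i = off+m.chunkSize, i+1 {
		n := m.chunkSize
		if size-off < n {
			n = size - off // last, partial chunk
		}
		reps := make([]string, m.replicas)
		for r := range reps {
			reps[r] = m.servers[(m.next+r)%len(m.servers)]
		}
		m.next = (m.next + 1) % len(m.servers)
		chunks = append(chunks, ChunkInfo{Index: i, Size: n, Replicas: reps})
	}
	m.files[path] = chunks
	return chunks
}

// Lookup answers "which chunk, on which servers, holds byte offset off?"
func (m *Metadata) Lookup(path string, off int64) (ChunkInfo, bool) {
	chunks, ok := m.files[path]
	idx := off / m.chunkSize
	if !ok || off < 0 || idx >= int64(len(chunks)) {
		return ChunkInfo{}, false
	}
	return chunks[idx], true
}

func main() {
	const MB = 1 << 20
	m := NewMetadata(64*MB, 3, []string{"s1", "s2", "s3", "s4"})
	for _, c := range m.CreateFile("/logs/day1", 150*MB) {
		fmt.Printf("chunk %d: %d MB on %v\n", c.Index, c.Size/MB, c.Replicas)
	}
	if c, ok := m.Lookup("/logs/day1", 100*MB); ok {
		fmt.Println("offset 100 MB is in chunk", c.Index, "on", c.Replicas)
	}
}
```

Clients only ask the metadata service where data lives and then read chunks directly from the listed servers, which keeps the metadata node off the data path.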

---

## 5. Go Language Example: Simple Consistent Hash Algorithm

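```go
import (
	"hash/fnv"
	"sort"
)

// HashRing keeps node positions sorted on a 32-bit hash ring.
type HashRing struct {
	points []uint32          // sorted FNV-1a hashes of the node names
	nodes  map[uint32]string // node hash -> node name
}

// GetNode returns the first node clockwise from the key's position.
func (hr *HashRing) GetNode(key string) string {
	if len(hr.points) == 0 {
		return "" // empty ring: no node can own the key
	}
	h := fnv.New32a()
	h.Write([]byte(key))
	pos := h.Sum32()
	// Binary search for the first node position >= the key's position.
	i := sort.Search(len(hr.points), func(i int) bool { return hr.points[i] >= pos })
	return hr.nodes[hr.points[i%len(hr.points)]] // i == len wraps to the first node
}
```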

---

## 6. Debugging and Practical Advice

- Use monitoring tools to observe cache hit rates and expiration
- Simulate dynamic node joins/leaves to verify consistent hashing migration efficiency
- Design reasonable cache expiration and update mechanisms for consistency

---

## 7. Terminology Mapping Table

| Everyday Term   | Technical Term     | Description                                  |
| --------------- | ------------------ | -------------------------------------------- |
| Parcel Sorting  | Consistent Hashing | Efficient data allocation algorithm          |
| Pickup Stores   | Distributed Cache  | Multi-node caching system for hot data       |
| Parcel Tracking | Metadata Service   | Component managing file locations and states |

---

## 8. Thought Exercises and Practice

- How does consistent hashing reduce data migration caused by node changes?
- Design cache expiration policies to prevent cache avalanches.
- Implement a simple metadata management module for a distributed file system.

---

## 9. Conclusion: The "Soft Power" Design of Distributed Systems

Consistent hashing, distributed caching, and distributed file systems form the backbone of distributed systems. Mastering these technologies helps build more stable and efficient distributed applications.