
Consistent Hashing Only Tells You Where Data Lives

Consistent hashing is one of those techniques that gets introduced as a solution and rarely examined as a design decision. You learn it, you understand why it's better than modulo hashing, and it goes into your mental toolkit as "the right way to partition data."


What the Ring Is Actually Doing

To understand why consistent hashing fails in a specific and predictable way, you need to understand what it's optimising for.

With simple modulo hashing, you assign a key to a node by computing hash(key) % N. This works fine until N changes. Add a node and N becomes N+1 — now almost every key maps to a different node. A cluster growing from 10 to 11 nodes remaps roughly 90% of its data. That data has to move across the network, potentially all at once, while the cluster is serving traffic. The operational cost of this is severe enough that in practice, scaling becomes something you dread and delay.
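The scale of that remapping is easy to demonstrate. A quick sketch in Python — the hash function and key names are arbitrary stand-ins, not anything a particular system prescribes:

```python
import hashlib

def node_for(key: str, n_nodes: int) -> int:
    # Stable hash so the result is reproducible across runs
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % n_nodes

keys = [f"key-{i}" for i in range(100_000)]

# How many keys change owner when the cluster grows from 10 to 11 nodes?
moved = sum(node_for(k, 10) != node_for(k, 11) for k in keys)
print(f"{moved / len(keys):.0%} of keys remapped")
```

A key stays put only when its hash gives the same remainder mod 10 and mod 11, which happens for roughly one key in eleven — hence the ~90% churn.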

Consistent hashing solves this by changing the question. Instead of asking "which of N nodes owns this key?", it asks "which node is closest to this key on a fixed address space?" The address space — the ring — doesn't change when nodes are added or removed. A new node claims one arc of the ring. Only the keys in that arc move — roughly 1/N of the total, not (N-1)/N. A departing node's arc transfers to its neighbour. The rest of the ring is undisturbed.
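A minimal ring makes this concrete. The sketch below is an illustration, not a production implementation — single position per node, made-up node and key names:

```python
import bisect
import hashlib

def _hash(value: str) -> int:
    # Stable 128-bit hash; fine for placement, not chosen for security
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes):
        self._points = sorted((_hash(n), n) for n in nodes)

    def add(self, node: str):
        bisect.insort(self._points, (_hash(node), node))

    def owner(self, key: str) -> str:
        # First node clockwise from the key's position, wrapping at the end
        idx = bisect.bisect(self._points, (_hash(key), "")) % len(self._points)
        return self._points[idx][1]

ring = Ring([f"node-{i}" for i in range(10)])
keys = [f"key-{i}" for i in range(100_000)]
before = {k: ring.owner(k) for k in keys}

ring.add("node-10")
moved = sum(before[k] != ring.owner(k) for k in keys)
print(f"{moved / len(keys):.1%} of keys moved")
```

The moved fraction lands in the neighbourhood of 1/11 rather than the ~90% modulo produces — though with only one position per node, the exact figure depends on where node-10 happens to fall, which is the structural-imbalance problem virtual nodes exist to smooth.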

This is a genuine and important improvement. It makes cluster membership changes cheap and localised. It's why consistent hashing appears in virtually every distributed storage system built in the last two decades.

But notice what it optimises for: minimising data movement during rebalancing. That is the only problem it solves.


The Load Problem

Load is not determined by where data lives. Load is determined by how often that data is accessed, and how much work each access generates.

This distinction sounds obvious when stated directly. In practice it gets collapsed constantly, because in a uniformly accessed system the two are equivalent — if every key is accessed equally often, then a node owning 1/N of the keys handles 1/N of the traffic. The ring's balanced placement produces balanced load as a consequence.

Real systems are not uniformly accessed. They never are. Access patterns in production follow distributions that are deeply non-uniform — a small number of keys generate a disproportionate share of traffic. The reasons are structural: social graphs have hubs, content has popularity curves, products have featured items, users have power users. These are not edge cases or temporary spikes. They are the normal shape of real workloads.

When access is non-uniform, the ring's balance breaks down. A node that owns the arc containing a high-traffic key handles the traffic for that key regardless of how many nodes the cluster has. Adding nodes doesn't help: a new node claims an arc wherever its position happens to fall, and even one that lands inside the hot arc still leaves the hot key itself at a single position, owned by a single node. The hot node remains hot. The cluster has spare capacity overall and one saturated node.
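A toy simulation makes the point. The 80/20 workload split below is a hypothetical stand-in for real skew, and the placement function is a stand-in too — any deterministic key-to-node mapping, ring or modulo, shows the same effect for a single hot key:

```python
import hashlib
from collections import Counter

def owner(key: str, n_nodes: int) -> int:
    # Stand-in placement function; the specific scheme doesn't matter here
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % n_nodes

# Hypothetical skewed workload: one key receives 80% of all requests
requests = ["hot-key"] * 8_000 + [f"key-{i}" for i in range(2_000)]

for n in (10, 20, 40):
    load = Counter(owner(k, n) for k in requests)
    busiest = max(load.values()) / len(requests)
    print(f"{n} nodes: busiest node handles {busiest:.0%} of traffic")
```

However many nodes you add, the busiest node never drops below the hot key's 80% share — the extra capacity lands everywhere except where the work is.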

The ring is balanced by key count. It has no concept of key weight. Load is determined by weight.


Why Virtual Nodes Don't Fix This

The standard response to consistent-hashing hotspots is virtual nodes — placing each physical node at multiple positions on the ring rather than at a single one. A node with 150 virtual positions owns 150 small arcs instead of one large arc. The argument is that with enough virtual nodes, the load distributes more evenly because the arcs are smaller and less likely to concentrate high-traffic keys together.

This helps with one specific problem: structural imbalance from poor ring placement. If the hash function maps nodes to positions unevenly — clustering several nodes in one region of the ring and leaving another region sparse — virtual nodes smooth this out. More positions means a more even spread.
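That smoothing effect can be sketched by comparing key-count balance with one position per node versus many. All names and counts here are illustrative, and 150 virtual nodes is just a commonly cited ballpark, not a recommendation:

```python
import bisect
import hashlib
from collections import Counter

def _hash(v: str) -> int:
    return int(hashlib.md5(v.encode()).hexdigest(), 16)

def build_ring(nodes, vnodes=1):
    # Each physical node appears at `vnodes` positions on the ring
    return sorted((_hash(f"{node}#{i}"), node)
                  for node in nodes for i in range(vnodes))

def owner(points, key):
    idx = bisect.bisect(points, (_hash(key), "")) % len(points)
    return points[idx][1]

nodes = [f"node-{i}" for i in range(10)]
keys = [f"key-{i}" for i in range(50_000)]

spread = {}
for v in (1, 150):
    ring = build_ring(nodes, vnodes=v)
    counts = Counter(owner(ring, k) for k in keys)
    spread[v] = max(counts.values()) / min(counts.values())
    print(f"vnodes={v}: busiest/quietest node key ratio = {spread[v]:.2f}")
```

With one position per node the ratio reflects whatever arc sizes the hash happened to produce; with 150 it collapses toward 1. But note what is being balanced: key counts, not request counts.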

What virtual nodes cannot help with is a single key that generates enormous traffic. A key with a million accesses per hour lives at exactly one position on the ring, regardless of how many virtual nodes you have. The node owning that position handles those million accesses. You cannot split a single key's position across multiple nodes without changing what the key means — without moving the problem from the infrastructure layer into the data model.

Virtual nodes address imbalance in the distribution of keys. They do not address imbalance in the distribution of work per key. These are different problems and the same solution does not apply to both.


What the Ring Cannot See

The root cause is that consistent hashing, like all position-based partitioning schemes, is designed without access pattern information. When you lay out the ring, you don't know which keys will be hot. You make a placement decision based on key identity alone and trust that the access distribution will be uniform enough that placement and load remain correlated.

In a new system, this is unavoidable. You don't have traffic data yet. You make a reasonable assumption, ship, and observe. The problem is what happens after you observe.

By the time a hotspot is visible — one node at 90% CPU while others idle at 20%, one shard's read latency climbing while others are fine — the partitioning scheme is load-bearing. Data is distributed across the ring. Services are routing to nodes based on ring position. Clients may be doing their own ring lookups. Changing the partitioning scheme at this point means moving data, updating routing, coordinating clients — an expensive migration under the pressure of an ongoing performance problem.

The decision that causes the problem is made before the information needed to make it correctly exists, and correcting it becomes expensive at exactly the moment when the cost of the problem is highest.


The Two Problems That Actually Need Solving

Recognising that placement and load are independent lets you see that there are two separate problems requiring two separate mechanisms.

The placement problem is what consistent hashing solves: given a key, which node owns it, and how does that assignment change minimally as the cluster evolves? The ring is a good answer to this question.

The load problem is different: given that access is non-uniform, how does the system respond when one partition is receiving more work than it can handle? This requires a mechanism that the ring doesn't provide — the ability to split a hot partition, move it, or serve it from multiple nodes simultaneously.
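One common shape for that last option — serving a hot key from multiple nodes — is key salting, sketched here purely as an illustration. The salt count and routing policy are assumptions, not a prescribed design, and note the cost: writes must now fan out to every copy.

```python
import random

SALT_COUNT = 8  # hypothetical fan-out factor for a known hot key

def route_read(key: str, hot_keys: set) -> str:
    """Return the storage key a read should be issued against."""
    if key in hot_keys:
        # Each salted copy hashes to its own ring position, so reads
        # for the hot key spread across up to SALT_COUNT nodes
        return f"{key}#{random.randrange(SALT_COUNT)}"
    return key

def keys_to_write(key: str, hot_keys: set) -> list:
    # A write to a hot key must be applied to every salted copy
    if key in hot_keys:
        return [f"{key}#{i}" for i in range(SALT_COUNT)]
    return [key]
```

This deliberately changes what the key means — the data model now knows about the split. That is exactly the trade the ring alone cannot make, which is why it belongs to a separate, load-aware mechanism rather than to placement.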


Conclusion

The analysis missing from most partitioning decisions is simple to state and hard to answer: what does the access distribution across your keyspace actually look like?

  • Which keys will be accessed most frequently?
  • Are the hot keys identifiable in advance, or do they emerge from user behaviour?
  • Is the hotspot pattern stable — the same keys, always — or does it shift as trending content changes, as featured products rotate, as power users vary their behaviour?

These questions determine whether a position-based scheme is sufficient or whether you need a load-aware scheme from the start. They are almost never asked at design time because the data to answer them doesn't exist yet. They are almost always asked retroactively, under the pressure of an incident, when the partitioning scheme is already entrenched.

The partitioning decision is one of the earliest structural choices in a distributed system and one of the most expensive to reverse. The assumptions it makes about access patterns should be stated explicitly, written down, and revisited as real traffic data becomes available. A scheme that's correct for uniform access and wrong for your actual access pattern is not a tuning problem — it's a design problem wearing the mask of an operations problem.


What does the access distribution across your partitions actually look like right now — and is your partitioning scheme making an assumption about uniformity that your traffic is quietly violating?
