Load Is Not Where Your Data Lives

When a node is saturated — CPU pegged, latency climbing, queues building — the symptom looks the same regardless of cause. The monitoring shows one overloaded node. It does not tell you why. Underneath that symptom are two fundamentally different problems, and the intervention that fixes one does nothing for the other. Most incident responses never make this distinction, which is why expensive work sometimes changes nothing.


Two Problems That Look Identical

A hot partition is a partition whose keys are collectively too active. No single key is the problem — the partition as a whole is handling more load than its node can sustain. This happens when a region of the keyspace becomes busier than the partitioning scheme anticipated. A social platform where new user IDs are assigned sequentially concentrates all new user activity — registrations, first posts, profile views — in whatever partition currently owns the leading edge of the ID range. The keys themselves are not individually unusual. There are just too many of them landing in the same place at the same time.

The fix for a hot partition is splitting: divide the partition's key range into two parts and assign each to a different node. Each new partition handles a subset of the keys and therefore a subset of the load. This works because the load is distributed across many keys — separating the key range separates the work.

A hot key is a single key generating traffic that would saturate a node by itself. One user with a hundred million followers. One product featured on the homepage. One configuration object read by every request in the system. That key maps to one position in the partition scheme. It will always map to that position. Splitting the partition does not help — the key lands on one of the two resulting partitions, and that partition is immediately as hot as the original. You have done the work of a split and redistributed nothing.

The diagnostic question that separates these two cases: is the load concentrated on a small number of keys, or spread across many keys within the partition? If the top ten keys in a partition account for 80% of its traffic, you have a hot key problem. If traffic is broadly distributed across thousands of keys in that partition, you have a hot partition problem. These require different interventions, and choosing the wrong one wastes the time you don't have during an incident.
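The diagnostic reduces to a small amount of arithmetic once per-key counts exist. A minimal sketch, assuming you can get per-key access counts for the saturated partition (which, as noted below, most monitoring setups do not provide out of the box); the function name and thresholds are illustrative:

```python
# Sketch: classify a saturated partition as hot-key vs hot-partition
# from per-key access counts. Thresholds are illustrative defaults.

def classify_hotspot(access_counts, top_n=10, threshold=0.8):
    """access_counts: dict of key -> request count for one partition."""
    total = sum(access_counts.values())
    if total == 0:
        return "idle"
    top = sorted(access_counts.values(), reverse=True)[:top_n]
    share = sum(top) / total
    # A few keys dominating traffic points at a hot key; broad spread
    # points at a hot partition, where a split can actually help.
    return "hot key" if share >= threshold else "hot partition"

# One key dwarfs the rest -> hot key
skewed = {"celebrity:1": 9000, **{f"user:{i}": 1 for i in range(100)}}
print(classify_hotspot(skewed))   # hot key

# Traffic spread across many keys -> hot partition
spread = {f"user:{i}": 10 for i in range(1000)}
print(classify_hotspot(spread))   # hot partition
```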

Most systems cannot answer this diagnostic question from their monitoring. They can tell you a node is hot. They cannot tell you which keys are driving it.


Splitting Partitions: What It Actually Involves

Range-based partitioning makes splitting possible in a way that position-based schemes like the ring do not. Partitions own contiguous key ranges — all keys between A and B belong to this partition — rather than arcs on a ring. When a partition grows too hot, you choose a boundary point within its range, divide it into two partitions, and assign each to a node.
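The mechanics can be sketched with a sorted list of lower-bound keys: each partition owns everything from its lower bound up to the next partition's lower bound, and a split is just a new boundary inserted into the list. This is a simplified model, not any particular system's implementation; boundary keys and node names are made up:

```python
# Sketch of range-based partition routing. Each partition owns a
# contiguous key range, identified by its inclusive lower bound.
import bisect

class RangeRouter:
    def __init__(self, boundaries):
        # boundaries: sorted (lower_bound_key, node) pairs; a partition
        # owns [its lower bound, next partition's lower bound).
        self.lowers = [b for b, _ in boundaries]
        self.nodes = [n for _, n in boundaries]

    def node_for(self, key):
        i = bisect.bisect_right(self.lowers, key) - 1
        return self.nodes[i]

    def split(self, at_key, new_node):
        # Splitting inserts one new boundary; keys >= at_key move over.
        i = bisect.bisect_left(self.lowers, at_key)
        self.lowers.insert(i, at_key)
        self.nodes.insert(i, new_node)

router = RangeRouter([("", "node-1"), ("m", "node-2")])
print(router.node_for("kangaroo"))  # node-1
router.split("f", "node-3")         # hot range [f, m) gets its own node
print(router.node_for("kangaroo"))  # node-3
```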

The first non-obvious problem is choosing the boundary. The midpoint of the key range is the naive choice. The correct choice is the point that most evenly distributes the load — which requires knowing the access distribution within the partition, not just the key distribution. A midpoint split of a partition where 80% of the traffic comes from 20% of the key range produces two partitions: one that inherited the hot region, still saturated, and one that inherited the cold region, largely idle. You have done the operational work of a split and left the problem intact.
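One way to make the load-aware choice concrete: walk the keys in order and place the boundary where cumulative load, not key count, crosses the halfway point. A minimal sketch, again assuming per-key access counts exist for the partition:

```python
# Sketch: pick a split boundary that balances observed load, not key count.
# Assumes per-key access counts for the partition are available.

def load_balanced_boundary(access_counts):
    """access_counts: dict of key -> request count. Returns the first key
    of the second partition: where cumulative load crosses 50%."""
    total = sum(access_counts.values())
    running = 0
    for key in sorted(access_counts):
        if running >= total / 2:
            return key          # split into [start, key) and [key, end)
        running += access_counts[key]
    return None                 # one trailing key holds most of the load

# 80%+ of traffic in the first fifth of the key range:
counts = {f"user:{i:04d}": (100 if i < 200 else 2) for i in range(1000)}
print(load_balanced_boundary(counts))  # user:0108, not the midpoint user:0500
```

A midpoint split of the same data would put nearly all the hot region on one side; the load-weighted boundary lands well inside it.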

The second problem is propagation. After splitting, every component that routes requests needs to know about the new partition boundaries: load balancers, client-side routing libraries, proxy layers, any service that caches partition metadata. Routing that hasn't received the update will continue sending requests for keys in the new partition to the old node. In a system with strong consistency requirements, writes arriving at the wrong node after the boundary moves either need to be rejected and retried by the client, or forwarded by the old node to the new one — both of which require explicit coordination logic that most systems don't have until they need it.
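The reject-and-retry variant can be sketched in a few lines: the old node checks ownership, rejects keys it no longer owns while naming the new owner, and the client retries. This is a toy model of the coordination logic, not any real system's protocol; all names are illustrative:

```python
# Sketch of reject-and-retry after a split: a node that no longer owns a
# key rejects the request and names the new owner; the client retries.

class Node:
    def __init__(self, name, low, high):
        self.name, self.low, self.high = name, low, high
        self.new_owner = None        # set when a split moves keys away
        self.data = {}

    def write(self, key, value):
        if self.low <= key < self.high:
            self.data[key] = value
            return ("ok", None)
        # Stale routing: reject and point the client at the new owner.
        return ("moved", self.new_owner)

def client_write(node, key, value):
    status, new_owner = node.write(key, value)
    if status == "moved":
        # The client would also refresh its cached partition metadata here.
        status, _ = new_owner.write(key, value)
    return status

old = Node("node-1", "a", "z")
new = Node("node-2", "m", "z")
old.high, old.new_owner = "m", new   # split: node-1 keeps [a, m)
print(client_write(old, "user:q", 1))  # stale client is redirected; prints ok
```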

A partition split is not a configuration change. It is a distributed protocol: a sequence of steps that must execute in a specific order, across multiple components, without dropping requests or violating consistency guarantees. Getting it right under load, in the middle of an incident, is where the complexity concentrates.


Hot Keys: Where Infrastructure Runs Out of Answers

For hot keys, splitting doesn't help and the infrastructure has no further moves. The key maps to one node. The only way to reduce load on that node is to serve some of the key's traffic from somewhere else — which requires the application to participate explicitly.

Read replicas for specific keys. If the hot key is read-heavy — a celebrity profile, a trending article, a shared configuration value — reads can be served from multiple replicas while writes go to a single primary. Each replica independently handles a share of the read traffic. This works when the read/write ratio is high and readers can tolerate some staleness between a write reaching the primary and propagating to replicas. The application needs to know which keys are routed to replicas and which go to the primary — the routing cannot be hidden in the infrastructure because the infrastructure doesn't know which keys qualify.
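That explicit routing can be sketched as a thin layer the application owns: a set of known hot keys whose reads go to replicas, with everything else, and all writes, going to the primary. The stores here are plain dicts standing in for real storage, and replication is modeled as immediate rather than asynchronous:

```python
# Sketch: explicit hot-key read routing. The application keeps the list
# of keys whose reads may be served stale from replicas.
import random

class HotKeyRouter:
    def __init__(self, primary, replicas, hot_keys):
        self.primary = primary          # dict standing in for the primary
        self.replicas = replicas        # list of dicts standing in for replicas
        self.hot_keys = set(hot_keys)   # keys whose reads tolerate staleness

    def read(self, key):
        if key in self.hot_keys and self.replicas:
            # Spread reads for hot keys across replicas.
            return random.choice(self.replicas).get(key)
        return self.primary.get(key)

    def write(self, key, value):
        self.primary[key] = value
        # Replication is asynchronous in practice; immediate in this sketch.
        for r in self.replicas:
            r[key] = value

router = HotKeyRouter({}, [{}, {}], hot_keys={"celebrity:1"})
router.write("celebrity:1", "profile-v1")
print(router.read("celebrity:1"))  # served from a replica
```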

Application-level key sharding. Instead of one key for a high-traffic entity, distribute the entity across multiple keys with a shard suffix: entity:{id}:{shard}. Writes go to a shard chosen at write time — either by random selection or a consistent rule. Reads either fan out across all shards and merge the results, or use the same consistent rule to locate the right shard. The hot key becomes N keys, each mapping to a different partition, each receiving 1/N of the traffic. This works for both read-heavy and write-heavy keys, but it means every operation on the entity — every read, every write, every delete — must be shard-aware. The application cannot treat the entity as a single thing anymore.
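A minimal sketch of the write-side and read-side halves, using a sharded counter as the entity and random shard selection at write time. The key format and shard count are illustrative:

```python
# Sketch of application-level key sharding: one logical counter spread
# across N keys with a shard suffix. Writes pick a random shard; reads
# fan out across all shards and merge.
import random

N_SHARDS = 8

def shard_keys(entity_id):
    return [f"entity:{entity_id}:{s}" for s in range(N_SHARDS)]

def increment(store, entity_id, amount=1):
    # Random shard choice spreads write load across up to N partitions.
    key = f"entity:{entity_id}:{random.randrange(N_SHARDS)}"
    store[key] = store.get(key, 0) + amount

def read_total(store, entity_id):
    # Reads must fan out and merge: the entity is no longer one key.
    return sum(store.get(k, 0) for k in shard_keys(entity_id))

store = {}
for _ in range(1000):
    increment(store, "likes:42")
print(read_total(store, "likes:42"))  # 1000, spread across up to 8 keys
```

Note that every access path had to change: the write chooses a shard, the read fans out, and a delete would have to touch all N keys.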

Caching above the partition layer. For keys that are read far more than written, a cache above the partition absorbs read traffic before it reaches storage. The partition sees only writes and cache misses — a small fraction of total access. This is the least invasive solution: the partition layer doesn't change, and the cache can be added without modifying the data model. It fails for write-heavy keys, where the cache is invalidated so frequently that the hit rate collapses and the partition sees nearly all traffic anyway.
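Both halves of that claim, misses-only reaching storage on a read-heavy key and the hit rate collapsing on a write-heavy one, show up directly in a toy read-through cache with write invalidation:

```python
# Sketch: a read-through cache in front of the partition, with write
# invalidation. Only misses reach the underlying storage.

class ReadThroughCache:
    def __init__(self, storage):
        self.storage = storage      # dict standing in for the partition
        self.cache = {}
        self.hits = self.misses = 0

    def read(self, key):
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        self.misses += 1            # only misses reach the partition
        value = self.storage.get(key)
        self.cache[key] = value
        return value

    def write(self, key, value):
        self.storage[key] = value
        self.cache.pop(key, None)   # invalidate: the next read misses

c = ReadThroughCache({"cfg": 1})
for _ in range(100):                # read-heavy: one miss, then all hits
    c.read("cfg")
print(c.hits, c.misses)             # 99 1
```

Interleave a write before every read instead and every read misses: the partition sees all the traffic anyway, which is the write-heavy failure mode described above.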

What these three approaches share is that they all require the application to treat hot keys differently from other keys — different routing, different data model, or a different access path. The infrastructure cannot make these decisions autonomously because it cannot see the access pattern. The load problem surfaces in the infrastructure. The fix has to be designed in the application.


The Decision You Make Before You Have the Data

Partitioning schemes are chosen when systems are being designed, before real traffic exists. The access distribution that determines whether the scheme will hold — which keys will be hot, whether hotness is structural or emergent, whether it's concentrated in a few keys or spread across a region — is exactly the information you don't have at design time and only get from production traffic.

By the time a hotspot is visible, the partitioning scheme is load-bearing. Data is distributed according to it. Clients route according to it. Dependent systems make assumptions based on it. Changing the key design of a high-traffic entity at this point means migrating data, updating every reader and writer, and coordinating a cutover while the system is under pressure. Introducing application-level sharding into a system not built for it means changes propagating through every layer that touches that entity.

The cost of asking "which keys are likely to be hot, and what would we do if they were?" at design time is a conversation. The cost of answering that question during an incident, after the scheme is entrenched, is a migration — done under pressure, with production risk at every step.

The question is not whether your system will have hot keys. It will. The question is whether you find out in a design review or in an incident.


Do you know which keys in your system generate disproportionate load — and if the hottest key doubled its traffic tomorrow, which of the interventions above could you execute without a migration?
