Timeouts Done Right: Deadlines, Not Durations
Most timeout implementations are wrong because they’re set as fixed durations at each hop, and that model breaks the moment your call chain is more than two services deep.
Let’s walk it step by step, starting with who calls whom:
Call flow:
- A calls B (timeout: 5s)
- B calls C (timeout: 3s)
- C calls D (timeout: 2s)
At the same time:
- A also calls E (timeout: 4s)
- E calls C (timeout: 3s)
So C is shared and receives requests from both B and E.
What A thinks:
A expects everything to complete within ~5 seconds.
That’s the real end-to-end budget.
What actually happens:
- A calls B
- B does some work (say 4 seconds)
- Then B calls C with a 3-second timeout
From B’s perspective, this is fine.
But in reality: only ~1 second is left in A’s total budget
Now C starts working:
C doesn’t know:
- how much time A has left
- how long B already spent
So it assumes: “I have 3 seconds” (based on B’s timeout)
C calls D (2s timeout), waits, gets a result, returns to B.
But upstream:
- A’s 5 seconds are already up
- A has timed out and returned an error
So:
- B’s response is now useless
- C and D did work for nothing
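The arithmetic above can be sketched directly. The numbers come from the walkthrough (A’s 5s budget, B’s 4s of work, the fixed 3s hop timeout); the variable names are illustrative:

```python
# Fixed per-hop timeouts vs. the real end-to-end budget.
# Numbers match the walkthrough: A gives itself 5s, B works for 4s,
# then calls C with a fixed 3-second timeout.

A_BUDGET = 5.0        # A's end-to-end budget (seconds)
B_WORK = 4.0          # time B spends before calling C
B_TO_C_TIMEOUT = 3.0  # the fixed timeout B attaches to its call to C

remaining = A_BUDGET - B_WORK  # what is actually left: 1.0s
claimed = B_TO_C_TIMEOUT       # what C is told it has: 3.0s

print(f"actually remaining: {remaining:.1f}s")  # 1.0s
print(f"C believes it has:  {claimed:.1f}s")    # 3.0s
print(f"overshoot if C uses it all: {claimed - remaining:.1f}s")  # 2.0s
```

C can run a full 2 seconds past the point where A has already given up, and nothing in the request tells it so.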
Now add the second path (A → E → C):
E may call C earlier or later with its own 3s timeout.
So C now sees:
- two requests
- both claiming “you have ~3 seconds”
But in reality:
- one might have 2 seconds left
- the other might have 200ms left
C cannot tell the difference
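Seen from C, the two requests are literally indistinguishable. A toy sketch, with values from the example above (field names are illustrative):

```python
# What C receives vs. what is actually true upstream.
# With only per-hop timeouts, both requests look identical to C.
request_from_b = {"claimed_timeout": 3.0}  # B -> C
request_from_e = {"claimed_timeout": 3.0}  # E -> C

# Reality, invisible to C:
real_remaining = {"via_b": 2.0, "via_e": 0.2}  # seconds left in A's budget

print(request_from_b == request_from_e)  # True: C cannot tell them apart
```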
The core problem:
Each service:
- sets its own timeout
- based only on its local view
No one knows how much time is left in the original request.
Why this breaks systems:
- Work continues even after the user has timed out
- Resources are wasted on “dead” requests
- Latency increases across the system
- Retries make it worse
One-line takeaway:
“When each service sets its own timeout, the system loses track of time.”
Deadline propagation fixes this. Instead of each service setting its own timeout, the originating service sets an absolute deadline — “this request must complete by 14:30:05.200” — and passes it downstream in metadata (gRPC does this natively via its deadline field, HTTP services pass it as a header). Every service in the chain checks the remaining time before doing work. If 200ms remains and the next call typically takes 500ms, skip it and degrade. No service wastes time on work that the caller has already abandoned.
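A minimal sketch of the check, assuming the deadline travels as an absolute Unix timestamp in a hypothetical `X-Request-Deadline` header (the header name and helper functions are illustrative, not a standard):

```python
import time

DEADLINE_HEADER = "X-Request-Deadline"  # hypothetical header name

def remaining_budget(headers: dict) -> float:
    """Seconds left before the caller's absolute deadline."""
    deadline = float(headers[DEADLINE_HEADER])  # absolute Unix timestamp
    return deadline - time.time()

def call_downstream(headers: dict, typical_latency: float):
    """Skip the call if the remaining budget can't cover it."""
    left = remaining_budget(headers)
    if left <= typical_latency:
        return None  # degrade: cached value, partial result, or error
    # Forward the SAME absolute deadline, not a fresh relative timeout,
    # e.g. requests.get(url, headers=headers, timeout=left)
    return "result"

# The originator sets the deadline once, at the edge:
headers = {DEADLINE_HEADER: str(time.time() + 5.0)}  # whole request: 5s
print(call_downstream(headers, typical_latency=0.5))   # budget ok
print(call_downstream(headers, typical_latency=10.0))  # can't fit: None
```

The key design point is that every hop forwards the same absolute timestamp, so the remaining budget shrinks automatically as real time passes, no matter how many services sit in between.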
This changes timeout engineering fundamentally. Instead of “how long should I wait for my downstream?” the question becomes “how much of the caller’s remaining budget should I spend here?” Services become aware of their position in the call chain without being coupled to it.
Adaptive timeouts complement this. A fixed 200ms timeout on a service that normally responds in 10ms means you’ll wait 20x its normal latency before giving up. A fixed 200ms timeout on a service that normally responds in 180ms gives you almost no headroom. Neither is right. Adaptive timeouts track the service’s actual latency distribution and set the cutoff relative to observed behavior — typically at P99 or P99.9 of recent responses. If the service speeds up, the timeout tightens. If it slows, the timeout relaxes within bounds.
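A sketch of an adaptive timeout that tracks recent latencies and cuts off at an observed percentile. The class, window size, percentile, and bounds are all illustrative choices, not a named library:

```python
from collections import deque

class AdaptiveTimeout:
    """Timeout set at a high percentile of recent observed latencies,
    clamped within hard bounds."""
    def __init__(self, percentile=0.99, window=1000,
                 floor=0.005, ceiling=2.0):
        self.samples = deque(maxlen=window)  # sliding latency window
        self.percentile = percentile
        self.floor = floor
        self.ceiling = ceiling

    def observe(self, latency: float):
        self.samples.append(latency)

    def current(self) -> float:
        if not self.samples:
            return self.ceiling  # no data yet: be generous
        ordered = sorted(self.samples)
        idx = min(int(len(ordered) * self.percentile), len(ordered) - 1)
        return max(self.floor, min(ordered[idx], self.ceiling))

t = AdaptiveTimeout()
for _ in range(100):
    t.observe(0.010)  # service normally answers in ~10ms
print(t.current())  # tracks observed latency, not a guessed round number
```

The floor and ceiling are the “within bounds” part: the timeout can tighten or relax with the latency distribution, but never to a value that would be absurd in either direction.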
Timeout budgets are the server-side complement. A service allocates a total time budget per request and tracks how much has been consumed across sequential downstream calls. If it calls three services in sequence and the first two consume 80% of the budget, the third gets the remaining 20% — or gets skipped entirely in favor of a cached fallback. The budget is explicit, tracked, and enforced.
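The budget mechanics can be sketched as a small tracker; the class and method names are illustrative, and the 80%/20% split mirrors the example above:

```python
class RequestBudget:
    """Explicit per-request time budget, consumed by sequential calls."""
    def __init__(self, total: float):
        self.total = total
        self.spent = 0.0

    def remaining(self) -> float:
        return self.total - self.spent

    def spend(self, amount: float):
        self.spent += amount

    def afford(self, expected: float) -> bool:
        """Can the next call fit in what's left of the budget?"""
        return self.remaining() >= expected

budget = RequestBudget(total=1.0)  # 1s for the whole request
budget.spend(0.5)  # first downstream call took 500ms
budget.spend(0.3)  # second took 300ms: 80% consumed
# The third call typically takes 400ms but only ~200ms remains:
print(budget.afford(0.4))            # False -> serve the cached fallback
print(f"{budget.remaining():.1f}s")  # 0.2s left
```

In a real service you would charge the budget with measured wall-clock time per call (e.g. via `time.monotonic()`), but the decision logic is the same: before each downstream call, ask the budget, not a config file.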
The pattern: deadlines flow top-down (caller sets the absolute bound), adaptive timeouts adjust per-hop (each service tunes to its downstream’s actual behavior), and budgets track consumption within a single service’s processing. Together they replace guesswork with engineering.
A 100ms timeout that triggers because the deadline is genuinely exhausted is a well-functioning system. A 100ms timeout that triggers because someone picked a round number three years ago is a landmine.