Timeouts Done Right: Deadlines, Not Durations
Most timeout implementations are wrong because they’re set as fixed durations at each hop, and that model breaks the moment your call chain is more than two services deep.
Let’s walk it step by step, starting with who calls whom:
Call flow:
- A calls B (timeout: 5s)
- B calls C (timeout: 3s)
- C calls D (timeout: 2s)
At the same time:
- A also calls E (timeout: 4s)
- E calls C (timeout: 3s)
So C is shared and receives requests from both B and E.
What A thinks:
A expects everything to complete within ~5 seconds.
That’s the real end-to-end budget.
What actually happens:
- A calls B
- B does some work (say 4 seconds)
- Then B calls C with a 3-second timeout
From B’s perspective, this is fine.
But in reality: only ~1 second is left in A’s total budget
Now C starts working:
C doesn’t know:
- how much time A has left
- how long B already spent
So it assumes: “I have 3 seconds” (based on B’s timeout)
C calls D (2s timeout), waits, gets a result, returns to B.
But upstream:
- A’s 5 seconds are already up
- A has timed out and returned an error
So:
- B’s response is now useless
- C and D did work for nothing
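The arithmetic above can be sketched directly. The numbers come from the walkthrough (A’s 5s budget, B’s 4s of work, the fixed 3s hop timeout); the variable names are illustrative:

```python
# Fixed per-hop timeouts vs. the real end-to-end budget.
# Numbers match the walkthrough: A gives itself 5s, B works for 4s,
# then calls C with a fixed 3-second timeout.

A_BUDGET = 5.0        # A's end-to-end budget (seconds)
B_WORK = 4.0          # time B spends before calling C
B_TO_C_TIMEOUT = 3.0  # the fixed timeout B attaches to its call to C

remaining = A_BUDGET - B_WORK  # what is actually left: 1.0s
claimed = B_TO_C_TIMEOUT       # what C is told it has: 3.0s

print(f"actually remaining: {remaining:.1f}s")  # 1.0s
print(f"C believes it has:  {claimed:.1f}s")    # 3.0s
print(f"overshoot if C uses it all: {claimed - remaining:.1f}s")  # 2.0s
```

C can run a full 2 seconds past the point where A has already given up, and nothing in the request tells it so.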
Now add the second path (A → E → C):
E may call C earlier or later with its own 3s timeout.
So C now sees:
- two requests
- both claiming “you have ~3 seconds”
But in reality:
- one might have 2 seconds left
- the other might have 200ms left
C cannot tell the difference
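Seen from C, the two requests are literally indistinguishable. A toy sketch, with values from the example above (field names are illustrative):

```python
# What C receives vs. what is actually true upstream.
# With only per-hop timeouts, both requests look identical to C.
request_from_b = {"claimed_timeout": 3.0}  # B -> C
request_from_e = {"claimed_timeout": 3.0}  # E -> C

# Reality, invisible to C:
real_remaining = {"via_b": 2.0, "via_e": 0.2}  # seconds left in A's budget

print(request_from_b == request_from_e)  # True: C cannot tell them apart
```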
The core problem:
Each service:
- sets its own timeout
- based only on its local view
No one knows how much time is left in the original request.
Why this breaks systems:
- Work continues even after the user has timed out
- Resources are wasted on “dead” requests
- Latency increases across the system
- Retries make it worse
One-line takeaway:
“When each service sets its own timeout, the system loses track of time.”
Deadline propagation fixes this. Instead of each service setting its own timeout, the originating service sets an absolute deadline — “this request must complete by 14:30:05.200” — and passes it downstream in metadata (gRPC does this natively via its deadline field, HTTP services pass it as a header). Every service in the chain checks the remaining time before doing work. If 200ms remains and the next call typically takes 500ms, skip it and degrade. No service wastes time on work that the caller has already abandoned.
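A minimal sketch of the check, assuming the deadline travels as an absolute Unix timestamp in a hypothetical `X-Request-Deadline` header (the header name and helper functions are illustrative, not a standard):

```python
import time

DEADLINE_HEADER = "X-Request-Deadline"  # hypothetical header name

def remaining_budget(headers: dict) -> float:
    """Seconds left before the caller's absolute deadline."""
    deadline = float(headers[DEADLINE_HEADER])  # absolute Unix timestamp
    return deadline - time.time()

def call_downstream(headers: dict, typical_latency: float):
    """Skip the call if the remaining budget can't cover it."""
    left = remaining_budget(headers)
    if left <= typical_latency:
        return None  # degrade: cached value, partial result, or error
    # Forward the SAME absolute deadline, not a fresh relative timeout,
    # e.g. requests.get(url, headers=headers, timeout=left)
    return "result"

# The originator sets the deadline once, at the edge:
headers = {DEADLINE_HEADER: str(time.time() + 5.0)}  # whole request: 5s
print(call_downstream(headers, typical_latency=0.5))   # budget ok
print(call_downstream(headers, typical_latency=10.0))  # can't fit: None
```

The key design point is that every hop forwards the same absolute timestamp, so the remaining budget shrinks automatically as real time passes, no matter how many services sit in between.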
This changes timeout engineering fundamentally. Instead of “how long should I wait for my downstream?” the question becomes “how much of the caller’s remaining budget should I spend here?” Services become aware of their position in the call chain without being coupled to it.
Adaptive timeouts complement this. A fixed 200ms timeout on a service that normally responds in 10ms means you’ll wait 20x its normal latency before giving up. A fixed 200ms timeout on a service that normally responds in 180ms gives you almost no headroom. Neither is right. Adaptive timeouts track the service’s actual latency distribution and set the cutoff relative to observed behavior — typically at P99 or P99.9 of recent responses. If the service speeds up, the timeout tightens. If it slows, the timeout relaxes within bounds.
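A sketch of an adaptive timeout that tracks recent latencies and cuts off at an observed percentile. The class, window size, percentile, and bounds are all illustrative choices, not a named library:

```python
from collections import deque

class AdaptiveTimeout:
    """Timeout set at a high percentile of recent observed latencies,
    clamped within hard bounds."""
    def __init__(self, percentile=0.99, window=1000,
                 floor=0.005, ceiling=2.0):
        self.samples = deque(maxlen=window)  # sliding latency window
        self.percentile = percentile
        self.floor = floor
        self.ceiling = ceiling

    def observe(self, latency: float):
        self.samples.append(latency)

    def current(self) -> float:
        if not self.samples:
            return self.ceiling  # no data yet: be generous
        ordered = sorted(self.samples)
        idx = min(int(len(ordered) * self.percentile), len(ordered) - 1)
        return max(self.floor, min(ordered[idx], self.ceiling))

t = AdaptiveTimeout()
for _ in range(100):
    t.observe(0.010)  # service normally answers in ~10ms
print(t.current())  # tracks observed latency, not a guessed round number
```

The floor and ceiling are the “within bounds” part: the timeout can tighten or relax with the latency distribution, but never to a value that would be absurd in either direction.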
Timeout budgets are the server-side complement. A service allocates a total time budget per request and tracks how much has been consumed across sequential downstream calls. If it calls three services in sequence and the first two consume 80% of the budget, the third gets the remaining 20% — or gets skipped entirely in favor of a cached fallback. The budget is explicit, tracked, and enforced.
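The budget mechanics can be sketched as a small tracker; the class and method names are illustrative, and the 80%/20% split mirrors the example above:

```python
class RequestBudget:
    """Explicit per-request time budget, consumed by sequential calls."""
    def __init__(self, total: float):
        self.total = total
        self.spent = 0.0

    def remaining(self) -> float:
        return self.total - self.spent

    def spend(self, amount: float):
        self.spent += amount

    def afford(self, expected: float) -> bool:
        """Can the next call fit in what's left of the budget?"""
        return self.remaining() >= expected

budget = RequestBudget(total=1.0)  # 1s for the whole request
budget.spend(0.5)  # first downstream call took 500ms
budget.spend(0.3)  # second took 300ms: 80% consumed
# The third call typically takes 400ms but only ~200ms remains:
print(budget.afford(0.4))            # False -> serve the cached fallback
print(f"{budget.remaining():.1f}s")  # 0.2s left
```

In a real service you would charge the budget with measured wall-clock time per call (e.g. via `time.monotonic()`), but the decision logic is the same: before each downstream call, ask the budget, not a config file.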
The pattern: deadlines flow top-down (caller sets the absolute bound), adaptive timeouts adjust per-hop (each service tunes to its downstream’s actual behavior), and budgets track consumption within a single service’s processing. Together they replace guesswork with engineering.
A 100ms timeout that triggers because the deadline is genuinely exhausted is a well-functioning system. A 100ms timeout that triggers because someone picked a round number three years ago is a landmine.