Software Design 101: Twitter Timeline
Observer Analysis
Tweet Author (Initiator)
Type: Initiator
Time: Acknowledgment latency. How long from posting until the author knows the tweet was accepted? Then distribution latency: how long until followers can see it?
Wrongness tolerance: Cannot tolerate lost tweets—posted but never persisted. Cannot tolerate ambiguity—did it post or not? Can tolerate distribution delay if the tweet is eventually visible. Can tolerate not everyone seeing it immediately.
Pressure contribution: Posting velocity is usually manageable. Pressure comes from high-profile moments—breaking news, live events—when many authors post simultaneously about the same thing.
Timeline Reader (Recipient)
Type: Recipient
Time: Freshness. When I open the app, how recent are the tweets I see? Am I seeing what's happening now, or what happened an hour ago?
Wrongness tolerance: Can tolerate some staleness—minutes-old tweets are fine. Can tolerate missing some tweets—no one expects to see every tweet from every follow. Cannot tolerate gross staleness—seeing yesterday's tweets when news is breaking now. Cannot tolerate incoherence—seeing a reply without the original, seeing tweets badly out of order.
Pressure contribution: Read load is highly variable. Spikes during major events, sports, breaking news. Millions of users refreshing simultaneously.
Follow Graph Service (Recipient)
Type: Recipient of follow/unfollow events, queried by other services
Expects: Accurate follower lists. When asked "who follows user X?" or "who does user Y follow?", returns correct answer.
Time: Propagation delay. User follows someone new. How long until that relationship is reflected in timeline assembly?
Wrongness tolerance: Tolerates brief inconsistency—follow just happened, not reflected yet. Cannot tolerate persistent wrongness—followed someone a day ago, still not seeing their tweets. Cannot tolerate phantom relationships—unfollowed someone, still seeing their tweets.
Pressure contribution: Relatively stable. Follows/unfollows are less frequent than reads or posts. Pressure comes from queries during timeline assembly at scale.
Tweet Storage (Recipient)
Type: Recipient of new tweets
Expects: Tweets arrive, must be persisted durably.
Time: Write latency. How long from receiving tweet until confirmed durable?
Wrongness tolerance: Cannot tolerate data loss—tweet accepted but not persisted. Tolerates brief unavailability for reads if writes are safe.
Pressure contribution: Write volume during peak events. Read volume when timelines are assembled via fan-out-on-read.
Timeline Assembly (could be a service or a pattern)
This is where the core architectural choice lives. Two fundamental approaches:
Fan-out-on-write: When a tweet is posted, push it to all followers' timelines immediately. Timeline read is simple—just fetch pre-assembled list.
Fan-out-on-read: When a user requests their timeline, fetch tweets from all accounts they follow and assemble in real-time.
These are different systems with different observer relationships. Let me characterize both.
Fan-out-on-write model:
Timeline Cache (Recipient of fan-out)
Expects: Receives tweets to add to specific users' timelines.
Time: Ingestion rate. How fast can it absorb fan-out writes?
Wrongness tolerance: Can tolerate brief delays in receiving tweets. Cannot tolerate lost writes—tweet was supposed to be fanned out but wasn't. Cannot tolerate wrong routing—tweet appears in timeline of non-follower.
Pressure contribution: Celebrity posts create massive write amplification. One tweet to 50 million followers = 50 million writes.
Fan-out-on-read model:
Timeline Assembly Service (Recipient)
Expects: On request, fetches tweets from followed accounts, merges, ranks, returns.
Time: Assembly latency. How long from request until timeline is ready?
Wrongness tolerance: Tolerates incomplete results under pressure—return what you have. Cannot tolerate excessive latency—user waiting for timeline to load. Cannot tolerate gross incoherence—badly ordered results.
Pressure contribution: Every timeline request triggers assembly. Read load directly creates assembly load.
Ranking/Relevance Service (Recipient)
Type: Receives candidate tweets, returns ranked order
Expects: Set of tweets, user context. Must score and order.
Time: Ranking latency. Adds to timeline assembly time.
Wrongness tolerance: Tolerates imperfect ranking—suboptimal order is acceptable. Cannot tolerate extreme latency—ranking that takes seconds defeats the purpose.
Pressure contribution: Called for every timeline request. Compute-intensive. Must degrade gracefully.
Engagement/Analytics (Dependent)
Type: Dependent
Expects: Accurate counts—impressions, likes, retweets, replies. Consistent historical data.
Time: Historical. Computes metrics hours or days after events.
Wrongness tolerance: Tolerates approximate real-time counts—showing "10K likes" when it's actually 10,234 is fine. Cannot tolerate major inaccuracy in settled counts—advertiser paying for impressions needs real numbers. Cannot tolerate retroactive changes without explanation.
Pressure contribution: Doesn't create real-time pressure. Inherits wrongness from real-time systems as data inconsistency.
Ads Service (Dependent)
Type: Dependent on impressions, engagement, user attention
Expects: Accurate targeting. Accurate billing. Tweets appear in timelines as promised.
Time: Near-real-time for delivery, historical for billing.
Wrongness tolerance: Cannot tolerate ads shown but not recorded—lost revenue. Cannot tolerate ads recorded but not shown—billing for phantom impressions. Tolerates targeting imprecision within bounds.
Pressure contribution: Adds tweets to timeline assembly. Competes for attention with organic tweets.
Operations/Monitoring (Supervisor)
Type: Supervisor
Expects: System health metrics. Timeline assembly latency. Fan-out lag. Error rates.
Time: Detection delay.
Wrongness tolerance: Tolerates individual failures. Cannot tolerate invisible degradation—timelines getting stale without alerting.
Pressure contribution: Interventions during incidents.
Where Wrongness Accumulates
In the fan-out path (fan-out-on-write)
Author posts tweet. System must write to every follower's timeline. For a user with 1,000 followers, that's 1,000 writes. For a celebrity with 50 million followers, that's 50 million writes.
Wrongness accumulates in the lag between posting and fan-out completion. The author sees their tweet immediately. Follower A sees it in 100ms. Follower Z sees it in 30 seconds. Follower who follows many celebrities may experience significant lag as their timeline queue backs up.
Under pressure—celebrity posts during major event when many celebrities are posting—fan-out queues grow. The gap between "posted" and "visible to all followers" widens.
The celebrity problem: fan-out-on-write is O(followers) per tweet. When followers number in millions, this doesn't scale. Write amplification overwhelms the system.
In the assembly path (fan-out-on-read)
Reader requests timeline. System must query tweets from all followed accounts, merge, rank, return.
For a user following 100 accounts, that's 100 queries (or some optimized batch). For a user following 5,000 accounts, the query load multiplies.
Wrongness accumulates in assembly latency. User is waiting. Each additional followed account adds latency. Ranking adds latency. Under pressure, these latencies compound.
The heavy-reader problem: fan-out-on-read is O(following) per request. When following count is high and request rate is high, query load explodes.
Between Follow Graph and Timeline
User follows a new account. Follow Graph is updated. But:
In fan-out-on-write: the new follower sees nothing from that account until its next tweet is posted and fanned out. Historical tweets from the new follow don't backfill automatically.
In fan-out-on-read: the follow change is reflected on next timeline request. But if Follow Graph propagation is slow, the new follow may not be queried.
Unfollow is worse. User unfollows. In fan-out-on-write, tweets already in their timeline remain. In fan-out-on-read, next query excludes that account—but cached results may still show them.
Between Ranking and Reader expectation
Reader expects chronological recency. Or do they? Twitter has oscillated between chronological and algorithmic timelines.
If ranking optimizes for engagement, reader may see older "engaging" tweets above newer less-engaging ones. Reader refreshes but sees same tweets. "Why am I not seeing new tweets?"
Ranking creates expectation mismatch. The system believes it's showing the "best" timeline. The reader believes they're missing things.
Between real-time and Analytics/Ads
Impression recorded when tweet enters timeline. But did user actually see it? Timeline assembled, user scrolls past without looking, closes app. Was that an impression?
Engagement counts are approximate in real-time, precise in batch. The gap creates wrongness for advertisers comparing real-time dashboards to invoices.
Ads delivered but not recorded (system under pressure, logging dropped) creates billing wrongness that surfaces later.
What Happens Under Pressure
Pressure source: Major events. World Cup final. Election night. Breaking news. Celebrity death. Millions of users posting and reading simultaneously.
Fan-out-on-write under pressure:
Celebrity tweets. Fan-out begins. 50 million writes to queue.
Before those complete, another celebrity tweets. Another 50 million writes.
Breaking news means many high-follower accounts tweet in short window. Write queues explode. Fan-out lag grows from seconds to minutes.
Readers see stale timelines. "Why isn't the breaking news showing?" They refresh. Refresh triggers read, but timeline cache hasn't received fan-out yet. Still stale.
Readers don't know the system is behind. They see old tweets and assume the system is broken—or worse, assume nothing is happening.
Meanwhile, normal users with small follower counts have their tweets fanned out quickly. Timeline becomes unbalanced—you see your friend's lunch tweet but not the breaking news from the high-follower accounts you also follow.
Fan-out-on-read under pressure:
Reader requests timeline. Assembly service queries tweet storage for all followed accounts.
Millions of readers make simultaneous requests. Each request triggers hundreds of queries. Query load on tweet storage explodes.
Tweet storage slows. Assembly latency increases. Readers wait. Readers retry. Retries add load. Feedback loop.
Ranking service is called for every assembly. Compute-bound. Under pressure, ranking adds latency. If ranking times out, what do we return? Unranked results? Partial results? Empty timeline?
The hybrid model under pressure:
Most systems use hybrid: fan-out-on-write for normal users, fan-out-on-read for celebrities.
Under pressure, this creates edge cases. Which accounts are "celebrities"? Threshold is arbitrary. Account just under threshold gets full fan-out. Account just over doesn't.
When many mid-tier accounts sitting just under the threshold all tweet during an event, none crosses into the celebrity path, so all trigger full fan-out. Combined, they overwhelm the fan-out system even though individually they're "normal."
Follow Graph under pressure:
User follows breaking news account because everyone is talking about it. Many users follow simultaneously.
Follow Graph receives write spike. Propagation lags. New followers don't see tweets from the account they just followed—because the follow hasn't propagated to timeline assembly.
"I just followed them, why don't I see their tweets?"
Analytics under pressure:
Logging systems receive massive event volume. Write paths back up. Logs are dropped or delayed.
Engagement counts become inaccurate. Real-time dashboards show stale data. Advertisers see impression counts that don't move during peak attention.
When pressure passes, reconciliation runs. Counts adjust. "Why did my tweet's impressions jump by 100K an hour after the event?"
Constraints and Questions
Core architectural question:
Fan-out-on-write vs fan-out-on-read is fundamentally a question of where to put the work.
Fan-out-on-write: work happens at write time. Scales writes with follower count. Reads are cheap.
Fan-out-on-read: work happens at read time. Writes are cheap. Scales reads with following count.
Neither eliminates work. Both route pressure differently.
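To make the asymmetry concrete, here is a back-of-envelope cost model in Python. Every number (tweets per second, average follower and following counts) is an illustrative assumption, not a real Twitter figure; the point is only the shape of the two workloads.

```python
# Back-of-envelope cost model for the two fan-out strategies.
# All inputs are illustrative assumptions, not real figures.

def fanout_on_write_cost(tweets_per_sec: float, avg_followers: float) -> float:
    """Timeline writes per second: every tweet is pushed to every follower."""
    return tweets_per_sec * avg_followers

def fanout_on_read_cost(timeline_reads_per_sec: float, avg_following: float) -> float:
    """Per-author queries per second: every read pulls from every followed account."""
    return timeline_reads_per_sec * avg_following

if __name__ == "__main__":
    # Hypothetical steady-state numbers.
    print("fan-out-on-write, steady state:",
          fanout_on_write_cost(tweets_per_sec=6_000, avg_followers=700), "timeline writes/s")
    print("fan-out-on-read, steady state:",
          fanout_on_read_cost(timeline_reads_per_sec=300_000, avg_following=400), "author queries/s")
    # A single celebrity tweet under fan-out-on-write:
    print("single 50M-follower tweet:",
          fanout_on_write_cost(tweets_per_sec=1, avg_followers=50_000_000), "writes")
```

Neither number is "better"; they just land on different systems, which is exactly the routing decision the rest of this section works through.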
From observer analysis:
Tweet Author:
- How quickly must we acknowledge the tweet?
- How quickly must it be visible to at least some followers?
- What do we show the author about distribution status?
Timeline Reader:
- How fresh must the timeline be?
- What's acceptable latency for timeline load?
- When we can't assemble a complete timeline, what do we show?
- Do we show chronological or ranked order? Who decides?
Follow Graph:
- How quickly must follow changes be reflected?
- How do we handle rapid follow/unfollow (spam, gaming)?
- What consistency guarantee do we provide?
Timeline Assembly:
- How do we handle the celebrity problem?
- How do we bound assembly latency?
- What's the degradation strategy when systems are slow?
Ranking:
- Is ranking required or optional?
- Can we fall back to chronological when ranking is overloaded?
- How do we balance relevance vs recency?
Analytics/Ads:
- What accuracy do we guarantee for real-time counts?
- How do we handle logged events under pressure?
- What's the reconciliation window?
Operations:
- How do we detect timeline staleness?
- How do we detect fan-out lag?
- What interventions are available during overload?
Design Candidates
Tweet Storage
The problem: Persist tweets durably. Serve reads for timeline assembly.
Simplest design: Write tweet to database. Index by author and timestamp. Query by author ID to retrieve recent tweets.
Where wrongness accumulates:
Between write acknowledgment and durability. Author sees "tweet posted." Is it actually persisted? If acknowledgment precedes durable write, a crash loses the tweet.
Between storage and availability for reads. Tweet is persisted. Replica hasn't received it yet. Fan-out-on-read queries replica. Tweet is missing from timeline.
Design choice 1: When do we acknowledge to author?
Option A: Acknowledge after durable write.
- Author knows tweet is safe
- Latency includes disk/replication time
- Under pressure, acknowledgment slows
Option B: Acknowledge after memory write, persist asynchronously.
- Fast acknowledgment
- Crash window where tweet can be lost
- Author believes tweet exists when it might not
Tradeoff: Option A privileges author certainty, taxes latency. Option B privileges speed, taxes durability. Lost tweets are catastrophic for author trust.
Design choice 2: How do we partition tweets?
Option A: By author ID.
- All tweets from one author on same partition
- Author's timeline queries hit one partition
- Hot authors (celebrities) create hot partitions
Option B: By tweet ID (time-based).
- Writes distribute evenly
- Reading one author's tweets requires scatter-gather across partitions
- No hot partition problem
Option C: By author, with hot author sharding.
- Normal authors: single partition
- Celebrities: tweets spread across multiple partitions
- Query complexity increases for celebrities
Tradeoff: Author-partitioned is simple but creates hotspots. Tweet-partitioned distributes load but complicates author queries. Hybrid adds complexity.
Design choice 3: How many replicas, and what consistency?
Option A: Strong consistency. Write waits for all replicas.
- No stale reads
- Write latency increases
- Availability decreases (any replica down blocks writes)
Option B: Eventual consistency. Write to primary, replicate asynchronously.
- Fast writes
- Stale reads possible
- Under pressure, replication lag grows
Tradeoff: Strong consistency privileges readers (no stale data), taxes write latency and availability. Eventual consistency privileges write speed, taxes read freshness.
Design synthesis for Tweet Storage:
- Acknowledge after write to primary plus at least one replica. Balance between durability and speed.
- Partition by author for normal users. Shard hot authors across multiple partitions with consistent routing.
- Eventual consistency for timeline reads. Accept brief staleness. Timeline is already eventually consistent by nature—fan-out takes time regardless.
- Separate write path (must be durable) from read path (can serve from replicas). Optimize each independently.
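A minimal sketch of the partitioning idea from the synthesis above, assuming a fixed partition count and a precomputed set of hot authors (both placeholders): normal authors hash to a single partition, hot authors spread deterministically across several, so writes and reads agree on where to look.

```python
# Sketch of author-based partitioning with hot-author sharding.
# NUM_PARTITIONS, HOT_AUTHOR_SPREAD, and HOT_AUTHORS are assumed values.

import hashlib

NUM_PARTITIONS = 64
HOT_AUTHOR_SPREAD = 8            # partitions a hot author's tweets span
HOT_AUTHORS = {"celebrity_123"}  # populated elsewhere, e.g. by follower count

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def partition_for_write(author_id: str, tweet_id: str) -> int:
    """Choose the partition a new tweet is written to."""
    if author_id in HOT_AUTHORS:
        # Spread a hot author's tweets over a fixed set of partitions,
        # picked deterministically per tweet.
        base = _hash(author_id) % NUM_PARTITIONS
        offset = _hash(tweet_id) % HOT_AUTHOR_SPREAD
        return (base + offset) % NUM_PARTITIONS
    return _hash(author_id) % NUM_PARTITIONS

def partitions_for_read(author_id: str) -> list[int]:
    """All partitions that may hold this author's tweets."""
    base = _hash(author_id) % NUM_PARTITIONS
    if author_id in HOT_AUTHORS:
        return [(base + i) % NUM_PARTITIONS for i in range(HOT_AUTHOR_SPREAD)]
    return [base]
```

The cost of the hybrid shows up in `partitions_for_read`: queries for hot authors become scatter-gather across a small, known set of partitions instead of a single lookup.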
Follow Graph Service
The problem: Know who follows whom. Answer two queries: "who follows X?" (for fan-out-on-write) and "who does Y follow?" (for fan-out-on-read).
Simplest design: Store edges in database. User A follows User B = row (A, B). Query by follower or followee.
Where wrongness accumulates:
Between follow action and graph update. User clicks follow. How long until the graph reflects this?
Between graph update and timeline impact. Graph is updated. Fan-out or assembly hasn't seen the change yet.
In asymmetric query costs. "Who does A follow?" might return 500 results. "Who follows celebrity X?" might return 50 million results.
Design choice 1: How do we store the graph?
Option A: Single edge table. Query by either column.
- Simple
- Either "followers of X" or "following of X" is indexed; other requires scan
- The two queries cannot both be optimal
Option B: Dual storage. Maintain both (follower → followee) and (followee → follower) indexes.
- Both queries are efficient
- Write amplification: every follow is two writes
- Consistency risk between the two stores
Tradeoff: Single store is simple but one query direction is slow. Dual store is fast for reads but doubles writes and risks inconsistency.
Design choice 2: How do we handle celebrity follower lists?
"Who follows @taylorswift?" returns 90 million edges. This cannot be returned in a single query.
Option A: Paginate. Return followers in chunks.
- Works for any size
- Fan-out-on-write must iterate through all pages
- Slow for massive lists
Option B: Pre-segment. Store celebrity followers in sharded segments. Fan-out can parallelize across segments.
- Faster fan-out for celebrities
- More complex storage model
- Must decide who is a "celebrity"
Option C: Don't answer this query for celebrities. Celebrities use fan-out-on-read; follower list is never enumerated.
- Sidesteps the problem
- Requires hybrid fan-out architecture
- Must still answer "who does Y follow?" for celebrity followers
Tradeoff: Pagination is simple but slow for huge lists. Pre-segmentation is faster but complex. Avoiding the query requires architectural changes elsewhere.
Design choice 3: How quickly must follow changes propagate?
Option A: Synchronous. Follow is not acknowledged until graph is updated and downstream systems notified.
- User sees immediate effect
- Slow acknowledgment
- Coupling to downstream systems
Option B: Asynchronous. Acknowledge follow immediately, propagate in background.
- Fast acknowledgment
- User may not see immediate effect
- Propagation lag is invisible to user
Tradeoff: Synchronous privileges user expectation (I followed, I should see their tweets), taxes latency. Asynchronous privileges speed, taxes consistency of experience.
Design synthesis for Follow Graph Service:
- Dual storage for both query directions. Accept write amplification for read efficiency.
- Asynchronous propagation. Acknowledge follow immediately, update graph asynchronously.
- Pre-segment followers for accounts above threshold (say, 100K followers). Enable parallel fan-out.
- Cache hot graphs. Most timeline requests come from active users following similar popular accounts. Cache "who does X follow?" for active users.
- Publish follow/unfollow events. Let downstream systems (fan-out, timeline cache) subscribe and update their state.
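A toy in-memory version of the dual-index plus event-log idea from the synthesis above. `FollowGraph` and its method names are assumptions; durable storage, caching, and the asynchronous propagation path are deliberately out of scope.

```python
# Toy dual-index follow graph: every follow writes both directions and
# appends an event for downstream consumers (fan-out, timeline cache).

from collections import defaultdict

class FollowGraph:
    def __init__(self):
        self.following = defaultdict(set)   # follower_id -> {followee_id}
        self.followers = defaultdict(set)   # followee_id -> {follower_id}
        self.events = []                    # stand-in for a durable event log

    def follow(self, follower_id: str, followee_id: str) -> None:
        # Write amplification: one logical follow, two index writes plus an event.
        self.following[follower_id].add(followee_id)
        self.followers[followee_id].add(follower_id)
        self.events.append(("follow", follower_id, followee_id))

    def unfollow(self, follower_id: str, followee_id: str) -> None:
        self.following[follower_id].discard(followee_id)
        self.followers[followee_id].discard(follower_id)
        self.events.append(("unfollow", follower_id, followee_id))

    def following_of(self, user_id: str) -> set[str]:
        """"Who does Y follow?" (used by fan-out-on-read)."""
        return self.following[user_id]

    def followers_of(self, user_id: str) -> set[str]:
        """"Who follows X?" (used by fan-out-on-write)."""
        return self.followers[user_id]
```

In a real deployment the two indexes live in separate stores and can drift; the event log is what lets a reconciliation job detect and repair that drift.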
Fan-out System
The problem: When a tweet is posted, get it into followers' timelines. This is where the core architectural decision lives.
Option 1: Fan-out-on-write (push model)
When tweet is posted:
- Query "who follows this author?"
- For each follower, write tweet reference to their timeline cache
Characteristics:
- Work is proportional to follower count
- Write amplification: 1 tweet → N writes (N = follower count)
- Read is simple: fetch pre-assembled timeline
- Celebrity problem: 50M followers = 50M writes per tweet
Option 2: Fan-out-on-read (pull model)
When timeline is requested:
- Query "who does this reader follow?"
- For each followed account, fetch recent tweets
- Merge and rank
Characteristics:
- Work is proportional to following count
- Read amplification: 1 timeline request → N queries (N = following count)
- Write is simple: just store the tweet
- Heavy reader problem: user following 5,000 accounts = expensive assembly
Option 3: Hybrid
Fan-out-on-write for normal users. Fan-out-on-read for celebrities.
Characteristics:
- Avoids worst cases of both
- Complexity: must classify users, handle edge cases
- Timeline assembly must merge cached (fanned-out) and fetched (on-demand) tweets
Where wrongness accumulates in each model:
Fan-out-on-write:
Fan-out lag. Tweet posted at T0. Fan-out completes at T0 + 30 seconds. Followers requesting timeline between T0 and T0+30 don't see the tweet.
Wrongness is invisible to readers. They don't know a tweet was posted that they haven't received. They see a timeline that looks complete but isn't.
Under pressure, fan-out lag grows. Readers see increasingly stale timelines without knowing it.
Fan-out-on-read:
Assembly latency. Reader requests timeline. Assembly takes 2 seconds under pressure. Reader waits.
Wrongness is visible to readers. They see the spinner. They know something is slow.
Under pressure, readers may give up. But they know the system is struggling.
Hybrid:
Merge complexity. Some tweets came from fan-out cache. Some from on-demand fetch. Merging must produce coherent timeline.
Edge cases. User follows 500 normal accounts (fanned out) and 10 celebrities (fetched on demand). Timeline must merge both sources correctly.
Threshold artifacts. Account at 99K followers is fanned out. Account at 101K is not. Similar accounts, different behavior. If threshold changes, accounts shift categories, timeline behavior changes.
Design choice 1: Where is the threshold for hybrid?
Option A: Fixed follower count (e.g., 100K followers).
- Simple rule
- Arbitrary cutoff
- Accounts near threshold may oscillate as followers fluctuate
Option B: Dynamic based on system load.
- Under pressure, lower the threshold—more accounts become "celebrities," reducing fan-out load
- Complex to implement
- Behavior is unpredictable to users
Option C: Based on posting frequency × follower count.
- High-frequency celebrities (news accounts) pulled
- Low-frequency celebrities (actors who tweet rarely) can be fanned out
- More nuanced but more complex
Tradeoff: Fixed is simple but arbitrary. Dynamic adapts but is unpredictable. Frequency-adjusted is smarter but harder to implement and explain.
Design choice 2: How do we handle fan-out lag?
When fan-out is behind, readers see stale timelines. What do we do?
Option A: Nothing. Accept staleness.
- Simple
- Readers don't know they're missing tweets
- Under pressure, experience degrades silently
Option B: Show "new tweets available" indicator when fan-out catches up.
- Reader knows to refresh
- Requires tracking what reader has seen vs what's available
- Additional complexity
Option C: Supplement cached timeline with on-demand fetch for very recent tweets.
- Always check author's last few tweets, even for fanned-out accounts
- Reduces staleness at cost of additional reads
- Hybrid becomes more complex
Tradeoff: Silent staleness is simple but degrades trust. Indicators are honest but add complexity. Supplemental fetch reduces staleness but increases read load.
Design choice 3: Fan-out ordering and priorities
Celebrity tweets should reach engaged followers quickly. But 50M followers can't all be first.
Option A: Random order. All followers equally likely to receive tweet early or late.
- Fair
- Engaged users (who check frequently) may receive tweet after they've already checked
Option B: Prioritize active users. Fans who check frequently get fan-out first.
- Active users see fresh content
- Inactive users receive tweet eventually
- Requires tracking activity levels
Option C: Geographic/time-based. Fan out to users in waking hours first.
- Tweets arrive when people are likely to see them
- Complex to implement
- May feel unfair to night owls
Tradeoff: Random is fair but not optimized. Activity-based is smart but requires tracking and may feel unfair. Geographic is contextual but complex.
Design synthesis for Fan-out System:
- Hybrid model. Fan-out-on-write below threshold (100K followers). Fan-out-on-read above.
- Fixed threshold for simplicity. Accept the edge case artifacts.
- Prioritize fan-out to active users. Use recent login time as proxy for activity.
- "New tweets available" indicator when fan-out catches up to reader's position. Be honest about staleness.
- Under extreme pressure, lower the threshold temporarily. More accounts become pull-based, reducing fan-out load. Revert when pressure subsides.
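A sketch of the hybrid routing decision under the assumptions above: a fixed 100K threshold, a follow graph that can enumerate followers, and a timeline cache with a `prepend` operation. All interfaces are placeholders; this only shows where the push/pull split happens and where active users get priority.

```python
# Hybrid fan-out: push for normal accounts, pull (do nothing here) for celebrities.
# Assumes the tweet itself has already been stored durably.

CELEBRITY_THRESHOLD = 100_000  # followers; can be lowered under pressure

def fan_out_tweet(tweet_id, author_id, follow_graph, timeline_cache, active_users):
    followers = follow_graph.followers_of(author_id)
    if len(followers) >= CELEBRITY_THRESHOLD:
        # Celebrity: no fan-out. Readers pull this author's recent tweets
        # at assembly time instead.
        return

    # Normal account: push the tweet reference to each follower's cache,
    # active users first so people checking right now see fresh content.
    ordered = sorted(followers, key=lambda u: u not in active_users)
    for follower_id in ordered:
        timeline_cache.prepend(follower_id, tweet_id)
```

The single `if` is where the threshold artifacts from the tradeoff discussion live: everything below it pays write amplification, everything above it pushes cost onto timeline assembly.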
Timeline Cache
The problem: Store pre-assembled timelines for fast reads (in fan-out-on-write model). Or cache assembled results (in fan-out-on-read model).
Simplest design: Per-user list of tweet IDs, ordered by time. Bounded size (e.g., last 800 tweets). Fetch and return on request.
Where wrongness accumulates:
Cache staleness. Fan-out hasn't reached this user yet. Cache returns stale timeline.
Cache size limits. User follows many active accounts. Cache can only hold 800 tweets. Older tweets are evicted. User scrolls back, hits the boundary, experiences discontinuity.
Cache cold start. New user, or user returning after long absence. No cached timeline. Must assemble from scratch.
Design choice 1: What do we cache?
Option A: Full tweet content.
- Fast reads—no additional fetch needed
- Cache size explodes (tweets with media, etc.)
- Tweet edits/deletes require cache invalidation
Option B: Tweet IDs only.
- Small cache footprint
- Read requires hydration—fetch full tweet content by IDs
- Tweet edits/deletes don't require cache invalidation
Tradeoff: Full content is fast but large and stale-prone. IDs only are compact but require hydration step.
Design choice 2: How do we handle cache misses?
Option A: Assemble on demand. Cache miss triggers full timeline assembly.
- Eventually consistent
- Cold start is slow
- Under pressure, cache misses amplify load
Option B: Serve partial results. Return what's cached, assemble rest asynchronously.
- Fast response
- User sees incomplete timeline initially
- Must handle "more tweets loading" UX
Option C: Pre-warm caches. Predict who will request timeline, assemble before they ask.
- Fast reads for predicted users
- Wasted work for unpredicted users
- Must predict accurately
Tradeoff: On-demand assembly is correct but slow on miss. Partial results are fast but incomplete. Pre-warming is fast but speculative.
Design choice 3: How long do cached timelines live?
Option A: Indefinitely, updated by fan-out.
- Always current (as current as fan-out)
- Memory pressure from inactive users
- Stale data for users who stopped receiving fan-out
Option B: TTL-based expiration.
- Bounded memory
- Cache miss after TTL
- Active users may experience unnecessary misses
Option C: LRU eviction.
- Memory bounded
- Active users stay cached
- Inactive users evicted, cold start on return
Tradeoff: Indefinite is current but unbounded. TTL is bounded but arbitrary. LRU is adaptive but inactive users pay cold start cost.
Design synthesis for Timeline Cache:
- Cache tweet IDs, not full content. Hydrate on read. Accept hydration latency for compactness and avoiding invalidation complexity.
- LRU eviction. Active users stay warm. Inactive users evicted, accept cold start cost on return.
- Partial results on cold start. Return top tweets quickly, backfill asynchronously. "Loading more tweets" is better than spinner.
- Bound cache size per user (800-1000 tweets). Scrolling beyond cache hits persistent storage. Different latency expectation is acceptable for deep scrolling.
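A minimal sketch of the ID-only cache from the synthesis above: bounded newest-first lists per user, LRU eviction across users, and hydration left to the read path. Sizes, names, and the warm-up interface are assumptions.

```python
# ID-only timeline cache with per-user size bound and cross-user LRU eviction.

from collections import OrderedDict

MAX_TWEETS_PER_USER = 800
MAX_CACHED_USERS = 100_000

class TimelineCache:
    def __init__(self):
        self._timelines = OrderedDict()  # user_id -> [tweet_id, ...] newest first

    def prepend(self, user_id: str, tweet_id: str) -> None:
        timeline = self._timelines.get(user_id)
        if timeline is None:
            return  # user not cached; they will cold-start via assembly later
        timeline.insert(0, tweet_id)
        del timeline[MAX_TWEETS_PER_USER:]        # bound per-user size

    def get(self, user_id: str):
        if user_id not in self._timelines:
            return None                            # cache miss -> assemble on demand
        self._timelines.move_to_end(user_id)       # mark as recently used
        return list(self._timelines[user_id])      # IDs only; hydrate elsewhere

    def warm(self, user_id: str, tweet_ids: list) -> None:
        """Install an assembled timeline for a user, evicting the coldest if full."""
        self._timelines[user_id] = list(tweet_ids[:MAX_TWEETS_PER_USER])
        self._timelines.move_to_end(user_id)
        while len(self._timelines) > MAX_CACHED_USERS:
            self._timelines.popitem(last=False)    # evict least recently used user
```

The `get`/hydrate split is the tradeoff in code form: the cache stays small and never needs invalidation on edits, but every read pays a second fetch to turn IDs into tweets.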
Timeline Assembly Service
The problem: For fan-out-on-read accounts (celebrities) and cold starts, assemble timeline from source data.
Simplest design: Query followed accounts, fetch recent tweets from each, merge by timestamp, return.
Where wrongness accumulates:
Query fan-out. User follows 500 accounts. Even with batching, this is significant query load.
Merge complexity. 500 accounts × 10 recent tweets each = 5,000 tweets to merge. Must sort, dedupe (retweets), apply ranking.
Latency accumulation. Each step adds latency. Under pressure, latency compounds beyond acceptable bounds.
Design choice 1: How do we query followed accounts' tweets?
Option A: Sequential queries.
- Simple
- Latency is sum of all queries
- Unacceptable for large following counts
Option B: Parallel queries with scatter-gather.
- Latency is max of queries, not sum
- Requires managing many concurrent connections
- Partial failures must be handled
Option C: Pre-aggregated feeds for popular accounts.
- Celebrity accounts have pre-built recent tweet lists
- Reduces scatter-gather for most common follows
- Additional storage and maintenance
Tradeoff: Sequential is simple but slow. Parallel is fast but complex. Pre-aggregation optimizes common case but adds infrastructure.
Design choice 2: What if some queries fail or timeout?
Option A: Fail entire assembly.
- Consistent—either complete timeline or nothing
- One slow account blocks everything
- Poor experience under partial failure
Option B: Return partial results, exclude failed sources.
- User sees something quickly
- Some followed accounts' tweets missing
- Must communicate incompleteness
Option C: Return cached stale results for failed sources.
- User sees complete-looking timeline
- Some content may be stale
- Staleness is invisible
Tradeoff: Failing entirely is consistent but fragile. Partial results are honest but incomplete. Stale fallback is complete but potentially misleading.
Design choice 3: How do we bound assembly latency?
Option A: Hard timeout. Return whatever is assembled when timeout hits.
- Bounded latency
- Potentially incomplete results
- Predictable user experience
Option B: Adaptive. Adjust parallelism and depth based on system load.
- Optimizes for conditions
- Unpredictable latency
- Complex tuning
Tradeoff: Hard timeout is predictable but may be incomplete. Adaptive optimizes but is unpredictable.
Design synthesis for Timeline Assembly:
- Parallel scatter-gather. Query all followed accounts concurrently.
- Pre-aggregate recent tweets for high-follow accounts. Most users follow some popular accounts; optimize for this.
- Hard timeout (e.g., 500ms). Return what's assembled. Incomplete is acceptable; slow is not.
- Stale fallback for failed queries. If a source fails, use last known tweets from that source. Prefer complete-looking stale over incomplete fresh.
- Progressive loading. Return top results fast, continue assembling for scroll depth.
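A sketch of the scatter-gather with a hard timeout, assuming an async `fetch_recent_tweets(author_id)` interface and a `stale_cache` of last-known tweets per author (both hypothetical). Whatever has arrived by the deadline is merged; sources that missed it fall back to stale results.

```python
# Parallel scatter-gather assembly with a hard deadline and stale fallback.

import asyncio
import heapq

ASSEMBLY_TIMEOUT = 0.5  # seconds; the hard latency bound from the synthesis

async def assemble_timeline(followed_ids, fetch_recent_tweets, stale_cache, limit=50):
    tasks = {
        author_id: asyncio.create_task(fetch_recent_tweets(author_id))
        for author_id in followed_ids
    }
    done, _pending = await asyncio.wait(tasks.values(), timeout=ASSEMBLY_TIMEOUT)

    candidates = []
    for author_id, task in tasks.items():
        if task in done and task.exception() is None:
            candidates.extend(task.result())
        else:
            task.cancel()
            # Stale fallback: prefer complete-looking stale over a visible gap.
            candidates.extend(stale_cache.get(author_id, []))

    # Merge by timestamp, newest first; ranking would run after this step.
    return heapq.nlargest(limit, candidates, key=lambda t: t["created_at"])
```

Latency is bounded by the max of the parallel fetches, capped at the timeout; the wrongness routed by the timeout (missing or stale sources) is exactly the tradeoff named in design choice 2.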
Ranking Service
The problem: Given candidate tweets, order them by relevance to the reader. Chronological is simple but not engaging. Algorithmic increases engagement but creates expectations mismatch.
Simplest design: Sort by timestamp, newest first.
Where wrongness accumulates:
Chronological wrongness: user misses important tweets because they were buried by volume. User follows 1,000 accounts, checks once a day. They see only what was posted in the last hour.
Algorithmic wrongness: user sees tweets out of order, feels manipulated. "Why am I seeing a tweet from yesterday above one from an hour ago?"
Expectation mismatch: user doesn't know which mode they're in. They assume chronological, see algorithmic, feel confused.
Design choice 1: Chronological vs algorithmic?
Option A: Pure chronological.
- Predictable
- Users understand what they're seeing
- High-volume follows bury low-volume quality
- Miss important tweets from infrequent posters
Option B: Pure algorithmic.
- Optimizes for engagement
- Users don't understand ordering
- Filter bubble concerns
- "Why am I seeing this?" frustration
Option C: User choice.
- Users can switch modes
- Must maintain both pipelines
- Default matters—most won't change it
Tradeoff: Chronological is honest but suboptimal for engagement. Algorithmic is engaging but opaque. Choice is flexible but complex.
Design choice 2: If algorithmic, what signals?
- Recency (when was it posted?)
- Author relationship (how often do you interact with this author?)
- Engagement (how many likes/retweets?)
- Content type (photo, video, text?)
- Topic relevance (does it match your interests?)
Each signal adds compute. Each signal can be gamed. Each signal has failure modes.
Tradeoff: More signals = better ranking but more compute and more gaming surface. Fewer signals = simpler but less personalized.
Design choice 3: How do we handle ranking latency under pressure?
Option A: Skip ranking, return chronological.
- Fast fallback
- Experience changes under load
- Users notice inconsistency
Option B: Use simplified ranking model.
- Faster than full model
- Less optimal ranking
- Consistent experience type
Option C: Cache ranked results.
- Fast reads for repeat requests
- Stale ranking
- Doesn't adapt to new tweets between requests
Tradeoff: Falling back to chronological is honest but inconsistent. Simplified ranking is consistent but lower quality. Caching is fast but stale.
Design synthesis for Ranking Service:
- Default to algorithmic with chronological option. Let users choose, default to engagement-optimized.
- Limit signals to computable set. Recency, author interaction history, engagement. Avoid expensive content analysis in hot path.
- Simplified model as fallback. Under pressure, use faster model with fewer signals rather than falling back to chronological.
- Cache ranking inputs, not outputs. Cache author interaction scores, not ranked timelines. Ranking can still adapt to new tweets.
- Be transparent about what "For You" vs "Following" means. Different tabs, clear labels.
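A sketch of the limited-signal scoring from the synthesis above. The signals (recency, cached author affinity, engagement) come from the text; the weights, decay constant, and field names are made-up illustrations, and `under_pressure` switches to the cheaper model rather than dropping to chronological.

```python
# Limited-signal ranking with a simplified fallback model under pressure.

import math
import time

def score_tweet(tweet, affinity_scores, under_pressure=False):
    age_hours = (time.time() - tweet["created_at"]) / 3600
    recency = math.exp(-age_hours / 6)                        # 6-hour decay constant
    affinity = affinity_scores.get(tweet["author_id"], 0.0)   # cached input, not a cached output

    if under_pressure:
        # Simplified model: drop the engagement signal, keep the ordering cheap.
        return 0.7 * recency + 0.3 * affinity

    engagement = math.log1p(tweet["like_count"] + 2 * tweet["retweet_count"])
    return 0.5 * recency + 0.3 * affinity + 0.2 * engagement

def rank(tweets, affinity_scores, under_pressure=False):
    return sorted(
        tweets,
        key=lambda t: score_tweet(t, affinity_scores, under_pressure),
        reverse=True,
    )
```

Caching `affinity_scores` rather than ranked timelines is what lets the same code adapt to tweets that arrived since the last request.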
Engagement/Analytics
The problem: Count impressions, likes, retweets, replies. Provide accurate metrics to authors and advertisers.
Type: Dependent. Evaluates after the fact.
Simplest design: Log every impression. Aggregate in batch. Serve counts from aggregated data.
Where wrongness accumulates:
Real-time vs accurate. Counting 1M impressions per second with perfect accuracy is hard. Systems approximate in real-time and reconcile in batch.
Definition ambiguity. What's an impression? Tweet rendered on screen? Tweet scrolled past? Tweet visible for 1 second? Different definitions, different counts.
Under pressure, logging degrades. Events are dropped, counts come up short, and authors see lower engagement than actually occurred.
Design choice 1: Real-time counts vs batch-only?
Option A: Real-time counters, increment on every event.
- Authors see immediate feedback
- Distributed counting is hard
- Under pressure, counters may lose events
Option B: Batch only. Counts update hourly/daily.
- Accurate
- Slow feedback loop
- Authors frustrated by stale counts
Option C: Approximate real-time, accurate batch.
- Best of both
- Two systems, must reconcile
- Visible jumps when batch corrects real-time
Tradeoff: Real-time is fast but potentially inaccurate. Batch is accurate but slow. Both is comprehensive but creates visible reconciliation artifacts.
Design choice 2: How do we handle undercounting during pressure?
Option A: Accept it. Counts are lower bounds.
- Simple
- Authors see suppressed engagement
- May lose ad revenue if impressions undercounted
Option B: Estimate and adjust. Use sampling to extrapolate.
- Better accuracy under pressure
- Statistical complexity
- Estimates may be wrong
Option C: Buffer and replay. Queue events during pressure, process when recovered.
- Eventually accurate
- Requires durable queue
- Batch reconciliation is delayed
Tradeoff: Accepting undercounts is simple but loses accuracy. Estimation is smart but adds uncertainty. Buffering is accurate but delays reconciliation.
Design choice 3: What's an impression?
Option A: Tweet fetched for timeline.
- Easy to measure (server-side)
- Overcounts—tweet may not render, user may not scroll to it
Option B: Tweet rendered on client.
- More accurate
- Requires client reporting
- Client may not report (offline, closed app)
Option C: Tweet visible for N seconds.
- Most meaningful for attention
- Complex to measure
- High reporting load
Tradeoff: Server-side overcounts but is reliable. Client-rendered is more accurate but lossy. Visibility-based is meaningful but expensive.
Design synthesis for Engagement/Analytics:
- Approximate real-time with batch reconciliation. Show fast estimates, correct with accurate batch counts.
- Buffer events during pressure. Accept delayed reconciliation over permanent undercount.
- Server-side for impressions, client-side for engagement (likes, retweets). Impressions can overcount; engagement must be accurate.
- Clear definitions for advertisers. Document what "impression" means. Don't change definitions without notice.
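A toy illustration of "approximate real-time, accurate batch": increments update a fast counter and append to a buffer standing in for a durable event log, and a batch `reconcile` recounts from the buffer and overwrites the estimate, which is where the visible correction jump comes from. All names are assumptions.

```python
# Approximate real-time counters with batch reconciliation from a buffered log.

from collections import Counter

class EngagementCounts:
    def __init__(self):
        self.realtime = Counter()   # fast estimate; may drop events under pressure
        self.buffer = []            # stand-in for a durable event queue/log
        self.settled = Counter()    # authoritative counts after reconciliation

    def record(self, tweet_id: str, event: str) -> None:
        self.realtime[(tweet_id, event)] += 1
        self.buffer.append((tweet_id, event))

    def reconcile(self) -> None:
        """Batch job: recount from buffered events and correct the estimates."""
        self.settled = Counter(self.buffer)
        self.realtime = Counter(self.settled)  # the visible "jump" when batch corrects real-time

    def display_count(self, tweet_id: str, event: str) -> int:
        return self.realtime[(tweet_id, event)]
```

The reconciliation jump is a feature, not a bug, as long as the definition of what is being counted stays fixed for advertisers.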
Ads Service
The problem: Insert promoted tweets into timelines. Target accurately. Bill correctly.
Type: Dependent on impression/engagement data. Also injects into timeline assembly.
Where wrongness accumulates:
Between targeting and delivery. Ad targeted to user. User's interests changed. Ad is now irrelevant.
Between delivery and billing. Ad shown but impression not logged. Or impression logged but ad not actually shown (cache, bug, etc.).
Between advertiser expectation and reality. Advertiser expected 1M impressions. Got 800K. Why? Targeting too narrow? System under pressure? Competition for ad slots?
Design choice 1: When do we select ads?
Option A: At timeline assembly time.
- Fresh targeting
- Adds latency to timeline
- Under pressure, ad selection may be skipped
Option B: Pre-selected ad pool, insert at assembly.
- Fast insertion
- Targeting may be stale
- Pool must be refreshed periodically
Tradeoff: Real-time selection is accurate but slow. Pre-selection is fast but potentially stale.
Design choice 2: What if ad selection fails/times out?
Option A: Show no ad.
- Timeline loads quickly
- Lost revenue
- Inconsistent ad load
Option B: Show fallback ad (less targeted).
- Revenue preserved
- Lower relevance
- User experience may suffer
Option C: Show house ad (promote Twitter features).
- Slot filled, no lost external revenue
- Not monetized
- Acceptable degradation
Tradeoff: No ad prioritizes user experience but loses revenue. Fallback preserves revenue but may be irrelevant. House ad is compromise.
Design choice 3: How do we ensure billing accuracy?
Impression logged at timeline assembly. But did user see it?
Option A: Bill on server impression.
- Simple
- May overbill
- Advertiser distrust
Option B: Bill on client-confirmed visibility.
- More accurate
- Client may not report
- Underbilling
Option C: Bill on server, refund on discrepancy.
- Conservative for advertiser
- Complex reconciliation
- Good faith relationship
Tradeoff: Server billing is simple but overcounts. Client billing is accurate but undercounts. Refund model is fair but complex.
Design synthesis for Ads Service:
- Pre-selected ad pool per user segment, refreshed frequently. Balance freshness and latency.
- Fallback to less-targeted ads under pressure. Preserve revenue over perfect targeting.
- Bill on server impression, audit with client visibility. Provide transparency to advertisers.
- Separate ad delivery metrics from organic metrics. Different SLAs, different pipelines.
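A sketch of the degradation chain from the synthesis above: pre-selected segment pool first, less-targeted fallback pool second, house ad last. Pool structures, the segment lookup, and the `deadline_exceeded` flag are assumptions.

```python
# Ad selection with graceful degradation: targeted -> fallback -> house ad.

import random

HOUSE_AD = {"id": "house-1", "kind": "house"}

def pick_ad(user_segment, segment_pools, fallback_pool, deadline_exceeded=False):
    if deadline_exceeded:
        return HOUSE_AD                       # under pressure: fill the slot cheaply

    targeted = segment_pools.get(user_segment, [])
    if targeted:
        return random.choice(targeted)        # fresh-enough, pre-selected ad

    if fallback_pool:
        return random.choice(fallback_pool)   # less targeted, revenue preserved

    return HOUSE_AD                           # acceptable degradation
```

Each branch routes wrongness somewhere different: the first to targeting staleness, the second to relevance, the last to revenue.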
Operations/Monitoring
The problem: See system health. Detect degradation. Intervene effectively.
Type: Supervisor
Simplest design: Aggregate metrics (timeline latency, fan-out lag, cache hit rates). Alert on thresholds. Dashboard for humans.
Where wrongness hides:
Aggregate masks local. Global timeline latency is 200ms. Timeline latency for users following many celebrities is 2s.
Fan-out lag is invisible to readers. Readers don't know they're missing tweets. Metrics show fan-out queue depth, not reader experience of staleness.
Engagement drops may be system issue or content issue. Hard to distinguish "system dropping events" from "users aren't engaging with this content."
Design choice 1: What metrics matter most?
Option A: System metrics. Latency, error rate, queue depth.
- Easy to measure
- Doesn't capture user experience
- System can look healthy while users suffer
Option B: Experience metrics. Time-to-first-tweet, staleness, completeness.
- Captures what users feel
- Harder to measure
- Requires end-to-end instrumentation
Tradeoff: System metrics are easy but indirect. Experience metrics are meaningful but harder to instrument.
Design choice 2: How do we detect fan-out staleness?
Option A: Monitor queue depth and lag.
- Measures system state
- Doesn't show which users are affected
Option B: Sample reader timelines, compare to expected.
- Measures actual staleness
- Expensive (requires timeline assembly for comparison)
- Sampling may miss localized problems
Tradeoff: Queue metrics are cheap but indirect. Sampling is accurate but expensive.
Design choice 3: What interventions are available?
When timeline latency spikes, options include:
- Lower the celebrity threshold (more accounts use fan-out-on-read → less fan-out load)
- Skip ranking (faster assembly)
- Serve stale cache more aggressively
- Reject some percentage of requests (backpressure)
Each intervention routes wrongness somewhere. Must understand where.
Design synthesis for Operations/Monitoring:
- Both system and experience metrics. Queue depth AND time-to-first-tweet. Error rate AND completeness.
- Sample timeline freshness. Periodically check "when was newest tweet posted" vs "when was it visible to follower."
- Tiered interventions with documented wrongness tradeoffs. Write runbook: "If fan-out lag > 60s, lower the celebrity threshold. Effect: more users experience assembly latency instead of staleness."
- Expose "system is degraded" to users when appropriate. Honest degradation beats silent wrongness.
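A sketch of the freshness sampler described above, assuming hypothetical interfaces for the timeline cache, follow graph, and tweet store: for a sample of readers, compare the newest tweet visible in their cached timeline with the newest tweet their follows have posted, and alert on the p95 gap.

```python
# Periodic staleness sampling: experience metric, not a queue-depth proxy.

import random

STALENESS_ALERT_SECONDS = 60

def sample_staleness(user_ids, timeline_cache, follow_graph, tweet_store, sample_size=100):
    lags = []
    for user_id in random.sample(list(user_ids), min(sample_size, len(user_ids))):
        cached = timeline_cache.get(user_id) or []
        newest_visible = max((tweet_store.created_at(t) for t in cached), default=0)
        newest_posted = max(
            (tweet_store.latest_post_time(a) for a in follow_graph.following_of(user_id)),
            default=0,
        )
        lags.append(max(0, newest_posted - newest_visible))

    p95 = sorted(lags)[int(0.95 * (len(lags) - 1))] if lags else 0
    if p95 > STALENESS_ALERT_SECONDS:
        print(f"ALERT: p95 timeline staleness {p95:.0f}s; consider lowering the celebrity threshold")
    return p95
```

It is deliberately a sample: assembling comparison timelines for every user would recreate the very load the supervisor is trying to observe.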
Twitter Timeline — Summary of Design Tradeoffs
| System | Primary Tradeoff | Who Gains | Who Pays |
|---|---|---|---|
| Tweet Storage | Acknowledge after durable write vs async | Author certainty vs speed | Authors pay with latency or durability |
| Follow Graph | Dual storage vs single | Read speed vs write simplicity | Writes pay for read efficiency |
| Fan-out | Write vs read | Read latency vs write amplification | Celebrities break fan-out-on-write; heavy readers break fan-out-on-read |
| Fan-out | Fixed vs dynamic threshold | Simplicity vs adaptability | Edge cases pay for simplicity |
| Timeline Cache | IDs vs full content | Size/freshness vs read latency | Hydration latency pays for compactness |
| Assembly | Complete vs partial results | Consistency vs latency | Completeness pays for speed |
| Ranking | Chronological vs algorithmic | Predictability vs engagement | User understanding pays for engagement |
| Analytics | Real-time vs batch | Feedback speed vs accuracy | Accuracy pays for immediacy |
| Ads | Real-time vs pre-selected | Targeting accuracy vs latency | Targeting pays for speed |
| Operations | System vs experience metrics | Ease vs accuracy | Understanding pays for measurement simplicity |
The Twitter timeline problem is fundamentally about where to put work: write time (fan-out-on-write) or read time (fan-out-on-read). Neither eliminates work. Both route pressure differently.
The celebrity problem forces hybrid. The hybrid forces complexity at the merge point. Under pressure, any choice degrades—the question is which degradation readers can tolerate.