
Instagram Stories lives and dies on preload timing

A practical Instagram Stories system design guide for software engineers: how to scope the viewer, model playback and seen receipts, and talk through mobile-specific trade-offs without overdesigning the whole app.

Fin·Apr 9, 2026·7 min read
StrongYes tip

Instagram Stories is a strong system design prompt because it exposes whether you can keep one product loop coherent. If you skip straight to CDN talk, you miss the real contract: playback has to feel instant, seen state has to stay trustworthy, and interruptions cannot leave the user lost.


Most candidates go wrong in one of two directions:

  • they answer it like a generic social-feed backend
  • they answer it like a pure client-animation problem

The useful middle is the viewer itself: ordered stories, segment playback, gesture navigation, preloading, background and foreground recovery, and seen-state sync that survives retries.

Keep the viewing loop clear and the prompt gets much easier.

Start with the narrow product contract

Do not design all of Instagram. Design the viewing experience.

A clean first-pass scope sounds like this:

  1. Show a tray of followed users with active stories.
  2. Open one user's story and play segments in order.
  3. Support tap forward, tap back, hold to pause, and swipe to the next user.
  4. Mark segments as seen and preserve progress across reconnects.
  5. Preload nearby media so the next segment feels immediate.

Then name what you are explicitly not doing in version one:

  • story creation and editing
  • ranking the home feed
  • ad insertion
  • complex media-transcoding internals
  • global recommendation logic

That framing matters because it tells the interviewer where the real state machine lives. The hard part is not "how do I store videos." It is "how do I keep this viewing session correct when the user taps fast, goes offline, backgrounds the app, and returns to the same sequence?"

Treat the player like a state machine, not a timer

The most important design choice in this prompt is recognizing that playback is not one boolean called isPlaying.

You need explicit state for:

  • which user is active
  • which segment inside that user's story is active
  • whether media is loading, playing, paused, buffering, or blocked by app lifecycle
  • which seen receipts are only local versus durably synced

A compact model could look like this:

TS
type PlaybackCursor = {
  userId: string;
  storyId: string;
  segmentId: string;
  segmentIndex: number;
};

type PlayerState =
  | { kind: 'loading'; cursor: PlaybackCursor }
  | { kind: 'playing'; cursor: PlaybackCursor; startedAtMs: number }
  | { kind: 'paused'; cursor: PlaybackCursor; elapsedMs: number }
  | { kind: 'buffering'; cursor: PlaybackCursor; elapsedMs: number }
  | { kind: 'backgrounded'; cursor: PlaybackCursor; elapsedMs: number };

type SeenReceipt = {
  segmentId: string;
  viewedAt: string;
  completionPct: number;
  idempotencyKey: string;
};

The code is not the point. The point is that every follow-up question becomes easier once the interviewer believes you have a real session model. If the app backgrounds, you move from playing to backgrounded. If the user holds the screen, you move to paused. If the next segment is not ready, you move to buffering instead of pretending time kept moving normally.
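To make those transitions concrete, here is a minimal sketch of a reducer over the model above. The event names (mediaReady, hold, release, stall, appBackgrounded, appForegrounded) and the helper names are hypothetical, and returning to buffering after foregrounding is one possible policy, not the only reasonable one.

```typescript
type PlaybackCursor = { userId: string; storyId: string; segmentId: string; segmentIndex: number };

type PlayerState =
  | { kind: 'loading'; cursor: PlaybackCursor }
  | { kind: 'playing'; cursor: PlaybackCursor; startedAtMs: number }
  | { kind: 'paused'; cursor: PlaybackCursor; elapsedMs: number }
  | { kind: 'buffering'; cursor: PlaybackCursor; elapsedMs: number }
  | { kind: 'backgrounded'; cursor: PlaybackCursor; elapsedMs: number };

// Hypothetical event set; a real player would also handle taps and swipes.
type PlayerEvent = {
  type: 'mediaReady' | 'hold' | 'release' | 'stall' | 'appBackgrounded' | 'appForegrounded';
  nowMs: number;
};

// How much of the current segment has actually played so far.
function elapsedOf(state: PlayerState, nowMs: number): number {
  switch (state.kind) {
    case 'loading':
      return 0;
    case 'playing':
      return nowMs - state.startedAtMs;
    default:
      return state.elapsedMs;
  }
}

function nextPlayerState(state: PlayerState, ev: PlayerEvent): PlayerState {
  const cursor = state.cursor;
  const elapsedMs = elapsedOf(state, ev.nowMs);
  // Resuming shifts the start time back so elapsed time is preserved.
  const playing: PlayerState = { kind: 'playing', cursor, startedAtMs: ev.nowMs - elapsedMs };
  switch (ev.type) {
    case 'mediaReady':
      return state.kind === 'loading' || state.kind === 'buffering' ? playing : state;
    case 'hold':
      return state.kind === 'playing' ? { kind: 'paused', cursor, elapsedMs } : state;
    case 'release':
      return state.kind === 'paused' ? playing : state;
    case 'stall':
      return state.kind === 'playing' ? { kind: 'buffering', cursor, elapsedMs } : state;
    case 'appBackgrounded':
      return state.kind === 'backgrounded' ? state : { kind: 'backgrounded', cursor, elapsedMs };
    case 'appForegrounded':
      // Assumption: media may need to be re-fetched, so wait for mediaReady again.
      return state.kind === 'backgrounded' ? { kind: 'buffering', cursor, elapsedMs } : state;
  }
}
```

Events that make no sense in the current state return the state unchanged, which is what keeps fast taps and lifecycle callbacks from corrupting the session.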

Separate metadata, media delivery, and seen receipts

Another common mistake is collapsing everything into one Story table and one GET /feed endpoint. Keep the system in three parts:

  1. Story metadata: who owns the segment, when it expires, ordering, duration, and media URL or media key.

  2. Media delivery: image and video bytes served from object storage plus CDN, not from the application server.

  3. Seen state: a receipt stream or append-only event path that records what the viewer has finished, then projects unread state back into the tray.

That split keeps the design honest: metadata changes more often than media files, media delivery needs low latency, and seen updates need idempotency and replay safety. If you store "seen" as one mutable boolean directly on the story row, the answer starts to wobble, because one shared row cannot represent per-viewer state or absorb retried writes safely.
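As a rough illustration of projecting unread state from receipts instead of a mutable flag, the sketch below folds an append-only receipt list into a per-user hasUnseen map. The function name projectHasUnseen, the SegmentMeta shape, and the "seen means completionPct at or above a threshold" rule are assumptions for the example.

```typescript
type SegmentMeta = { segmentId: string; userId: string };

type SeenReceipt = {
  segmentId: string;
  viewedAt: string;
  completionPct: number;
  idempotencyKey: string;
};

// Rebuilds the tray's unread markers purely from metadata plus receipts.
// Replaying the same receipts any number of times yields the same result.
function projectHasUnseen(
  segments: SegmentMeta[],
  receipts: SeenReceipt[],
  seenThreshold: number = 1, // assumption: a segment is seen only when fully played
): Record<string, boolean> {
  const seen: Record<string, boolean> = {};
  for (const r of receipts) {
    if (r.completionPct >= seenThreshold) seen[r.segmentId] = true;
  }
  const hasUnseen: Record<string, boolean> = {};
  for (const seg of segments) {
    const segmentSeen = seen[seg.segmentId] === true;
    hasUnseen[seg.userId] = hasUnseen[seg.userId] === true || !segmentSeen;
  }
  return hasUnseen;
}
```

Because the projection is a pure function of the receipt log, a lost or corrupted projection can be rebuilt by replaying receipts.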

Narrate one clean happy path

Interviewers trust you more once you can walk one request flow from open to completion. For Instagram Stories, the first happy path can be:

  1. Client requests the story tray and the first user's segment metadata.
  2. API returns ordered users, segment metadata, unread hints, and media keys or signed URLs for the immediate window.
  3. Client loads the current segment and prefetches the next one or two segments.
  4. When a segment completes, the client updates local progress immediately so the UI stays smooth.
  5. The client batches a seen receipt to the backend.
  6. Backend stores the receipt idempotently, then updates unread projections asynchronously.

The UI stays responsive because local progress does not wait for the network. The backend stays trustworthy because durable seen state still exists outside the phone.

The APIs worth saying out loud

Name only the endpoints that reflect the product contract.

HTTP
GET /v1/stories/tray?viewerId=u_123&limit=20
JSON
{
  "users": [
    {
      "userId": "u_456",
      "hasUnseen": true,
      "segments": [
        {
          "segmentId": "seg_1",
          "storyId": "story_9",
          "mediaType": "video",
          "durationMs": 5000,
          "mediaUrl": "https://cdn.example.com/seg_1.mp4",
          "expiresAt": "2026-04-10T01:00:00Z"
        }
      ]
    }
  ],
  "nextCursor": "tray_20"
}
HTTP
POST /v1/stories/seen-receipts
Content-Type: application/json
JSON
{
  "viewerId": "u_123",
  "receipts": [
    {
      "segmentId": "seg_1",
      "viewedAt": "2026-04-09T21:18:00Z",
      "completionPct": 1,
      "idempotencyKey": "u_123:seg_1:1"
    }
  ]
}

The key phrase here is idempotent receipt ingestion. Mobile clients retry. Background flushes can race. A duplicate "seen" write should not corrupt the viewer state or inflate analytics.
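One way to picture idempotent ingestion is a store that keys every receipt by its idempotencyKey and silently drops repeats. This is an in-memory sketch only; the class name SeenReceiptStore and its methods are hypothetical, and a real backend would enforce the same rule with a unique constraint or conditional write.

```typescript
type SeenReceipt = {
  segmentId: string;
  viewedAt: string;
  completionPct: number;
  idempotencyKey: string;
};

class SeenReceiptStore {
  // Index of keys already accepted; created without a prototype so that
  // keys such as "constructor" cannot collide with inherited properties.
  private acceptedKeys: Record<string, boolean> = Object.create(null);
  // Append-only log of accepted receipts.
  private receiptLog: SeenReceipt[] = [];

  // Returns how many receipts in the batch were new. Duplicates are
  // acknowledged but not stored twice, so client retries are safe.
  ingest(batch: SeenReceipt[]): number {
    let accepted = 0;
    for (const r of batch) {
      if (this.acceptedKeys[r.idempotencyKey] === true) continue;
      this.acceptedKeys[r.idempotencyKey] = true;
      this.receiptLog.push(r);
      accepted += 1;
    }
    return accepted;
  }

  count(): number {
    return this.receiptLog.length;
  }
}
```

Sending the same batch twice leaves the log unchanged after the first write, which is the property that keeps retries from inflating view counts.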

This is where the prompt naturally overlaps with API Design Interview Questions: the contract matters at least as much as storage.

The first architecture pass should stay boring

Once the product loop is clear, the first architecture can stay simple:

  • mobile client owns playback state, gesture handling, and a short local cache
  • story service returns ordered metadata for the tray and current story window
  • object storage plus CDN serves image and video media
  • seen-receipt service writes append-only events
  • background workers project unread markers and clean up expired stories

That is enough for a strong first pass. Skip:

  • multi-region active-active writes
  • complex recommendation ranking
  • elaborate microservice fan-out
  • custom video infrastructure

Those may matter later, but the first version should prove that the viewing loop is correct.

Where to go deeper if the interviewer pushes

Once the first pass is stable, there are three natural depth lanes:

1. Preloading without wasting bandwidth

Preload the current user's next segment and the first segment of the next user, not the entire tray.
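That rule is small enough to write down. The sketch below, with hypothetical names planPreload, TrayUser, and PreloadTarget, returns at most two targets for a given playback position: the next segment of the current user and the first segment of the next user.

```typescript
type TrayUser = { userId: string; segmentIds: string[] };
type PreloadTarget = { userId: string; segmentId: string };

// Computes what to prefetch for the current position. It never returns
// more than two targets, so preloading cost stays bounded per segment.
function planPreload(tray: TrayUser[], userIndex: number, segmentIndex: number): PreloadTarget[] {
  const targets: PreloadTarget[] = [];
  const current = tray[userIndex];
  if (current === undefined) return targets;

  // Next segment inside the story that is currently playing.
  const nextSegmentId = current.segmentIds[segmentIndex + 1];
  if (nextSegmentId !== undefined) {
    targets.push({ userId: current.userId, segmentId: nextSegmentId });
  }

  // First segment of the next user, so a swipe also feels immediate.
  const nextUser = tray[userIndex + 1];
  if (nextUser !== undefined) {
    const firstSegmentId = nextUser.segmentIds[0];
    if (firstSegmentId !== undefined) {
      targets.push({ userId: nextUser.userId, segmentId: firstSegmentId });
    }
  }
  return targets;
}
```

A natural follow-up in the interview is to shrink this window further on metered or slow connections.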

2. Seen-state correctness under retries and offline use

Use optimistic local state plus durable server receipts: mark segments seen locally when the threshold is met, queue receipts offline, send idempotent batches on reconnect, and rebuild unread state from receipts rather than client memory alone.
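The client half of that flow can be sketched as a small queue. Everything here is illustrative: SeenReceiptQueue and SendBatch are hypothetical names, the transport is modeled as a synchronous function returning whether the server acknowledged the batch (a real client would await a network call), and the viewerId:segmentId key format is an assumption chosen so retries reuse the same key.

```typescript
type SeenReceipt = {
  segmentId: string;
  viewedAt: string;
  completionPct: number;
  idempotencyKey: string;
};

// Hypothetical transport: returns true only if the server acknowledged the batch.
type SendBatch = (batch: SeenReceipt[]) => boolean;

class SeenReceiptQueue {
  private pending: SeenReceipt[] = [];
  private seenLocally: Record<string, boolean> = {};
  private viewerId: string;
  private send: SendBatch;

  constructor(viewerId: string, send: SendBatch) {
    this.viewerId = viewerId;
    this.send = send;
  }

  // Optimistic step: the UI can treat the segment as seen immediately.
  markSeen(segmentId: string, viewedAt: string): void {
    if (this.seenLocally[segmentId] === true) return;
    this.seenLocally[segmentId] = true;
    this.pending.push({
      segmentId,
      viewedAt,
      completionPct: 1,
      idempotencyKey: `${this.viewerId}:${segmentId}`, // stable across retries
    });
  }

  isSeen(segmentId: string): boolean {
    return this.seenLocally[segmentId] === true;
  }

  // Call on reconnect or on a timer. Receipts leave the queue only after
  // an acknowledgement, so a failed flush can simply be retried later.
  flush(): boolean {
    if (this.pending.length === 0) return true;
    const batch = this.pending.slice();
    if (!this.send(batch)) return false;
    this.pending = this.pending.slice(batch.length);
    return true;
  }

  pendingCount(): number {
    return this.pending.length;
  }
}
```

The queue may resend a batch the server already stored, for example when the acknowledgement is lost, which is exactly why the server side has to deduplicate by idempotency key.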

3. Expiration, ordering, and tray freshness

Stories expire after 24 hours, so mention expiration filtering, background cleanup, ordering users by freshest active story, and tray refresh after new uploads or newly synced receipts.
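Expiration filtering and freshness ordering can be shown together in a few lines. In this sketch, buildActiveTray and the entry shapes are hypothetical, and it assumes every segment has the same lifetime, so the latest expiresAt also identifies the freshest upload.

```typescript
type StorySegment = { segmentId: string; expiresAt: string };
type TrayEntry = { userId: string; segments: StorySegment[] };

// Latest expiry among a user's segments, in epoch milliseconds.
function latestExpiryMs(entry: TrayEntry): number {
  return Math.max(...entry.segments.map((s) => Date.parse(s.expiresAt)));
}

// Drops expired segments, drops users with nothing left to show, and
// orders the remaining users by their freshest active segment.
function buildActiveTray(entries: TrayEntry[], nowMs: number): TrayEntry[] {
  return entries
    .map((e) => ({
      userId: e.userId,
      segments: e.segments.filter((s) => Date.parse(s.expiresAt) > nowMs),
    }))
    .filter((e) => e.segments.length > 0)
    .sort((a, b) => latestExpiryMs(b) - latestExpiryMs(a));
}
```

Filtering at read time keeps the tray correct even when the background cleanup job is running behind.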

If the interviewer pushes on scale, talk about fan-out strategy, caching hot trays, or async projection updates only after the viewing contract is stable.

Mistakes that make this answer sound fake

  • Solving upload and editing instead of viewing. Camera pipelines and filters are usually scope drift here.
  • Treating playback as a timer with no lifecycle. Mobile apps background. Networks stall. Users hold to pause and tap through quickly.
  • Mixing media delivery and metadata reads. The app server should not be the thing streaming every video byte.
  • Ignoring idempotency on seen updates. Retries and duplicate receipts are normal.
  • Optimizing celebrity-scale fan-out before the core loop works.

A short prep loop before the interview

If you want this prompt to feel calm in a live round, practice it in this order:

  1. Give a two-minute opening that scopes the problem to viewing only.
  2. Draw the playback state machine from memory.
  3. Explain the seen receipt path and why it must be idempotent.
  4. Practice one scale follow-up: preloading, tray ordering, or offline sync.
  5. End with one replay using Mobile System Design Interview so you can hear whether your explanation stays ordered.

If your broader system-design reps still feel loose, pair this with Mobile System Design Interview and Distributed Systems Interview Questions.

Source note

Fin and Coco are StrongYes editorial personas from the Council of Ternary Vertices — a trinary-star animal civilization that studies Earth's coding-interview process. Anecdotes map animal-universe experience to human interview mechanics; they are NEVER human-career claims. External citations link to public primary sources.

StrongYes editorial guide grounded in the April 9, 2026 /learn voice-drift rewrite queue, the existing StrongYes system-design and API-design guides, and the pre-existing Instagram Stories viewing requirements already captured in repo truth.

Last verified Apr 9, 2026.
