Mobile is distributed systems on devices you don't own

StrongYes tip

Mobile system design is not UI design with a backend diagram stapled on. It is a distributed-systems problem where the distribution happens on devices you cannot control. The strong answer names the failure mode. The weak answer draws a prettier box.

You have shipped apps for five years. Your SwiftUI lists scroll at 60 FPS. Your Room schema is normalized. You will still freeze when the interviewer says "design the Twitter mobile client," because the question is not about UI. It is about distributed systems running on hardware you do not own.

The real question is which failure mode you are designing around. Most candidates treat mobile system design (SD) as UI design with a cloud icon. The interviewer is grading distributed-systems thinking with a mobile-shaped surface area.

If you came from the iOS guide or the Android guide, you already know the coding and systems rounds. Mobile SD is the third pillar. It is the one where candidates who ace the other two still get stuck.

Mobile SD and backend SD run on different physics

Backend SD: you own the hardware, the clock, and the network. Horizontal scaling and consistency models are the fight. Mobile SD: every user has their own hardware, their own clock, their own flaky LTE. You ship a binary and hope it survives contact with the world.

Two constraints the interviewer is actually listening for. One: persistence must survive OS process kills, not just restarts. In-memory state can be gone as soon as the user switches apps; your local database is the thing that matters. Two: background execution is rationed. iOS gives you about thirty seconds after backgrounding, and Android is more aggressive: the Doze scheduler "reduces battery consumption by deferring background CPU and network activity for apps when the device is unused for long periods of time." Design around that, not around a dashboard you can check.

| | Backend SD | Mobile SD |
|---|---|---|
| Hardware control | You own it. Scale up, scale out. | You do not. Ship a binary and hope. |
| Persistence model | Replicated DB, survives node failure | Local SQLite, must survive OS process kills |
| Network assumption | Congested, occasionally partitioned | Adversarial: offline, throttled, captive portals |
| Observability | Push metrics every second, live dashboards | Telemetry batched, lossy, user can disable it |
| Failure surface | Cluster failover, regional outage | One device, one user, but a million at once |
| Deployment model | Hotfix in minutes; rollback stays server-side | App-store review, staged rollout, old binaries linger |

What the interviewer is actually grading

Five rubric items. Announce them before you answer — candidates who name their own rubric score higher than candidates guessing what the interviewer wanted.

Data model. Entities and indexes before boxes. What fields live on Post, User, Reaction, and which are indexed?

Caching strategy. Memory versus disk, eviction policy, purge rules on logout. When does stale data get served, and when does it refuse?

Sync pattern. Pull-to-refresh, background polling, push-driven, hybrid. Know which one each problem calls for.

UI update cadence. Optimistic versus pessimistic writes. When the UI lies, how it reconciles, what the user sees in between.

Failure modes. Airplane mode. Race conditions. Process kill mid-upload. Name the failure before designing around it.
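As a minimal sketch of the data-model item, here is one possible schema in SQLite, written in Python only because it is easy to run anywhere. Every table, column, and index name below is an illustrative assumption, not a schema from any real app. A repost stores a reference to the original post instead of a copy of its content.

```python
import sqlite3

# Hypothetical schema: all names are illustrative assumptions.
SCHEMA = """
CREATE TABLE user (
    id          INTEGER PRIMARY KEY,
    handle      TEXT NOT NULL UNIQUE
);
CREATE TABLE post (
    id          INTEGER PRIMARY KEY,
    user_id     INTEGER NOT NULL REFERENCES user(id),
    body        TEXT,                          -- NULL for a pure repost
    repost_of   INTEGER REFERENCES post(id),   -- points at the original
    created_at  INTEGER NOT NULL               -- server epoch seconds
);
CREATE TABLE reaction (
    post_id     INTEGER NOT NULL REFERENCES post(id),
    user_id     INTEGER NOT NULL REFERENCES user(id),
    kind        TEXT NOT NULL,
    PRIMARY KEY (post_id, user_id)
);
CREATE INDEX idx_post_created_at ON post(created_at);
CREATE INDEX idx_post_user_id ON post(user_id);
"""

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
db.execute("INSERT INTO user VALUES (1, 'ada'), (2, 'lin')")
db.execute("INSERT INTO post VALUES (10, 1, 'hello world', NULL, 1000)")
db.execute("INSERT INTO post VALUES (11, 2, NULL, 10, 1005)")  # a repost

# Resolve a repost to the original body with a self-join.
feed = db.execute("""
    SELECT p.id, COALESCE(p.body, o.body) AS body
    FROM post p LEFT JOIN post o ON o.id = p.repost_of
    ORDER BY p.created_at DESC
""").fetchall()
```

Naming the two indexes up front is the part worth saying out loud: they are what keep "newest posts" and "posts by this user" from turning into full table scans.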

In our research, candidate after candidate draws a flawless happy-path diagram and freezes when asked what happens if the network drops mid-write. That is the L5-versus-L6 line at every company running a real mobile SD round. Interviewers are grading whether you can name the trade-off you are choosing, out loud.

The five-layer framework for mobile SD answers

Feed, chat, photo upload, offline notes — all of them decompose to the same five layers. Memorize this, and you have a conversation structure that works across every prompt.

Layer 1: API contract. REST, GraphQL, or gRPC. Pagination shape matters: cursor pagination wins for mobile feeds because offset pagination breaks when new posts arrive mid-scroll. Delta endpoints shrink payload. Every response has a size budget.

Layer 2: Local storage. SQLite-based (Room on Android, Core Data on iOS) for anything relational or queryable. Key-value stores (UserDefaults, SharedPreferences) for flags only. The UI reads from storage, never from the network directly. Single source of truth.

Layer 3: Network layer. Request coalescing. Retry with exponential backoff. An offline write queue that survives app kill. Every request has a timeout. Every timeout has a fallback. Android Doze means the network layer assumes suspension, not reliability.

Layer 4: Cache strategy. Memory for hot data (LRU, bounded by image budget). Disk for warm data (TTL-based, purged on logout or disk pressure). Cache key discipline: if you cannot derive the cache key deterministically, your cache is broken. Serving cached data immediately and refreshing it in the background (stale-while-revalidate) is the pattern worth naming.

Layer 5: UI update cadence. Optimistic writes for low-risk actions (like, follow). Pessimistic confirmation for high-risk (payment, post submit). Skeleton states for pending loads. A reconciliation strategy for when local state and server state disagree.
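The Layer 1 claim that offset pagination breaks when new posts arrive mid-scroll can be checked with a small simulation. This is a sketch in Python under one stated assumption: post IDs increase over time, so "older than the cursor" means "smaller ID". The function names are hypothetical.

```python
# Server-side feed, newest first. Assumption: post IDs grow over time.
feed = [105, 104, 103, 102, 101, 100]

def page_by_offset(posts, offset, limit):
    """Classic offset pagination: skip `offset` rows, take `limit`."""
    return posts[offset:offset + limit]

def page_by_cursor(posts, before_id, limit):
    """Return up to `limit` posts strictly older than `before_id`."""
    older = [p for p in posts if before_id is None or p < before_id]
    return older[:limit]

# The client loads the first page both ways.
offset_page1 = page_by_offset(feed, 0, 3)       # [105, 104, 103]
cursor_page1 = page_by_cursor(feed, None, 3)    # [105, 104, 103]

# Two new posts arrive while the user is scrolling.
feed = [107, 106] + feed

# Offset 3 now lands on rows the client already has.
offset_page2 = page_by_offset(feed, 3, 3)       # [104, 103, 102]

# The cursor is anchored to a post, so the page is unaffected.
cursor_page2 = page_by_cursor(feed, cursor_page1[-1], 3)  # [102, 101, 100]
```

With offsets the user sees posts 104 and 103 twice; with a cursor the second page picks up exactly where the first one ended.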

Every hop between those five layers can fail. Name which ones fail for the prompt, and you are already past most candidates. The strong answer is "here are five layers, and I will deep-dive on the one this problem pushes the hardest." Then actually do it.

Deep dive one — designing a mobile feed

The most common mobile SD prompt. Twitter, Instagram, LinkedIn, TikTok all start here. Apply the five-layer framework.

API contract. Cursor pagination with since_id and until_id; offset breaks when fresh posts arrive mid-scroll. A delta endpoint keeps payload small. LinkedIn's 2026 feed rebuild uses this pattern — their post on Engineering the next generation of LinkedIn's Feed describes "nearline pipelines" that "let each stage optimize independently for its own latency-throughput tradeoff while keeping the end-to-end system fresh within minutes." Fresh within minutes is an SLA. Say it out loud.

Local storage. Core Data on iOS, Room on Android. Index created_at and user_id. Normalize so a retweet does not duplicate the original post's content. Without normalization, every reshare stores another full copy of the original, and a large cached feed uses far more memory than it needs to.

Network layer. Prefetch the next page when the user is five items from the bottom; cancel the prefetch if they scroll back up. Retry with exponential backoff: one, two, then four seconds, max three attempts. Past three, show a tap-to-retry affordance.

Cache strategy. Memory LRU for the current scroll window, roughly fifty posts. Disk for the last N pages and user profile images. TTL matters — feed pages expire in ten minutes, profile images live seven days. Purge on logout without exception.
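The memory tier described above combines three rules: a size bound, a TTL, and a purge on logout. Here is a minimal sketch of that combination in Python, used as platform-neutral pseudocode rather than mobile code. The class name, the key format, and the injectable clock are assumptions made for illustration.

```python
import time
from collections import OrderedDict

class LruTtlCache:
    """Bounded in-memory cache: evicts least-recently-used, expires by TTL."""

    def __init__(self, capacity, ttl_seconds, clock=time.monotonic):
        self.capacity = capacity
        self.ttl = ttl_seconds
        self.clock = clock             # injectable so tests can control time
        self._items = OrderedDict()    # key -> (stored_at, value)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self.clock() - stored_at > self.ttl:
            del self._items[key]       # expired: treat as a miss
            return None
        self._items.move_to_end(key)   # mark as most recently used
        return value

    def put(self, key, value):
        self._items[key] = (self.clock(), value)
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop least recently used

    def purge(self):
        """Call on logout so nothing leaks across accounts."""
        self._items.clear()

# Tiny capacity so eviction is visible; a feed would use something like 50.
now = [0.0]
cache = LruTtlCache(capacity=2, ttl_seconds=600, clock=lambda: now[0])
cache.put("post:1", "a")
cache.put("post:2", "b")
cache.get("post:1")          # touch post:1 so post:2 becomes the LRU entry
cache.put("post:3", "c")     # over capacity: post:2 is evicted
```

The cache key here is derived from the entity type and ID alone, which is what makes it deterministic.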


UI update cadence. Optimistic likes — flip the heart immediately, reconcile on network response. Pessimistic post creation — disable submit, confirm on HTTP 201. Here is the trap almost no candidate names: when the server version of a post differs from the local version (edits, deletions, moderation), server wins — but you tell the user what happened. Silent overwrites erode trust.
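A minimal sketch of the optimistic-like flow with visible reconciliation, in Python as neutral pseudocode. The class, its field names, and the shape of the server response are all hypothetical; the point is only the ordering: flip local state first, then let the server overrule it and record a notice for the user.

```python
from dataclasses import dataclass, field

@dataclass
class LikeStore:
    """Local state for likes; the UI renders from this, not from the network."""
    liked: set = field(default_factory=set)
    pending: set = field(default_factory=set)
    notices: list = field(default_factory=list)   # messages shown to the user

    def like(self, post_id):
        # Optimistic: flip the heart before the server answers.
        self.liked.add(post_id)
        self.pending.add(post_id)

    def on_server_response(self, post_id, accepted, reason=""):
        self.pending.discard(post_id)
        if not accepted:
            # Server wins, but the user is told what happened.
            self.liked.discard(post_id)
            self.notices.append(f"Like on post {post_id} was undone: {reason}")

store = LikeStore()
store.like(42)
store.like(43)
store.on_server_response(42, accepted=True)
store.on_server_response(43, accepted=False, reason="post was deleted")
```

After the second response, post 43 is no longer liked locally and the store holds one notice explaining why, which is the difference between a reconciliation and a silent overwrite.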

Watch out

In production, this is the moment where reconciliation becomes a 3am page. "Users reporting their liked posts disappearing" is a reconciliation bug. Design for the case where the server changes its mind.

The interviewer is listening for four things: reconciliation strategy, offline write queue, memory-bound sizing, delta-sync payload shape. Candidates who only sketch the happy path score L4. Candidates who name all four — without being asked — score L5.

Deep dive two — offline-first and sync

The second-most-common prompt. Notion, Bear, Google Docs, DoorDash driver app. Usually framed as "design a note-taking app that works on the subway."

The hard problem is conflict resolution. Three models, and each one has tradeoffs the interviewer wants you to name.

Last-writer-wins (LWW). Simplest. Timestamps decide. It fails when clocks drift — and phone clocks drift all the time. Acceptable for low-stakes data like draft notes or form state. Not acceptable for anything a user would notice losing.

CRDTs (Conflict-free Replicated Data Types). Merge without conflict by construction. Figma, Linear, and Notion use them for collaborative documents. Heavy to implement. Not every app needs them. If the prompt is collaborative editing, name CRDTs out loud and you are in the top quartile.

Server-authoritative. The client can write locally but the server is the source of truth. Conflicts surface as "server changed, reload?" prompts or silent server-side overwrites. This is what most apps actually ship, even when they claim to be "offline-first."
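The clock-drift failure of last-writer-wins is easy to show with numbers. This is a sketch in Python with hypothetical values: the phone's clock is assumed to run five minutes fast, and the timestamps are arbitrary.

```python
from dataclasses import dataclass

@dataclass
class Edit:
    device: str
    text: str
    device_clock: float   # timestamp taken from the device's own clock

def last_writer_wins(a, b):
    """Keep whichever edit carries the later device timestamp."""
    return a if a.device_clock >= b.device_clock else b

# True order of events: the phone edits first, the tablet edits 60 s later.
# Hypothetical drift: the phone's clock runs 300 s fast.
phone_drift = 300.0
phone_edit = Edit("phone", "draft v1", device_clock=1_000.0 + phone_drift)
tablet_edit = Edit("tablet", "draft v2", device_clock=1_060.0)

winner = last_writer_wins(phone_edit, tablet_edit)
```

The winner is the phone's older "draft v1", so the user's newer edit is silently lost. One common mitigation is to order writes by something the server assigns, such as a receive time or a version counter, instead of trusting device clocks.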

Sync triggers: hybrid wins. Silent push wakes the app for critical updates. Scheduled background work runs on an OS-controlled schedule: every fifteen minutes on power, much less often on battery. Pull-to-refresh covers user-initiated sync.

Android's Doze scheduler is the hard constraint. The Doze and App Standby documentation spells it out: setAndAllowWhileIdle() fires at most once every nine minutes, and JobScheduler tasks — including WorkManager — are deferred until maintenance windows. If your offline-first design assumes "we will just sync in the background," Android's OS has already cut that window down to scraps.

The queue must survive app kill. Android: WorkManager with chained OneTimeWorkRequest and unique work names. iOS: a Core Data-backed queue plus a URLSession background configuration that survives suspension.
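The property that matters is that a queued write is on disk before the UI reports it as saved, and is deleted only after the server confirms it. Here is a platform-neutral sketch of that idea in Python with SQLite. It is not the WorkManager or URLSession API; the class, table, and idempotency-key scheme are assumptions for illustration.

```python
import os
import sqlite3
import tempfile
import uuid

class WriteQueue:
    """Offline write queue stored in SQLite so it outlives the process."""

    def __init__(self, path):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS pending ("
            " seq INTEGER PRIMARY KEY AUTOINCREMENT,"
            " idempotency_key TEXT NOT NULL UNIQUE,"
            " payload TEXT NOT NULL)"
        )
        self.db.commit()

    def enqueue(self, payload):
        key = str(uuid.uuid4())   # lets a server drop duplicate retries
        self.db.execute(
            "INSERT INTO pending (idempotency_key, payload) VALUES (?, ?)",
            (key, payload),
        )
        self.db.commit()          # durable before the UI reports "saved"
        return key

    def drain(self, send):
        """Send in order; delete a row only after `send` reports success."""
        rows = self.db.execute(
            "SELECT seq, idempotency_key, payload FROM pending ORDER BY seq"
        ).fetchall()
        for seq, key, payload in rows:
            if not send(key, payload):
                break             # still offline: keep the rest for later
            self.db.execute("DELETE FROM pending WHERE seq = ?", (seq,))
            self.db.commit()

    def close(self):
        self.db.close()

# Simulate a restart: enqueue, close, then reopen the same file.
path = os.path.join(tempfile.mkdtemp(), "queue.db")
q = WriteQueue(path)
q.enqueue('{"note": "buy milk"}')
q.enqueue('{"note": "call mom"}')
q.close()

sent = []
q2 = WriteQueue(path)
q2.drain(lambda key, payload: sent.append(payload) or True)
```

Closing the connection is gentler than a real process kill, but because every enqueue commits before returning, the rows would already be on disk either way.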

The interviewer listens for one last thing: when the same user edits on two devices offline, which version wins, and does the user ever find out? Most candidates hand-wave this. The strong answer names a policy, not a promise.

Deep dive three — real-time driver location

Third-most-common prompt. Uber, DoorDash, Instacart, Lyft all run a version of this round. Apply the framework with real-time constraints.

Connection model. WebSockets win for true bi-directional workloads like chat, but they cost battery and can break when passing through proxies and other intermediaries. Server-Sent Events win for push-only streams like driver location. Long-polling is the fallback when corporate proxies strip WebSockets.

Throttling. Every location update costs battery. Server-side throttle: one update every three seconds when moving, every fifteen when parked. Client-side: emit only when position changed by more than ten meters. Both layers, both always on.

Back-pressure. Do not queue a backlog of location pings. The customer does not care where the driver was thirty seconds ago — only where they are now. Emit the latest, drop the rest. Same for typing indicators in chat.

Downgrade strategy. When real-time fails — airplane-mode gap, proxy drop, server restart — downgrade to polling on reconnect. Most candidates design for the happy WebSocket case and forget that every WebSocket connection dies eventually.
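The client-side throttle and the back-pressure rule above can be sketched together: emit only when the device has moved more than ten meters, and keep a single "latest" slot instead of a queue. This is Python used as neutral pseudocode; the class name and coordinates are hypothetical, and the distance uses the standard haversine formula.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    r = 6_371_000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

class LocationEmitter:
    """Client-side gate: emit only on real movement, keep only the newest fix."""

    def __init__(self, min_move_m=10.0):
        self.min_move_m = min_move_m
        self.last_emitted = None
        self.latest = None          # single slot, not a queue

    def on_fix(self, lat, lon):
        if self.last_emitted is not None:
            if haversine_m(*self.last_emitted, lat, lon) < self.min_move_m:
                return              # moved less than the threshold: skip
        self.last_emitted = (lat, lon)
        self.latest = (lat, lon)    # overwrites any fix not yet sent

    def take(self):
        """Called when the connection can send; returns newest fix or None."""
        fix, self.latest = self.latest, None
        return fix

emitter = LocationEmitter()
emitter.on_fix(37.77490, -122.41940)
emitter.on_fix(37.77491, -122.41940)   # roughly 1 m north: suppressed
emitter.on_fix(37.77590, -122.41940)   # roughly 111 m north: replaces the slot
sent = emitter.take()
```

Only the most recent qualifying position is ever sent, so a reconnect after a gap delivers where the driver is now, not a replay of where they were.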

Discord published a complete case study on exactly this kind of decision. Their 2025 post on Supercharging Discord Mobile describes moving parts of their stack away from React Native, back to native, for perf-tight surfaces — server list, chat scrolling, Android emoji picker. They say it plainly: "Moving away from React Native meant maintaining separate codebases. However, the results justified the tradeoff." Fourteen percent memory reduction. Ten percent faster startup. Sixty percent fewer slow frames in chat. Sixty FPS animated emoji on budget Android.

Real-time is cheap when it works and catastrophic when it does not. A customer who sees the driver teleport two miles in one tick trusts the app less than a customer who sees it update every fifteen seconds. Smooth beats accurate.

Cross-cutting concerns every SD round expects

Three topics that come up across feed, offline, and real-time prompts. You need a sentence ready for each.

StrongYes tip

From Meta's 2025 Baseline Profiles post: "Slow startups, dropped frames and poor responsiveness are all key drivers of user frustration and, ultimately, attrition." That sentence is the job description. Everything else is commentary.

Cold start. First render under one second on a mid-tier device. Show cached content immediately, refresh in background. Android's App Startup Time docs call five seconds cold "excessive"; Meta's Baseline Profiles work yielded three-to-forty percent improvements on startup, scroll, and navigation across Facebook, Instagram, and Messenger. That is the upper bound of what install-time optimization buys.

Background state and battery. iOS gives about thirty seconds of background tail; Android varies by manufacturer and Doze state. Every GPS ping, every wake-up, every animation frame costs joules. Android Profiler and Xcode's Energy Impact gauge are the tools to name. Battery is a feature with no flag.

Network resilience. Airplane mode. Subway tunnel. Captive portal. Degrade gracefully — crashing on a timeout is a design failure, not a user error.
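Degrading gracefully on a timeout usually comes down to bounded retries followed by a visible fallback. Here is a minimal sketch in Python, used as neutral pseudocode. It assumes "three attempts" means three retries after the first try, with delays of one, two, and four seconds; the function names and the "tap_to_retry" result are hypothetical.

```python
def backoff_delays(base_seconds=1.0, factor=2.0, max_attempts=3):
    """Delay before each retry: base, base*factor, base*factor**2, ..."""
    return [base_seconds * factor ** i for i in range(max_attempts)]

def fetch_with_retry(fetch, sleep, max_attempts=3):
    """Try once, then retry up to `max_attempts` times with backoff.

    Returns ("ok", result), or ("tap_to_retry", None) once retries run out,
    so the caller can show a retry affordance instead of crashing.
    """
    try:
        return ("ok", fetch())
    except OSError:               # covers TimeoutError and ConnectionError
        pass
    for delay in backoff_delays(max_attempts=max_attempts):
        sleep(delay)
        try:
            return ("ok", fetch())
        except OSError:
            continue
    return ("tap_to_retry", None)

# Simulated outage: every call times out. `sleep` is injected so the
# example records the delays instead of actually waiting.
slept = []

def always_offline():
    raise TimeoutError("no network")

result = fetch_with_retry(always_offline, sleep=slept.append)
```

In a real client the delays would normally carry some random jitter so that many devices coming back online at once do not retry in lockstep; that is left out here to keep the schedule readable.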

Companies and their angles

Each one runs the round differently. Know the angle for the company you are interviewing with.

LinkedIn. Feed pagination, profiles, notification delivery. Their nearline-pipelines framing is the 2026 reference for feed architecture. They also ship three times a day, which changes their sync and rollout thinking.

Discord. React-Native-heavy mobile stack with native move-backs on perf-tight surfaces. Real-time messaging plus presence. Expect "design typing indicators" or "design the server list scroll."

Meta (Instagram, Facebook, Messenger). Performance-obsessed. Client-side ranking. Jetpack Compose migration at billion-user scale. Expect a follow-up on scroll perf, cold start, or memory under load.

Uber, DoorDash, Instacart, Lyft. All four run dedicated mobile SD rounds focused on real-time pipelines, offline-capable driver/shopper apps, and foreground-service lifecycle. Prep the real-time and offline deep dives above; they overlap heavily across these four companies.

Snap and Pinterest. Mobile-heavy engineering cultures with well-engineered client-side data layers. Snap leans camera and low-latency messaging. Pinterest leans feed ranking. Expect depth on the bucket they are known for.

The mistakes that kill mobile SD loops

Eight failure modes. Rehearse the opposite of each before your round.

Happy-path only. No failure-mode analysis.
"I would use Firebase." Offloads the design question instead of answering it.
No data model. Jumping to boxes without naming entities.
Ignoring backgrounding. Assuming the app is always alive.
Cache without eviction. "I would cache it" — no TTL, no memory bound.
Silent reconciliation. Overwriting user data without telling them.
No silent push. Missing the highest-leverage battery-safe sync primitive.
WebSocket with no downgrade. Every WebSocket connection dies eventually.

Every one is recoverable if you name it yourself before the interviewer asks. The candidate who says "I am ignoring backgrounding for now, let me come back to it" beats the candidate who never mentions it.

How to prep in four weeks

Week 1. Pick three apps on your phone — Uber, Instagram, DoorDash. Reverse-engineer their feed, sync, and offline behavior. Turn airplane mode on mid-use and watch what survives, what breaks. Write it down.

Week 2. Practice the five-layer framework on five prompts — feed, offline notes, chat, photo upload, real-time location. Thirty minutes each, whiteboard or Excalidraw. Say the five layers out loud every time.

Week 3. Three mock interviews, recorded. Self-critique the happy-path trap and the reconciliation blind spot. Watching yourself on video is uncomfortable but it is the fastest feedback loop you have.

Week 4. Company-specific drills. Uber and Lyft get real-time. DoorDash gets offline. Pinterest gets feed. Discord gets messaging throughput. LinkedIn gets feed plus notifications.

The interviewer wants to hear the trade-off you chose. Say it out loud. Name what you gave up. The silent candidate fails. The narrator who says "I am choosing eventual consistency here because users tolerate a five-second lag on likes" passes.