Billions hide behind one play

The Intuition
Spotify feels simple:
- press play
- music starts instantly
But behind that button is a system handling:
- hundreds of millions of users
- billions of track requests per day
- real-time personalization
- global low-latency streaming
Spotify is anything but simple under the hood.
If We Were to Talk About Math
To estimate request volume, let's assume:
- $U = 600\text{M}$ users (global scale estimate)
- $A = 20$ actions per user per day (play, skip, search, like, playlist updates)
So total daily events:
$$E = U \times A = 6 \times 10^{8} \times 20 = 1.2 \times 10^{10}$$
12 billion events per day
Now convert to per-second load:
$$\lambda = \frac{E}{86{,}400\ \text{s}} \approx 139{,}000\ \text{events/s}$$
So Spotify is constantly handling roughly 140K requests per second (baseline), and much more during peaks.
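The back-of-envelope estimate above can be checked in a few lines. This is a minimal sketch: the 600M users and 20 actions per user per day are the assumed inputs that reproduce the 12 billion figure, not measured Spotify numbers, and `events_per_second` is a name made up for this example.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def events_per_second(users: int, actions_per_user_per_day: int) -> float:
    """Average event rate implied by a daily activity estimate."""
    daily_events = users * actions_per_user_per_day
    return daily_events / SECONDS_PER_DAY


# Assumed inputs: 600M users, 20 actions per user per day.
avg_rate = events_per_second(600_000_000, 20)
print(f"{avg_rate:,.0f} events/s")  # prints "138,889 events/s"
```

This is an average over the whole day. Real traffic is bursty, so peak capacity has to be planned well above this number.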
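Why a Simple Architecture Would Break
Suppose a single backend handled all requests.
Let server capacity be:
$$\mu = 2{,}000\ \text{requests/s}$$
Arrival rate:
$$\lambda \approx 140{,}000\ \text{requests/s}$$
Queue utilization:
$$\rho = \frac{\lambda}{\mu} = \frac{140{,}000}{2{,}000} = 70$$
A stable queue needs $\rho < 1$. Here the system is 70× overloaded, so the backlog would grow without bound.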
So Spotify must distribute everything.
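To see what "distribute everything" means in numbers, here is a rough sizing sketch. The 2,000 requests/s per-server capacity and the 70% target utilization are illustrative assumptions, and `servers_needed` is a hypothetical helper, not anything from Spotify's stack.

```python
import math


def utilization(arrival_rate: float, service_rate: float) -> float:
    """rho = lambda / mu. Values at or above 1 mean the queue never drains."""
    return arrival_rate / service_rate


def servers_needed(arrival_rate: float, per_server_rate: float,
                   target_utilization: float = 0.7) -> int:
    """Servers required so that each one runs at or below the target utilization."""
    return math.ceil(arrival_rate / (per_server_rate * target_utilization))


# Assumed numbers: 140K requests/s arriving, 2K requests/s per server.
rho_single = utilization(140_000, 2_000)   # 70.0, far above the stable limit of 1
fleet_size = servers_needed(140_000, 2_000)
print(rho_single, fleet_size)
```

Running servers below 100% utilization leaves headroom for peaks, which is why the fleet is larger than the bare ratio of 70.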
Mechanism 1: CDN for Music Delivery
Instead of streaming audio from central servers, Spotify uses CDNs (Content Delivery Networks).
Idea:
- songs are cached near users
- playback is served from edge nodes
- backend is NOT involved in every play request
So the effective backend load becomes:
$$\lambda_{\text{backend}} \ll \lambda_{\text{total}}$$
Only things like metadata, auth, and recommendations hit the backend.
This reduces load by orders of magnitude.
Mechanism 2: Sharding User Data
User data is split across many machines.
If:
- $N$ = 600M users
- $S$ = 1,000 shards
Each shard holds a portion of the users and handles:
$$\frac{N}{S} = \frac{600{,}000{,}000}{1{,}000} = 600{,}000\ \text{users}$$
So instead of one database:
1 giant system becomes 1,000 smaller independent systems
This avoids:
- single DB bottleneck
- lock contention
- global latency spikes
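One common way to decide which shard owns a user is to hash the user ID and take it modulo the shard count. The sketch below assumes string user IDs and 1,000 shards. `shard_for` is a hypothetical function, and nothing here claims to be Spotify's actual routing scheme.

```python
import hashlib

NUM_SHARDS = 1_000  # assumed shard count from the example above


def shard_for(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a user ID to a shard index in [0, num_shards)."""
    # A cryptographic hash spreads IDs evenly even if they are sequential.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# The same user always lands on the same shard.
print(shard_for("user-42") == shard_for("user-42"))  # prints "True"
```

Plain modulo hashing is easy to reason about, but changing the shard count moves most users to a different shard. Systems that reshard often tend to use consistent hashing or a lookup table instead.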
Mechanism 3: Event Streaming Pipeline
Every action (play, skip, like) becomes an event.
Pipeline roughly behaves like:
- client → event ingestion
- Kafka-like queue
- stream processing
- recommendation updates
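If the event rate is:
$$\lambda_e\ \text{events/s}$$
and the processing capacity is $\mu_p$ events/s, the processing lag behaves roughly like:
$$\text{lag} \propto \frac{1}{\mu_p - \lambda_e}$$
If processing capacity drops even slightly toward $\lambda_e$, lag grows quickly, and once $\mu_p < \lambda_e$ the backlog never drains. This is why streaming systems must be horizontally scalable.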
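A tiny simulation makes the lag behaviour concrete. It is a deliberately simplified model (constant rates, one-second ticks), and the 140K events/s arrival rate and 133K events/s degraded capacity are made-up numbers for illustration.

```python
def backlog_over_time(arrival_rate: float, capacity: float, seconds: int) -> list[float]:
    """Unprocessed events at the end of each second, starting from an empty queue."""
    backlog = 0.0
    history = []
    for _ in range(seconds):
        # Events arrive, the processors drain what they can, the rest waits.
        backlog = max(0.0, backlog + arrival_rate - capacity)
        history.append(backlog)
    return history


def lag_seconds(backlog: float, capacity: float) -> float:
    """Time needed to work through the current backlog if no new events arrived."""
    return backlog / capacity


# Capacity 5% below the arrival rate: the queue grows by 7,000 events every second.
degraded = backlog_over_time(140_000, 133_000, 60)
print(degraded[-1], round(lag_seconds(degraded[-1], 133_000), 2))
```

After one minute the pipeline in this toy model is already about three seconds behind, and it keeps falling further behind until more consumers are added.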
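Mechanism 4: Caching Everything Possible
Spotify relies heavily on caching.
Let:
- cache hit rate = $h$
Then backend load:
$$\lambda_{\text{backend}} = \lambda\,(1 - h)$$
If:
$$h = 0.95$$
Then:
$$\lambda_{\text{backend}} = 0.05\,\lambda$$
A 20× reduction instantly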
This is why:
- playlists load fast
- home feed feels instant
- search is predictive
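The hit-rate arithmetic can be demonstrated with a toy read-through cache. This is a sketch under simple assumptions: an unbounded in-memory dict, no expiry, and a stand-in loader function. `ReadThroughCache` and `backend_load` are names invented for this example.

```python
class ReadThroughCache:
    """Serve from memory when possible, and call the backend loader only on a miss."""

    def __init__(self, loader):
        self._loader = loader
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        value = self._loader(key)  # the only path that touches the backend
        self._store[key] = value
        return value

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def backend_load(total_rate: float, hit_rate: float) -> float:
    """Requests per second that still reach the backend: lambda * (1 - h)."""
    return total_rate * (1.0 - hit_rate)


# 100 requests spread over 5 hot playlists: 5 misses, then 95 hits.
cache = ReadThroughCache(lambda playlist_id: f"playlist:{playlist_id}")
for i in range(100):
    cache.get(i % 5)
print(cache.hit_rate)  # prints "0.95"
```

With that hit rate, `backend_load(140_000, cache.hit_rate)` comes out at roughly 7,000 requests per second. A production cache would also need eviction and invalidation, which this sketch leaves out.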
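Mechanism 5: Recommendation System Scaling
Recommendations are precomputed using batch + streaming jobs.
Instead of scoring every track for a user live on each request, the system approximates the result with a ranked list prepared ahead of time.
This is precomputed offline, not at runtime.
So: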
- heavy ML runs offline
- serving layer is lightweight
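The offline/online split can be sketched as follows. The scoring method here (a dot product between small user and track vectors) is an assumption chosen for brevity, not a description of Spotify's models, and every name and vector below is hypothetical.

```python
def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def precompute_top_k(user_vectors: dict, track_vectors: dict, k: int) -> dict:
    """Offline job: score every track for every user and keep the k best."""
    table = {}
    for user_id, user_vec in user_vectors.items():
        ranked = sorted(
            track_vectors,
            key=lambda track_id: dot(user_vec, track_vectors[track_id]),
            reverse=True,
        )
        table[user_id] = ranked[:k]
    return table


def serve(table: dict, user_id: str) -> list:
    """Online path: a single dictionary lookup, no model evaluation."""
    return table.get(user_id, [])


# Toy two-dimensional taste vectors (hypothetical data).
users = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}
tracks = {"rock_1": [0.9, 0.1], "jazz_1": [0.1, 0.9], "pop_1": [0.5, 0.5]}

recommendations = precompute_top_k(users, tracks, k=2)
print(serve(recommendations, "alice"))  # prints "['rock_1', 'pop_1']"
```

The expensive loop runs once in the batch job. At request time the serving layer does a key lookup, which is what keeps it lightweight.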
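The Scaling Identity
At scale, everything reduces to:
$$\text{load per node} \approx \frac{\text{total load}}{\text{distribution} \times \text{caching} \times \text{precomputation}}$$
Spotify wins by maximizing: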
- distribution
- caching
- precomputation