Billions hide behind one play

2 min read

Cover image for Billions hide behind one play

The Intuition

Spotify feels simple:

  • press play
  • music starts instantly

But behind that button is a system handling:

  • hundreds of millions of users
  • billions of track requests per day
  • real-time personalization
  • global low-latency streaming

Spotify is

not a music appa massive distributed streaming system

if we were to talk about math

we want to estimate request volume, let's assume:

  • U=600,000,000U = 600,000,000 users (global scale estimate)
  • a=20a = 20 actions per user per day (play, skip, search, like, playlist updates)

so total daily events:

R=UaR = U \cdot a R=600,000,00020R = 600,000,000 \cdot 20 R=12,000,000,000R = 12,000,000,000

12 billion events per day

Now convert to per second load:

Rsec=12×10986400R_{sec} = \frac{12 \times 10^9}{86400} Rsec138,8881.4×105 requests/secR_{sec} \approx 138,888 \approx 1.4 \times 10^5 \text{ requests/sec}

So Spotify is constantly handling ~140K+ requests per second (baseline) — and much higher during peaks.


Why Simple Architecture Would Break

If a single backend handled all requests:

Let server capacity be:

μ=2,000 req/sec\mu = 2,000 \text{ req/sec}

Arrival rate:

λ=140,000\lambda = 140,000

Queue utilization:

ρ=λμ\rho = \frac{\lambda}{\mu} ρ=140,0002,000=70\rho = \frac{140,000}{2,000} = 70

System is 70× overloaded which is impossible for stability

So Spotify must distribute everything.


mechanism 1: CDN for Music Delivery

Instead of streaming from central servers:

Spotify uses CDNs (Content Delivery Networks).

Idea:

  • songs are cached near users
  • playback is served from edge nodes
  • backend is NOT involved in every play request

So effective load becomes:

λbackendλtotal\lambda_{backend} \ll \lambda_{total}

things like metadata + auth + recommendations hit backend.

This reduces load by orders of magnitude.

mechanism 2: Sharding User Data

User data is split across many machines.

If:

  • NN = 600M users
  • kk = 1,000 shards

Each holds a portion of users. handles:

Nk=600,000 users per shard\frac{N}{k} = 600,000 \text{ users per shard}

So instead of one database:

1 giant system becomes 1,000 smaller independent systems

This avoids:

  • single DB bottleneck
  • lock contention
  • global latency spikes

mechanism 3: Event Streaming Pipeline

Every action (play, skip, like) becomes an event.

Pipeline roughly behaves like:

  • client → event ingestion
  • Kafka-like queue
  • stream processing
  • recommendation updates

If event rate is:

E=140,000 events/secE = 140,000 \text{ events/sec}

And processing lag is:

L=Eprocessing capacityL = \frac{E}{processing\ capacity}

If processing capacity drops slightly, lag grows quickly → this is why streaming systems must be horizontally scalable.

mechanism 4: Caching Everything Possible

Spotify heavily relies on caching:

Let:

  • cache hit rate = hh

Then backend load:

Load=(1h)RLoad = (1 - h) \cdot R

If:

  • h=0.95h = 0.95

Then:

Load=0.0512B=600Mrequests/dayLoad = 0.05 \cdot 12B = 600M requests/day

20× reduction instantly

This is why:

  • playlists load fast
  • home feed feels instant
  • search is predictive

mechanism 5: Recommendation System Scaling

Recommendations are precomputed using batch + streaming:

Instead of computing live:

They approximate:

Score(user,track)=f(history,embeddings)Score(user, track) = f(history, embeddings)

This is precomputed in , not runtime.

So:

  • heavy ML runs offline
  • serving layer is lightweight

scaling identity

At scale, everything reduces to:

System Load=TrafficDistribution+Cache\text{System Load} = \frac{\text{Traffic}}{\text{Distribution} + \text{Cache}}

Spotify wins by maximizing:

  • distribution
  • caching
  • precomputation