Socket.IO Explained: Why It Exists, How It Works Internally, and How It Scales to Millions

#backend #javascript #networking #systemdesign

By Munna Thakur

You've probably used or heard of Socket.IO. Most tutorials just show you socket.emit() and socket.on() and call it a day.

But if you actually want to understand what's happening under the hood — why Socket.IO exists, why it's more reliable than raw WebSockets, and how apps like WhatsApp handle millions of live connections — this post is for you.

Let's go through the whole thing from the beginning.


The Problem That Started All of This

Normal web apps work on HTTP. You send a request, the server sends back a response, the connection closes. That's it.

But some apps need the server to push data to the client at any time:

  • Chat apps
  • Live notifications
  • Live sports scores
  • Multiplayer games
  • Real-time dashboards

HTTP alone can't do this well. So developers got creative.


What People Did Before WebSockets

Before WebSockets became standard, real-time was painful. Here's what developers actually used:

HTTP Polling — Client asks the server every few seconds: "Any new data?"

setInterval(() => {
  fetch('/messages').then(res => res.json()).then(console.log)
}, 5000)

Works, but wasteful. 10,000 users = 10,000 requests every 5 seconds hitting your server constantly.

Long Polling — Client sends a request, server holds it open until there's new data, then responds. Client immediately sends another request.

Client ──────→ request
Server ─────── waiting...
Server ──────→ here's your data
Client ──────→ new request

Better than polling, but still HTTP overhead on every cycle. Gmail and early Facebook chat used this.

Server-Sent Events (SSE) — Server pushes data to the client as a stream. It's one-way only, though: the client can't send data back over the same stream.
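An SSE endpoint is just a long-lived HTTP response with a text/event-stream content type. A minimal sketch (the /scores route and the 2-second interval are illustrative):

```javascript
// Server: a minimal SSE stream with Node's built-in http module.
import { createServer } from 'node:http'

createServer((req, res) => {
  if (req.url === '/scores') {
    res.writeHead(200, {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
      Connection: 'keep-alive',
    })
    res.write(': connected\n\n') // comment frame, flushes headers immediately
    // Push an update every 2s; each frame is "data: ...\n\n"
    const timer = setInterval(() => {
      res.write(`data: ${JSON.stringify({ score: Date.now() })}\n\n`)
    }, 2000)
    req.on('close', () => clearInterval(timer))
  }
}).listen(3000)

// Browser side: EventSource reconnects automatically if the stream drops.
// const source = new EventSource('/scores')
// source.onmessage = (e) => console.log(JSON.parse(e.data))
```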

Comet — A collection of hacks from the 2000s (hidden iframes, streaming, long polling combined). Gmail used it. It was messy.

Then in 2011, the WebSocket protocol was standardized (RFC 6455) and changed everything.


WebSocket: The Real Foundation

WebSocket gives you a persistent, two-way connection between browser and server.

Normal HTTP:
Client → request → Server → response → connection closed

WebSocket:
Client ←————————— persistent connection —————————→ Server
       ←— data anytime —→ ←— data anytime —→

One connection stays open. Server can push data whenever it wants. Client can send whenever it wants. No repeated handshakes.

This is what powers WhatsApp Web, Slack, Discord, live trading platforms, and multiplayer games today.
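On the server side, a raw WebSocket endpoint can be sketched with the popular ws package (an npm dependency, not part of Node core; the broadcast-to-everyone logic is just for illustration):

```javascript
// A minimal WebSocket broadcast server using the 'ws' package (npm install ws)
import { WebSocketServer, WebSocket } from 'ws'

const wss = new WebSocketServer({ port: 3000 })

wss.on('connection', (ws) => {
  ws.on('message', (data) => {
    // Push to every connected client, any time — no request/response cycle
    for (const client of wss.clients) {
      if (client.readyState === WebSocket.OPEN) client.send(data.toString())
    }
  })
})
```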


So Why Does Socket.IO Exist?

Raw WebSocket is powerful, but it has rough edges:

  • Some corporate firewalls and proxies block WebSocket connections
  • Older browsers don't support it
  • If the connection drops, you have to handle reconnection yourself
  • There's no built-in concept of rooms, events, or broadcasting

Socket.IO is a library built on top of WebSocket that handles all of this for you.

Raw WebSocket:

const ws = new WebSocket('ws://localhost:3000')
ws.onopen = () => ws.send('Hello') // sending before the connection opens would throw
ws.onmessage = (event) => console.log(event.data)

Socket.IO:

import { io } from 'socket.io-client'
const socket = io('http://localhost:3000')

socket.on('newMessage', (data) => console.log(data))
socket.emit('newMessage', 'Hello')

Socket.IO gives you named events, rooms, automatic reconnection, and fallback transport — all built in.
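The reconnection behavior is configurable on the client. These are real socket.io-client options; the values shown are just illustrative:

```javascript
import { io } from 'socket.io-client'

const socket = io('http://localhost:3000', {
  reconnection: true,          // auto-reconnect on drop (the default)
  reconnectionAttempts: 10,    // give up after 10 tries (default: Infinity)
  reconnectionDelay: 1000,     // start with a 1s gap between attempts...
  reconnectionDelayMax: 5000,  // ...backing off up to a 5s gap
})
```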


How Socket.IO Actually Works Internally

This is the part most developers skip. Socket.IO has an internal engine called Engine.IO, and a connection is established in four phases.

Phase 1 — HTTP Handshake

When you call io('http://localhost:5000'), Socket.IO does NOT open a WebSocket immediately. It starts with a plain HTTP request:

GET /socket.io/?EIO=4&transport=polling

Server responds with session info:

{
  "sid": "abc123",
  "upgrades": ["websocket"],
  "pingInterval": 25000,
  "pingTimeout": 20000
}

A session ID is established. Server confirms WebSocket upgrade is possible.

Phase 2 — Long Polling as Fallback

Before upgrading to WebSocket, Engine.IO starts with HTTP long polling. This ensures the connection works even on networks that block WebSockets (corporate proxies, firewalls).

Client ──→ request
Server ─── hold...
Server ──→ sends data when available
Client ──→ immediately sends next request

Phase 3 — Transport Upgrade

Once the polling works, Socket.IO tries to upgrade to WebSocket:

GET /socket.io/?EIO=4&transport=websocket&sid=abc123

Server responds: HTTP 101 Switching Protocols

Polling stops. WebSocket takes over. You now have a persistent connection.
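If you control the infrastructure and know nothing on the path blocks WebSocket, you can skip the polling phase entirely with the client's transports option (a real socket.io-client option):

```javascript
import { io } from 'socket.io-client'

// Connect over WebSocket directly, skipping the long-polling phase.
// Only do this if no proxy/firewall on the path blocks WebSocket —
// otherwise you lose the fallback that makes Socket.IO reliable.
const socket = io('http://localhost:3000', { transports: ['websocket'] })
```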

Phase 4 — Real-Time Communication

Now socket.emit() and socket.on() work over WebSocket frames.

// Client
socket.emit('newMessage', 'Hello everyone')

// Server
io.on('connection', (socket) => {
  socket.on('newMessage', (msg) => {
    io.emit('newMessage', msg) // broadcast to all
  })
})

The full flow looks like this:

io() called
    │
HTTP Handshake (get session ID)
    │
Long Polling starts (fallback)
    │
WebSocket upgrade attempt
    │
WebSocket connected
    │
Real-time events ↔

Heartbeat (Ping/Pong)

Socket.IO keeps the connection alive using a heartbeat:

Server ──→ ping
Client ──→ pong

If the pong doesn't come back within pingTimeout, the connection is marked dead and auto-reconnect kicks in.
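You can observe this lifecycle from the client. These are real socket.io-client events; disconnect fires on the socket itself, while the reconnection events fire on the underlying manager (socket.io):

```javascript
import { io } from 'socket.io-client'

const socket = io('http://localhost:3000')

socket.on('disconnect', (reason) => {
  // reason is e.g. 'ping timeout' when the heartbeat fails
  console.log('disconnected:', reason)
})

// Reconnection events live on the manager, not the socket
socket.io.on('reconnect_attempt', (attempt) => console.log('retry #', attempt))
socket.io.on('reconnect', () => console.log('back online'))
```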


Socket.IO in React

Install:

npm install socket.io-client

Basic usage:

import { useEffect, useState } from 'react'
import { io } from 'socket.io-client'

const socket = io('http://localhost:5000')

function Chat() {
  const [messages, setMessages] = useState([])

  useEffect(() => {
    socket.on('newMessage', (msg) => {
      setMessages((prev) => [...prev, msg])
    })

    return () => socket.off('newMessage')
  }, [])

  const sendMessage = () => {
    socket.emit('newMessage', 'Hello!')
  }

  return (
    <div>
      {messages.map((msg, i) => <p key={i}>{msg}</p>)}
      <button onClick={sendMessage}>Send</button>
    </div>
  )
}

Rooms (Important for Group Chats)

Rooms let you target specific groups of users:

// Server
socket.join('cricket-room')
io.to('cricket-room').emit('scoreUpdate', { score: '45/2' })

Only users in cricket-room receive that event. This is how group chats, channels, and game lobbies work.
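Since socket.join() exists only on the server, clients typically ask to join via an event. A sketch (the joinRoom/leaveRoom event names are illustrative, not a Socket.IO convention):

```javascript
import { Server } from 'socket.io'

const io = new Server(3000)

io.on('connection', (socket) => {
  // Clients can't join rooms directly — they ask the server via an event
  socket.on('joinRoom', (room) => {
    socket.join(room)
    // Notify the room, excluding the sender
    socket.to(room).emit('userJoined', socket.id)
  })

  socket.on('leaveRoom', (room) => socket.leave(room))
})

// Client side:
// socket.emit('joinRoom', 'cricket-room')
```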


The Scaling Problem (This is Where It Gets Interesting)

Socket.IO on a single server works great. But production apps have multiple servers behind a load balancer.

Here's the problem:

           Load Balancer
                │
      ┌─────────┼─────────┐
      │         │         │
   Server1   Server2   Server3

User A → Server1
User B → Server3

User A sends a message to "cricket-room". Server1 knows who's in that room on its own connections. But it has no idea about User B sitting on Server3.

Result: User B never gets the message. ❌


The Fix: Redis Adapter

Redis acts as a message broker between all your Socket.IO servers.

           Load Balancer
                │
      ┌─────────┼─────────┐
      │         │         │
   Server1   Server2   Server3
      │         │         │
      └──────── Redis ────┘

When Server1 emits an event, it publishes to Redis. Redis broadcasts it to all other servers. Each server delivers the message to its own connected clients.

Setup:

npm install @socket.io/redis-adapter redis

import { createClient } from 'redis'
import { Server } from 'socket.io'
import { createAdapter } from '@socket.io/redis-adapter'

const io = new Server(3000)

const pubClient = createClient({ url: 'redis://localhost:6379' })
const subClient = pubClient.duplicate()

await Promise.all([pubClient.connect(), subClient.connect()])

io.adapter(createAdapter(pubClient, subClient))

That's it. Now all your Socket.IO servers relay events to each other through Redis pub/sub, so rooms and broadcasts work across the whole cluster.

Message flow with Redis:

User A → Server1
Server1 → publish to Redis
Redis → broadcast to Server2, Server3
Server2 → deliver to User B
Server3 → deliver to User C

Everyone gets the message regardless of which server they're on.


How Apps Like WhatsApp Handle Millions of Connections

This is the big question. One server can handle maybe 50k–100k WebSocket connections. How do you get to millions?

1. Horizontal Scaling

Simple math: 100 servers × 50k connections = 5 million connections.

2. Load Balancer Routes Traffic

Users
  │
Load Balancer (Nginx / AWS ELB)
  │
WebSocket Server Pool

3. Stateless Servers + Message Broker

Servers don't store state. They just receive and forward. Redis or Kafka sits in the middle.

User A → Server1 → Kafka → Server7 → User B
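That hand-off can be sketched with the kafkajs client. This is an assumption on my part (the pattern doesn't prescribe a specific Kafka library); the topic, group ID, and broker address are illustrative, and io stands for each server's local Socket.IO instance:

```javascript
// Sketch of the stateless-forwarding pattern using the 'kafkajs' client
// (npm install kafkajs). Assumes a broker at localhost:9092 and a local
// Socket.IO instance named `io` on each server.
import { Kafka } from 'kafkajs'

const kafka = new Kafka({ brokers: ['localhost:9092'] })
const producer = kafka.producer()
const consumer = kafka.consumer({ groupId: 'ws-server-7' })

await producer.connect()
await consumer.connect()
await consumer.subscribe({ topic: 'chat-messages' })

// Server1: a user's message goes straight onto the bus
async function onUserMessage(msg) {
  await producer.send({
    topic: 'chat-messages',
    messages: [{ value: JSON.stringify(msg) }],
  })
}

// Server7: consume the bus and deliver to locally connected sockets
await consumer.run({
  eachMessage: async ({ message }) => {
    const msg = JSON.parse(message.value.toString())
    io.to(msg.room).emit('newMessage', msg)
  },
})
```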

4. Event Loop Servers (Node.js / Erlang / Go)

Traditional servers use one thread per connection. That doesn't scale.

Node.js uses a single-threaded event loop with non-blocking I/O. One process can handle tens of thousands of concurrent connections.

WhatsApp famously used Erlang — a language designed for millions of lightweight concurrent processes. It's literally built for this.

5. Edge / Regional Clusters

Users connect to the nearest data center. India users hit Asia servers. US users hit US servers. Latency drops, load distributes.

6. Idle Connection Optimization

Most WebSocket connections are idle most of the time. The server is just sending tiny ping/pong packets (10–20 bytes). Millions of idle connections are actually manageable if your server is event-driven.
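The heartbeat traffic itself is tunable on the server. pingInterval and pingTimeout are real Socket.IO options (the values below are the v4 defaults, matching the handshake response shown earlier); stretching them means fewer packets across a huge idle fleet, at the cost of slower dead-connection detection:

```javascript
import { Server } from 'socket.io'

const io = new Server(3000, {
  pingInterval: 25000, // send a ping every 25s (v4 default)
  pingTimeout: 20000,  // drop the connection if no pong within 20s (v4 default)
})
```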

Full production architecture:

Users
  │
Global Load Balancer
  │
Regional Load Balancers
  │
WebSocket Server Pool (stateless)
  │
Redis / Kafka (message bus)
  │
Database

Quick Comparison: What to Use When

| Scenario | Use This |
| --- | --- |
| Simple real-time, reliability matters | Socket.IO |
| You need full control, minimal overhead | Raw WebSocket |
| One-way server → client stream (feeds, logs) | Server-Sent Events |
| Multiple servers in production | Socket.IO + Redis Adapter |
| Millions of connections, serious scale | Kafka + stateless servers |

Summary

Here's the full picture in one place:

  • Before WebSockets: Polling, long polling, Comet hacks — all workarounds
  • WebSocket: Real persistent two-way connection, changed everything in 2011
  • Socket.IO: Library on top of WebSocket with events, rooms, reconnection, and fallback transport
  • Internally: HTTP handshake → polling → upgrade → WebSocket → heartbeat
  • Scaling problem: Multiple servers can't share room state by default
  • Redis adapter: Acts as pub/sub bus between servers, solves the problem cleanly
  • Millions of connections: Horizontal scaling + stateless servers + message broker + edge infrastructure

If this helped, drop a reaction. If you've hit a Socket.IO issue in production that wasn't covered here, drop it in the comments.