Scaling WebSocket Infrastructure: Lessons from Handling 500K+ Concurrent Connections
A deep dive into the architecture patterns, challenges, and solutions for scaling WebSocket infrastructure to handle hundreds of thousands of concurrent connections with sub-100ms latency.
Building real-time applications that scale to hundreds of thousands of concurrent WebSocket connections is one of the most challenging problems in modern web development. In this article, I'll share the lessons learned from architecting and scaling a WebSocket infrastructure that handles 500,000+ concurrent connections while maintaining sub-100ms latency.
The Challenge
Traditional HTTP request-response patterns don't work for real-time applications. WebSockets provide persistent connections, but scaling them presents unique challenges:
- Connection Management: Each WebSocket connection consumes server resources
- State Synchronization: Messages must be broadcast across multiple servers
- Load Balancing: Sticky sessions are typically required, especially when a long-polling fallback is in play
- Memory Management: Long-lived connections can lead to memory leaks
- Network Resilience: Connections can drop and need reconnection logic
Architecture Patterns
1. Horizontal Scaling with Redis Adapter
The key to scaling WebSockets is horizontal scaling. We can't rely on a single server—we need multiple WebSocket servers working together.
// Using Socket.io with the Redis adapter
const io = require('socket.io')(server, {
  adapter: require('socket.io-redis')({
    host: process.env.REDIS_HOST,
    port: process.env.REDIS_PORT,
  }),
})

// Now messages broadcast across all servers
io.to('room:123').emit('message', data)
Key Benefits:
- Messages automatically broadcast to all servers
- Any server can handle any connection
- Easy to add/remove servers dynamically
2. Connection Pooling and Resource Management
Each WebSocket connection consumes memory and file descriptors. We need to manage these resources carefully.
// Connection management: cap connections per process and evict the
// least-recently-active socket when the cap is reached
class ConnectionManager {
  constructor(maxConnections = 10000) {
    this.connections = new Map()
    this.maxConnections = maxConnections
  }

  addConnection(socketId, socket) {
    if (this.connections.size >= this.maxConnections) {
      this.evictOldestConnection()
    }
    this.connections.set(socketId, {
      socket,
      lastActivity: Date.now(),
    })
  }

  // Call on every message so eviction targets truly idle sockets
  touch(socketId) {
    const conn = this.connections.get(socketId)
    if (conn) conn.lastActivity = Date.now()
  }

  getConnectionCount() {
    return this.connections.size
  }

  evictOldestConnection() {
    let oldest = null
    let oldestTime = Infinity
    for (const [id, conn] of this.connections) {
      if (conn.lastActivity < oldestTime) {
        oldest = id
        oldestTime = conn.lastActivity
      }
    }
    if (oldest) {
      this.connections.get(oldest).socket.disconnect()
      this.connections.delete(oldest)
    }
  }
}
3. Message Batching and Debouncing
Sending individual messages for every update is inefficient. Batching reduces network overhead.
// Message batching: flush when the batch fills or the timeout elapses,
// whichever comes first
class MessageBatcher {
  constructor(batchSize = 10, batchTimeout = 100) {
    this.batch = []
    this.batchSize = batchSize
    this.batchTimeout = batchTimeout
    this.timer = null
  }

  add(message) {
    this.batch.push(message)
    if (this.batch.length >= this.batchSize) {
      this.flush()
    } else if (!this.timer) {
      this.timer = setTimeout(() => this.flush(), this.batchTimeout)
    }
  }

  flush() {
    if (this.batch.length === 0) return
    const messages = [...this.batch]
    this.batch = []
    clearTimeout(this.timer)
    this.timer = null
    // Send batched messages in a single emit
    io.emit('batch-update', messages)
  }
}
Load Balancing Strategies
Sticky Sessions
WebSocket connections are stateful, so we need sticky sessions to ensure a client always connects to the same server.
# NGINX configuration for sticky sessions
upstream websocket_backend {
    ip_hash;  # Sticky sessions based on client IP
    server ws1.example.com:3000;
    server ws2.example.com:3000;
    server ws3.example.com:3000;
}

server {
    location /socket.io/ {
        proxy_pass http://websocket_backend;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
Health Checks and Failover
Servers can fail, so we need health checks and automatic failover.
// Health check endpoint
app.get('/health', (req, res) => {
  const health = {
    status: 'healthy',
    connections: connectionManager.getConnectionCount(),
    memory: process.memoryUsage(),
    uptime: process.uptime(),
  }

  // Report degraded status as the server approaches its connection cap
  if (health.connections > MAX_CONNECTIONS * 0.9) {
    health.status = 'degraded'
  }

  res.json(health)
})
Performance Optimizations
1. Connection Heartbeats
Keep connections alive and detect dead connections quickly.
// Heartbeat mechanism: the server pings each socket, the client answers
// with a pong, and sockets that miss the deadline are disconnected.
// The client side mirrors this: socket.on('ping', () => socket.emit('pong'))
io.on('connection', (socket) => {
  const heartbeat = setInterval(() => {
    socket.emit('ping')
    socket.pingTimeout = setTimeout(() => socket.disconnect(), 5000)
  }, 25000)

  socket.on('pong', () => clearTimeout(socket.pingTimeout))
  socket.on('disconnect', () => clearInterval(heartbeat))
})
2. Compression
Compress messages to reduce bandwidth.
// Enable per-message compression
const io = require('socket.io')(server, {
  perMessageDeflate: {
    zlibDeflateOptions: {
      chunkSize: 1024,
      memLevel: 7,
      level: 3,
    },
  },
})
3. Room-Based Broadcasting
Only send messages to relevant clients using rooms.
// Join room
socket.on('join-room', (roomId) => {
  socket.join(`room:${roomId}`)
})

// Broadcast to room only
io.to(`room:${roomId}`).emit('update', data)
Monitoring and Observability
Key Metrics to Track
- Connection Count: Total active connections per server
- Message Rate: Messages per second
- Latency: P95 and P99 latency for message delivery
- Error Rate: Failed connections and message delivery failures
- Memory Usage: Track for memory leaks
// Metrics collection
const metrics = {
  connections: 0,
  messagesSent: 0, // increment wherever the server emits to clients
  messagesReceived: 0,
  errors: 0,
}

io.on('connection', (socket) => {
  metrics.connections++
  socket.on('message', (data) => {
    metrics.messagesReceived++
  })
  socket.on('disconnect', () => {
    metrics.connections--
  })
})

// Expose metrics for Prometheus/Grafana scraping
app.get('/metrics', (req, res) => {
  res.json(metrics)
})
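Note that Prometheus scrapes a plain-text exposition format rather than JSON. A minimal renderer for the counters above might look like this (the metric names are illustrative, not a fixed convention):

```javascript
// Render the in-memory counters in Prometheus text exposition format:
// a "# TYPE" line per metric, then "name value"
function renderPrometheus(metrics) {
  return [
    '# TYPE ws_active_connections gauge',
    `ws_active_connections ${metrics.connections}`,
    '# TYPE ws_messages_received_total counter',
    `ws_messages_received_total ${metrics.messagesReceived}`,
    '# TYPE ws_errors_total counter',
    `ws_errors_total ${metrics.errors}`,
  ].join('\n')
}
```

The endpoint then becomes `app.get('/metrics', (req, res) => res.type('text/plain').send(renderPrometheus(metrics)))`.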
Common Pitfalls and Solutions
1. Memory Leaks
Problem: Connections not properly cleaned up
Solution: Always remove event listeners and clear timers
socket.on('disconnect', () => {
  // Clean up timers, listeners, and the connection registry
  clearTimeout(socket.pingTimeout)
  socket.removeAllListeners()
  connectionManager.remove(socket.id)
})
2. Message Ordering
Problem: Messages can arrive out of order in distributed systems
Solution: Use sequence numbers or timestamps
// Message with sequence number
const message = {
  sequence: getNextSequence(),
  data: payload,
  timestamp: Date.now(),
}

// Process in order
messageQueue.add(message)
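The `messageQueue` here is assumed to reorder by sequence number. A minimal sketch of such a buffer holds out-of-order messages until the next expected sequence arrives, then delivers any consecutive run:

```javascript
// In-order delivery buffer (a sketch): messages may arrive in any
// order, but the deliver callback only ever sees consecutive sequences
class OrderedQueue {
  constructor(deliver) {
    this.expected = 0
    this.pending = new Map() // sequence -> message
    this.deliver = deliver
  }

  add(message) {
    this.pending.set(message.sequence, message)
    // Drain every consecutive message starting at the expected sequence
    while (this.pending.has(this.expected)) {
      this.deliver(this.pending.get(this.expected))
      this.pending.delete(this.expected)
      this.expected++
    }
  }
}
```

With this, adding sequences 1, 0, 2 still delivers them as 0, 1, 2; message 1 simply waits in `pending` until 0 shows up.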
3. Reconnection Storms
Problem: All clients reconnect simultaneously after outage
Solution: Exponential backoff with jitter
let attempt = 0

function reconnect() {
  // Exponential backoff capped at 30s, plus up to 1s of random jitter
  // so clients don't all retry at the same instant
  const delay = Math.min(1000 * Math.pow(2, attempt), 30000)
  const jitter = Math.random() * 1000
  attempt++
  setTimeout(() => socket.connect(), delay + jitter)
}

socket.on('connect_error', reconnect)
socket.on('connect', () => { attempt = 0 }) // reset after a successful connect
Lessons Learned
- Horizontal Scaling is Essential: You can't scale WebSockets vertically forever
- Redis is Your Friend: Redis Pub/Sub makes multi-server broadcasting trivial
- Monitor Everything: You can't optimize what you don't measure
- Plan for Failure: Servers will fail, connections will drop—plan accordingly
- Test at Scale: Load testing is crucial—don't wait for production issues
Conclusion
Scaling WebSocket infrastructure to handle hundreds of thousands of concurrent connections requires careful architecture, proper resource management, and continuous monitoring. The patterns and techniques discussed here have been battle-tested in production and can help you build scalable real-time applications.
Remember: start simple, measure everything, and iterate based on real-world performance data.