Tags: WebSockets, Scalability, Architecture, Real-Time, Backend

Scaling WebSocket Infrastructure: Lessons from Handling 500K+ Concurrent Connections

5 min read
By Arman Hazrati

A deep dive into the architecture patterns, challenges, and solutions for scaling WebSocket infrastructure to handle hundreds of thousands of concurrent connections with sub-100ms latency.

Building real-time applications that scale to hundreds of thousands of concurrent WebSocket connections is one of the most challenging problems in modern web development. In this article, I'll share the lessons learned from architecting and scaling a WebSocket infrastructure that handles 500,000+ concurrent connections while maintaining sub-100ms latency.

The Challenge

Traditional HTTP request-response patterns don't work for real-time applications. WebSockets provide persistent connections, but scaling them presents unique challenges:

  • Connection Management: Each WebSocket connection consumes server resources
  • State Synchronization: Messages must be broadcast across multiple servers
  • Load Balancing: Sticky sessions are required for WebSocket connections
  • Memory Management: Long-lived connections can lead to memory leaks
  • Network Resilience: Connections can drop and need reconnection logic
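
To size the "server resources" point, a back-of-envelope estimate is useful. The per-connection figures below are illustrative assumptions, not benchmarks:

```javascript
// Rough capacity math for 500K connections (illustrative assumptions)
const connections = 500000
const memPerConnKB = 20 // assumed heap + kernel buffers per idle socket

const totalMemGB = (connections * memPerConnKB) / (1024 * 1024)
const totalFds = connections // one file descriptor per TCP socket

console.log(`~${totalMemGB.toFixed(1)} GB RAM, ${totalFds} file descriptors`)
```

Even at a modest 20 KB per idle socket, that is roughly 9.5 GB of RAM and half a million file descriptors on a single box, so `ulimit -n` has to be raised far above the typical default of 1024.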

Architecture Patterns

1. Horizontal Scaling with Redis Adapter

The key to scaling WebSockets is horizontal scaling. We can't rely on a single server—we need multiple WebSocket servers working together.

// Using Socket.io with Redis adapter
const io = require('socket.io')(server, {
  adapter: require('socket.io-redis')({
    host: process.env.REDIS_HOST,
    port: process.env.REDIS_PORT,
  }),
})

// Now messages broadcast across all servers
io.to('room:123').emit('message', data)

Key Benefits:

  • Messages automatically broadcast to all servers
  • Any server can handle any connection
  • Easy to add/remove servers dynamically

2. Connection Pooling and Resource Management

Each WebSocket connection consumes memory and file descriptors. We need to manage these resources carefully.

// Connection management: cap connections per server and evict idle ones
class ConnectionManager {
  constructor(maxConnections = 10000) {
    this.connections = new Map()
    this.maxConnections = maxConnections
  }
  
  addConnection(socketId, socket) {
    if (this.connections.size >= this.maxConnections) {
      this.evictOldestConnection()
    }
    
    this.connections.set(socketId, {
      socket,
      lastActivity: Date.now(),
    })
  }
  
  // Call on every message so active connections are never the eviction victim
  touch(socketId) {
    const conn = this.connections.get(socketId)
    if (conn) conn.lastActivity = Date.now()
  }
  
  remove(socketId) {
    this.connections.delete(socketId)
  }
  
  getConnectionCount() {
    return this.connections.size
  }
  
  evictOldestConnection() {
    let oldest = null
    let oldestTime = Infinity
    
    // Find the least recently active connection
    for (const [id, conn] of this.connections) {
      if (conn.lastActivity < oldestTime) {
        oldest = id
        oldestTime = conn.lastActivity
      }
    }
    
    if (oldest) {
      this.connections.get(oldest).socket.disconnect()
      this.connections.delete(oldest)
    }
  }
}

3. Message Batching and Debouncing

Sending individual messages for every update is inefficient. Batching reduces network overhead.

// Message batching
class MessageBatcher {
  constructor(batchSize = 10, batchTimeout = 100) {
    this.batch = []
    this.batchSize = batchSize
    this.batchTimeout = batchTimeout
    this.timer = null
  }
  
  add(message) {
    this.batch.push(message)
    
    if (this.batch.length >= this.batchSize) {
      this.flush()
    } else if (!this.timer) {
      this.timer = setTimeout(() => this.flush(), this.batchTimeout)
    }
  }
  
  flush() {
    if (this.batch.length === 0) return
    
    const messages = [...this.batch]
    this.batch = []
    clearTimeout(this.timer)
    this.timer = null
    
    // Send batched messages
    io.emit('batch-update', messages)
  }
}
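
The same idea pays off client-side for high-frequency events like cursor positions: debounce so only the last value in a window is sent. A self-contained sketch (the cursor event is just an example):

```javascript
// Coalesce rapid updates: only the last value within the window is sent.
function debounce(fn, wait) {
  let timer = null
  let pending
  return (value) => {
    pending = value
    clearTimeout(timer) // reset the window on every new value
    timer = setTimeout(() => fn(pending), wait)
  }
}

const sent = []
const sendCursor = debounce((pos) => sent.push(pos), 50)

sendCursor({ x: 1, y: 1 })
sendCursor({ x: 2, y: 2 })
sendCursor({ x: 3, y: 3 }) // only this one survives the 50ms window

setTimeout(() => console.log(sent), 100) // → [ { x: 3, y: 3 } ]
```

Three rapid updates become one message on the wire; at 500K clients that difference is substantial.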

Load Balancing Strategies

Sticky Sessions

WebSocket connections are stateful, so we need sticky sessions to ensure a client always connects to the same server.

# NGINX configuration for sticky sessions
upstream websocket_backend {
    ip_hash;  # Sticky sessions based on IP
    server ws1.example.com:3000;
    server ws2.example.com:3000;
    server ws3.example.com:3000;
}

server {
    location /socket.io/ {
        proxy_pass http://websocket_backend;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_read_timeout 3600s;  # keep idle WebSockets open past the 60s default
    }
}

Health Checks and Failover

Servers can fail, so we need health checks and automatic failover.

// Health check endpoint
app.get('/health', (req, res) => {
  const health = {
    status: 'healthy',
    connections: connectionManager.getConnectionCount(),
    memory: process.memoryUsage(),
    uptime: process.uptime(),
  }
  
  // Check if server is overloaded
  if (health.connections > MAX_CONNECTIONS * 0.9) {
    health.status = 'degraded'
  }
  
  res.json(health)
})

Performance Optimizations

1. Connection Heartbeats

Keep connections alive and detect dead connections quickly.

// Heartbeat mechanism (client side): answer server pings
socket.on('ping', () => {
  socket.emit('pong')
})

// Server side: periodically ping, and disconnect if no pong arrives in time
setInterval(() => {
  socket.emit('ping')
  
  socket.pingTimeout = setTimeout(() => {
    socket.disconnect()
  }, 5000)
}, 25000)

// Cancel the pending disconnect when the client answers in time
socket.on('pong', () => {
  clearTimeout(socket.pingTimeout)
})

2. Compression

Compress messages to reduce bandwidth.

// Enable compression
const io = require('socket.io')(server, {
  perMessageDeflate: {
    zlibDeflateOptions: {
      chunkSize: 1024,
      memLevel: 7,
      level: 3,
    },
  },
})

3. Room-Based Broadcasting

Only send messages to relevant clients using rooms.

// Join room
socket.on('join-room', (roomId) => {
  socket.join(`room:${roomId}`)
})

// Broadcast to room only
io.to(`room:${roomId}`).emit('update', data)

Monitoring and Observability

Key Metrics to Track

  1. Connection Count: Total active connections per server
  2. Message Rate: Messages per second
  3. Latency: P95 and P99 latency for message delivery
  4. Error Rate: Failed connections and message delivery failures
  5. Memory Usage: Track for memory leaks

// Metrics collection
const metrics = {
  connections: 0,
  messagesSent: 0,
  messagesReceived: 0,
  errors: 0,
}

io.on('connection', (socket) => {
  metrics.connections++
  
  socket.on('message', (data) => {
    metrics.messagesReceived++
  })
  
  socket.on('disconnect', () => {
    metrics.connections--
  })
})

// Export metrics for Prometheus/Grafana
app.get('/metrics', (req, res) => {
  res.json(metrics)
})
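
One caveat: Prometheus scrapes a plain-text exposition format, not JSON. In practice a library like prom-client handles this; the hand-rolled formatter below just illustrates what the format looks like (the `ws_` metric prefix is an arbitrary naming choice):

```javascript
// Convert a counters object into Prometheus' text exposition format:
// a "# TYPE" line plus "name value" sample per metric.
function toPrometheus(metrics) {
  return Object.entries(metrics)
    .map(([name, value]) => `# TYPE ws_${name} gauge\nws_${name} ${value}`)
    .join('\n') + '\n'
}

const text = toPrometheus({ connections: 42, messagesSent: 1000 })
console.log(text)
```

Serve the resulting string from `/metrics` with `Content-Type: text/plain` and Prometheus can scrape it directly.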

Common Pitfalls and Solutions

1. Memory Leaks

Problem: Connections not properly cleaned up

Solution: Always remove event listeners and clear timers

socket.on('disconnect', () => {
  // Clean up
  clearTimeout(socket.pingTimeout)
  socket.removeAllListeners()
  connectionManager.remove(socket.id)
})

2. Message Ordering

Problem: Messages can arrive out of order in distributed systems

Solution: Use sequence numbers or timestamps

// Message with sequence number
const message = {
  sequence: getNextSequence(),
  data: payload,
  timestamp: Date.now(),
}

// Process in order
messageQueue.add(message)
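
A minimal sketch of what such a queue can look like (the `messageQueue` above is assumed to behave roughly like this): buffer early arrivals and drain whenever the next expected sequence number shows up:

```javascript
// Buffer out-of-order messages and release them in sequence order.
class OrderedQueue {
  constructor(onDeliver) {
    this.expected = 0       // next sequence number to deliver
    this.buffer = new Map() // sequence -> message, for early arrivals
    this.onDeliver = onDeliver
  }

  add(message) {
    this.buffer.set(message.sequence, message)
    // Drain every message that is now contiguous with what was delivered
    while (this.buffer.has(this.expected)) {
      this.onDeliver(this.buffer.get(this.expected))
      this.buffer.delete(this.expected)
      this.expected++
    }
  }
}

const delivered = []
const q = new OrderedQueue((m) => delivered.push(m.sequence))
q.add({ sequence: 1, data: 'b' }) // arrives early: buffered
q.add({ sequence: 0, data: 'a' }) // fills the gap, both drain in order
console.log(delivered) // → [ 0, 1 ]
```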

3. Reconnection Storms

Problem: All clients reconnect simultaneously after outage

Solution: Exponential backoff with jitter

function reconnect(attempt = 0) {
  const delay = Math.min(1000 * Math.pow(2, attempt), 30000)
  const jitter = Math.random() * 1000
  
  setTimeout(() => {
    socket.connect()
    // Back off further if this attempt also fails
    socket.once('connect_error', () => reconnect(attempt + 1))
  }, delay + jitter)
}
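
To see why the cap matters, tabulate the delay for successive attempts (jitter omitted so the numbers are deterministic):

```javascript
// Delay per attempt: doubles from 1s and caps at 30s
function backoffDelay(attempt, base = 1000, cap = 30000) {
  return Math.min(base * Math.pow(2, attempt), cap)
}

const delays = [0, 1, 2, 3, 4, 5].map((n) => backoffDelay(n))
console.log(delays) // → [ 1000, 2000, 4000, 8000, 16000, 30000 ]
```

With the random jitter added on top, 500K clients spread their reconnects across the window instead of hammering the servers in lockstep.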

Lessons Learned

  1. Horizontal Scaling is Essential: You can't scale WebSockets vertically forever
  2. Redis is Your Friend: Redis Pub/Sub makes multi-server broadcasting trivial
  3. Monitor Everything: You can't optimize what you don't measure
  4. Plan for Failure: Servers will fail, connections will drop—plan accordingly
  5. Test at Scale: Load testing is crucial—don't wait for production issues

Conclusion

Scaling WebSocket infrastructure to handle hundreds of thousands of concurrent connections requires careful architecture, proper resource management, and continuous monitoring. The patterns and techniques discussed here have been battle-tested in production and can help you build scalable real-time applications.

Remember: start simple, measure everything, and iterate based on real-world performance data.

