Networking & Protocols • WebSocket & Real-time Communication • Medium • ⏱️ ~3 min
Real-Time Architecture: Connection Plane vs. Data Plane Separation
Real-time systems organize around two conceptually distinct planes: the connection plane and the data plane. The connection plane terminates TLS, maintains millions of live socket connections, tracks client identity and authentication, manages heartbeats to detect connection liveness, applies backpressure when clients fall behind, and routes messages to the correct backend node via consistent hashing or a membership service. This plane is highly stateful: each connection consumes memory for buffers, session metadata, and subscription tracking. A typical budget is 50 to 100 KB of process plus kernel memory per connection, so 1 million concurrent connections require 50 to 100 GB of memory across the fleet.
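The routing and memory budget above can be sketched concretely. This is a minimal illustration, not a production design: the gateway names, virtual-node count, and ring layout are assumptions made for the example.

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    # Stable 64-bit integer for ring placement.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

class ConsistentHashRing:
    """Maps connection IDs to gateway nodes; adding or removing a node
    only remaps the keys adjacent to it on the ring."""
    def __init__(self, nodes, vnodes=64):
        # Each node gets `vnodes` positions on the ring to smooth load.
        self._ring = sorted(
            (_hash(f"{n}#{i}"), n) for n in nodes for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    def route(self, conn_id: str) -> str:
        # First ring position clockwise of the connection's hash.
        idx = bisect.bisect(self._keys, _hash(conn_id)) % len(self._ring)
        return self._ring[idx][1]

# Memory budget from the text: 50-100 KB per connection.
connections = 1_000_000
low_gb = 50 * connections / 1_000_000    # 50.0 GB fleet-wide
high_gb = 100 * connections / 1_000_000  # 100.0 GB fleet-wide

ring = ConsistentHashRing(["gw-1", "gw-2", "gw-3"])
node = ring.route("conn-abc123")  # deterministic gateway assignment
```

Because routing is deterministic, any tier (load balancer, peer gateway) can compute where a connection's state lives without a lookup service.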
The data plane handles the actual event distribution from producers to interested connections via a publish/subscribe fabric. This plane manages ordering guarantees, delivery semantics (at most once versus at least once with acknowledgments), replay windows for clients that temporarily disconnect, and snapshotting for efficient state synchronization. In production architectures, a gateway tier at the edge keeps connections and subscriptions while a message bus or distributed log transports events. Application services publish to this bus without knowing about individual connections, and a presence or metadata store associates users, sessions, topics, and sequence numbers for resumability.
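The replay-window mechanics described above can be sketched with an in-memory per-topic log and monotonically increasing sequence numbers; the class and parameter names here are hypothetical, and real systems back this with a durable distributed log plus snapshots.

```python
from collections import defaultdict, deque

class TopicLog:
    """Per-topic append-only log with a bounded replay window.
    Clients resume by presenting the last sequence number they saw."""
    def __init__(self, replay_window=1000):
        self._events = defaultdict(deque)  # topic -> deque of (seq, payload)
        self._seq = defaultdict(int)       # topic -> last assigned seq
        self._window = replay_window

    def publish(self, topic: str, payload) -> int:
        self._seq[topic] += 1
        log = self._events[topic]
        log.append((self._seq[topic], payload))
        if len(log) > self._window:
            # Events older than the window are gone; a reconnecting client
            # that fell this far behind must resync from a snapshot instead.
            log.popleft()
        return self._seq[topic]

    def replay(self, topic: str, after_seq: int):
        # At-least-once resume: everything newer than the client's cursor.
        return [(s, p) for s, p in self._events[topic] if s > after_seq]

log = TopicLog(replay_window=3)
for i in range(5):
    log.publish("doc-42", f"op-{i}")
missed = log.replay("doc-42", after_seq=3)  # client last saw seq 3
```

The gateway tier only needs to track each connection's topic and cursor; the log itself knows nothing about individual sockets, which is exactly the decoupling the plane separation buys.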
This separation enables independent scaling of connection capacity versus event throughput. Salesforce's Streaming API demonstrates the pattern with a central event bus handling change data capture and platform events; clients subscribe via streaming protocols with replay IDs for resumability, and typical delivery latency is sub-second. Microsoft's Fluid Framework uses document-scoped streams with a delta service backed by a log: web clients maintain persistent connections to publish and consume operations, targeting under 200 ms end-to-end latency within a region while supporting hundreds of concurrent editors per document through batching, compression, and operation prioritization.
💡 Key Takeaways
• Connection plane handles TLS termination, socket lifecycle, identity, heartbeats, and routing, consuming approximately 50 to 100 KB of memory per connection
• Data plane manages event fan-out, ordering, delivery semantics (at most once vs. at least once), replay windows, and snapshots, independent of connection details
• Separating planes allows independent scaling: add connection capacity without affecting event throughput, or scale the message bus without resharding connections
• Production systems use an edge gateway tier for connections, a message bus or log for transport, application services for publishing, and a metadata store for session/sequence tracking
• Salesforce achieves sub-second delivery for platform events, serving hundreds to thousands of clients per tenant at sustained rates of hundreds to tens of thousands of events per second
• Microsoft Fluid Framework maintains under 200 ms end-to-end latency for collaborative editing updates within a region, supporting hundreds of concurrent editors per document
📌 Examples
Discord shards clients by hash across gateway nodes to distribute load and isolate failures, with event fan-out running at multi-million events per second using a decoupled pub/sub layer that keeps control messages under 200 ms at the 99th percentile
Slack mediates message-delivery fan-out through a pub/sub layer and a channel membership index, targeting sub-500 ms end-to-end delivery at the 99th percentile by colocating WebSocket edges with users and minimizing cross-region hops
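Discord's public gateway documentation gives the hash-sharding formula it uses: the shard for a guild is derived from the upper (timestamp) bits of its snowflake ID. A sketch (the sample guild ID below is arbitrary):

```python
def shard_for(guild_id: int, num_shards: int) -> int:
    # Discord's documented gateway sharding formula: drop the low 22 bits
    # of the snowflake (worker/process/increment fields), then bucket the
    # remaining timestamp bits across the shard count.
    return (guild_id >> 22) % num_shards

shard = shard_for(197038439483310086, 16)  # arbitrary sample guild ID
```

Because the mapping is a pure function of the ID, every node computes the same assignment independently, with no shared routing table to keep consistent.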