
Real World Latency and Throughput Numbers Every Engineer Should Know

Understanding concrete latency and throughput numbers for common operations grounds system design discussions in reality. Memory access latencies form the foundation: an L1 cache reference takes roughly 1 nanosecond, an L2 cache reference about 4 nanoseconds, and a main memory reference around 100 nanoseconds. These numbers matter for high-frequency trading systems and performance-critical hot paths where microseconds count. Storage access shows orders-of-magnitude variation: NVMe SSD random reads take 100 to 200 microseconds, spinning-disk random seeks take 5 to 10 milliseconds (50× to 100× slower), and sequential reads can sustain 100 to 500 MB/s on spinning disks or 1 to 7 GB/s on NVMe SSDs.

Network latencies create fundamental constraints: a same-datacenter round trip is typically 0.5 to 2 milliseconds, cross availability zone within a region is 1 to 5 milliseconds, cross region within a continent (like US East to US West) is 50 to 80 milliseconds, and cross continent (US to Europe) is 80 to 150 milliseconds depending on routing. These physical constraints compound with protocol overheads and queuing delays. A TCP connection handshake requires one round trip (SYN, SYN+ACK, ACK) before data flows, so on a 60 ms RTT path you pay 60 ms just to establish the connection. A full TLS 1.2 handshake adds another 2 round trips (120 ms on the same path) before encrypted data can flow, which makes TLS 1.3's 1-RTT handshake and 0-RTT resumption critically important for reducing startup latency. HTTP/2 and HTTP/3 (QUIC) reduce head-of-line blocking compared to HTTP/1.1 but introduce their own complexity: QUIC runs over UDP and implements its own reliability and congestion control, avoiding TCP head-of-line blocking at the cost of roughly 5% to 10% more CPU usage.

Throughput numbers help with capacity planning. A modern x86 server core can handle roughly 1 to 3 million packets per second with kernel-bypass techniques like DPDK, but standard Linux networking tops out around 100,000 to 500,000 packets per second per core. A 10 Gbps network link can theoretically transfer 1.25 GB/s, but achieving that requires jumbo frames (9 KB) and proper tuning; with a standard 1500-byte MTU and protocol overhead, effective throughput is closer to 950 MB/s to 1.1 GB/s. Database transaction throughput varies widely: a well-tuned MySQL or PostgreSQL instance on good hardware can sustain 10,000 to 50,000 transactions per second for simple queries on a single node, while distributed systems like Cassandra or ScyllaDB can scale to millions of writes per second across a cluster by partitioning data and using LSM-tree storage engines optimized for write throughput at the cost of read amplification.
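To see how these costs stack up, the sketch below adds up the round trips for the first HTTPS request on a fresh connection over a 60 ms cross-country path. It is a minimal illustration only: the constants and the `first_request_latency_ms` helper are assumptions drawn from the ballpark figures above, not measurements.

```python
# Rough latency budget for the first HTTPS request on a fresh connection.
# Constants are illustrative, taken from the ballpark figures above.

RTT_CROSS_REGION_MS = 60.0     # US East <-> US West round trip
DB_QUERY_MS = 2.0              # assumed simple indexed query on a local replica
NVME_READ_MS = 0.15            # ~150 microsecond NVMe random read

def first_request_latency_ms(rtt_ms: float, tls13: bool = True) -> float:
    tcp_handshake = rtt_ms                           # SYN / SYN+ACK / ACK: 1 RTT
    tls_handshake = rtt_ms if tls13 else 2 * rtt_ms  # TLS 1.3: 1 RTT, TLS 1.2: 2 RTTs
    request_response = rtt_ms                        # the request itself: 1 more RTT
    server_work = DB_QUERY_MS + NVME_READ_MS         # backend work, dwarfed by the network
    return tcp_handshake + tls_handshake + request_response + server_work

print(first_request_latency_ms(RTT_CROSS_REGION_MS, tls13=False))  # ~242 ms with TLS 1.2
print(first_request_latency_ms(RTT_CROSS_REGION_MS, tls13=True))   # ~182 ms with TLS 1.3
```

The network round trips dominate: the backend's 2 ms of work is barely visible next to 180+ ms of handshakes and request transit.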
💡 Key Takeaways
Memory hierarchy: L1 cache 1 nanosecond, L2 cache 4 nanoseconds, main memory 100 nanoseconds; SSD random read 100 to 200 microseconds, spinning disk seek 5 to 10 milliseconds (50× to 100× slower)
Network latency floors: same datacenter RTT 0.5 to 2 ms, cross AZ 1 to 5 ms, cross region (US East to West) 50 to 80 ms, cross continent (US to Europe) 80 to 150 ms
Protocol startup costs: TCP handshake costs 1 RTT (60 ms on 60 ms path), TLS 1.2 full handshake adds 2 RTTs (120 ms), TLS 1.3 reduces to 1 RTT and enables 0 RTT resumption
Network throughput: 10 Gbps link is 1.25 GB/s theoretical but 950 MB/s to 1.1 GB/s effective with standard 1500 byte MTU; achieving line rate requires jumbo frames (9 KB MTU) and tuning
Packet processing: modern server core handles 1 to 3 million packets/second with kernel bypass (DPDK) but only 100,000 to 500,000 packets/second per core with standard Linux networking
Database throughput: single MySQL or PostgreSQL instance sustains 10,000 to 50,000 TPS for simple queries; distributed Cassandra or ScyllaDB scales to millions of writes/second across a cluster using LSM trees and partitioning (see the capacity sketch after this list)
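The per-core and per-instance figures above turn directly into rough sizing math. The sketch below is a minimal illustration: the mid-range constants and the helper names (`cores_for_packet_rate`, `shards_for_write_rate`) are assumptions made up for the example, not benchmarks of any specific system.

```python
import math

PPS_PER_CORE_LINUX = 300_000    # mid-range of 100k-500k pps with standard Linux networking
TPS_PER_SQL_INSTANCE = 30_000   # mid-range of 10k-50k simple-query TPS per instance

def cores_for_packet_rate(packets_per_sec: float) -> int:
    """Cores needed to absorb a packet rate without kernel bypass."""
    return math.ceil(packets_per_sec / PPS_PER_CORE_LINUX)

def shards_for_write_rate(writes_per_sec: float) -> int:
    """Single-node SQL shards needed to absorb a write rate of simple transactions."""
    return math.ceil(writes_per_sec / TPS_PER_SQL_INSTANCE)

# e.g. 2 million packets/s of small-request traffic and 200k writes/s
print(cores_for_packet_rate(2_000_000))   # -> 7 cores (roughly 1-2 cores with DPDK instead)
print(shards_for_write_rate(200_000))     # -> 7 SQL shards, or one LSM-based distributed cluster
```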
📌 Examples
TLS 1.2 vs 1.3 impact: on 60 ms RTT path, TLS 1.2 adds 120 ms startup latency (2 RTTs) before data flows; TLS 1.3 cuts to 60 ms (1 RTT) and enables 0 RTT resumption for repeat connections, critical for mobile and long distance connections
Storage tier selection: for workload requiring 10,000 random reads/second, NVMe SSD at 150 microseconds per read handles it easily on one drive; spinning disk at 8 ms per read would need 80 drives to achieve same throughput
Cross region write cost: synchronous quorum write across US East, US West, and Europe incurs 80 to 150 ms latency per write due to cross continent RTT; asynchronous replication cuts write latency to local region (1 to 5 ms) but risks data loss on failover
Compression decision point: on 100 Mbps link, 1 MB payload takes 80 ms to send; compression to 500 KB with 5 ms CPU yields 40 ms send + 5 ms CPU = 45 ms total (worthwhile); on 1 Gbps link, uncompressed is 8 ms vs compressed 4 ms send + 5 ms CPU = 9 ms (not worthwhile); see the sketch after these examples
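The storage tier and compression examples reduce to a few lines of arithmetic. This sketch assumes the simplified model stated in the comments (one outstanding read per spinning disk, 2:1 compression at a flat 5 ms CPU cost); the helper names are made up for illustration.

```python
import math

def drives_needed(target_reads_per_sec: float, read_latency_ms: float) -> int:
    """Drives required if each serves one random read at a time (no queuing/parallelism)."""
    reads_per_drive = 1000.0 / read_latency_ms
    return math.ceil(target_reads_per_sec / reads_per_drive)

def transfer_ms(payload_bytes: float, link_bits_per_sec: float) -> float:
    """Time to push a payload over a link at line rate."""
    return payload_bytes * 8 / link_bits_per_sec * 1000

# Storage tier: 10,000 random reads/s on 8 ms spinning disks needs ~80 drives;
# a single NVMe drive at ~150 us with deep command queues handles it on its own.
print(drives_needed(10_000, 8.0))            # -> 80

# Compression break-even: 1 MB payload, 2:1 compression, 5 ms CPU cost.
for link_bps in (100e6, 1e9):                # 100 Mbps and 1 Gbps links
    plain = transfer_ms(1_000_000, link_bps)
    compressed = transfer_ms(500_000, link_bps) + 5
    print(f"{link_bps/1e6:.0f} Mbps: {plain:.0f} ms plain vs {compressed:.0f} ms compressed")
```

The output (80 ms vs 45 ms on 100 Mbps, 8 ms vs 9 ms on 1 Gbps) matches the examples above: compression pays off only when the link, not the CPU, is the bottleneck.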