
Performance Trade-offs: Virtual Dispatch, Megamorphic Call Sites, and JIT Inlining

Deep inheritance hierarchies and heavily layered composition both risk megamorphic call sites: places where a single virtual or delegated call reaches many different concrete implementations at runtime. Modern just-in-time (JIT) compilers typically inline monomorphic calls (one observed target) and sometimes bimorphic calls (two targets). Once a call site has seen three or more targets it is treated as megamorphic, and the JIT usually gives up on inlining and emits an indirect call through a vtable or interface dispatch. This adds CPU overhead and increases tail latency, often by 5 to 15 percent on hot paths processing millions of requests per second.

Inheritance can enable devirtualization when hierarchies are shallow and call sites are monomorphic. If only one or two subclasses of a base class are instantiated on a hot path, the JIT can speculate on the receiver type and inline the concrete method. As hierarchies deepen and more subclasses are introduced, call sites become polymorphic and then megamorphic, and the lost inlining shows up as tail latency. Deep hierarchies also mean that small base-class changes ripple across many consumers. Production teams report p99 latency increases of 10 to 20 milliseconds on 100 to 200 millisecond endpoints after adding a third or fourth subclass to a previously bimorphic hierarchy.

Composition introduces delegation layers, and each composed component is a potential call site with its own dispatch overhead. If you swap strategy implementations frequently at runtime, the call site sees many different targets and becomes megamorphic. The solution is to stabilize hot call sites: prebind strategies per pool, per tenant, or per request type so that a given code path sees only one or two implementations. Limit decorator depth (at most 6 layers on hot Remote Procedure Call paths) and use coarse-grained components to reduce pointer chasing and improve cache locality.

In practice, keep inheritance depth to at most 2 levels on critical paths and limit the number of live concrete subclasses.
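The prebinding advice above can be sketched as follows. This is a minimal illustration, not a definitive implementation: `RetryPolicy`, `FixedDelay`, `ExponentialDelay`, `policyForPool`, and the pool kinds are hypothetical names invented for the example. It assumes a HotSpot-style JIT, where receiver-type profiles are collected per call site in the code rather than per object, so the technique only helps when a given worker process (or code path) really does see one or two implementations. Whether a particular VM inlines the call has to be confirmed by measurement.

```java
// Hypothetical sketch: bind one strategy per worker process at startup so the
// hot call site observes a single concrete class.
final class PrebindSketch {
    interface RetryPolicy {
        long delayMillis(int attempt);
    }

    static final class FixedDelay implements RetryPolicy {
        @Override
        public long delayMillis(int attempt) {
            return 50L;
        }
    }

    static final class ExponentialDelay implements RetryPolicy {
        @Override
        public long delayMillis(int attempt) {
            // 50, 100, 200, ... capped at 50 << 5
            return 50L << Math.min(attempt, 5);
        }
    }

    // Chosen once per process, for example from pool configuration (assumed).
    static RetryPolicy policyForPool(String poolKind) {
        return "batch".equals(poolKind) ? new ExponentialDelay() : new FixedDelay();
    }

    // Hot path: policy.delayMillis(...) is the call site whose type profile
    // matters. It stays monomorphic while the process uses one policy class.
    static long totalDelay(RetryPolicy policy, int attempts) {
        long total = 0;
        for (int attempt = 0; attempt < attempts; attempt++) {
            total += policy.delayMillis(attempt);
        }
        return total;
    }

    public static void main(String[] args) {
        String poolKind = args.length > 0 ? args[0] : "interactive";
        RetryPolicy policy = policyForPool(poolKind); // bound once, never swapped
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sum += totalDelay(policy, 3);
        }
        System.out.println(poolKind + " pool, accumulated delay: " + sum);
    }
}
```

The design choice is to move the strategy decision out of the request loop and into process or pool startup. Swapping the policy per request would instead feed several receiver types into the same call site.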
For composition, allocate a time budget to each layer and monitor per-layer latency. Unity Entity Component System (ECS) architectures demonstrate composition-first design: they can update over 1 million simple entities at 60 frames per second by using small, data-oriented components that avoid deep virtual dispatch and enable parallel scheduling and cache-friendly layout.
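The data-oriented layout mentioned above can be illustrated with a structure-of-arrays sketch. This is not Unity's API (Unity ECS is C#). The names `SoaSketch`, `Movement`, and `integrate` are hypothetical, and the example only shows the general idea: component data lives in flat primitive arrays and one system loops over them without any per-entity virtual call.

```java
// Hypothetical structure-of-arrays sketch of a data-oriented "movement" system.
final class SoaSketch {
    // One primitive array per component field, indexed by entity id.
    static final class Movement {
        final float[] posX;
        final float[] posY;
        final float[] velX;
        final float[] velY;

        Movement(int capacity) {
            posX = new float[capacity];
            posY = new float[capacity];
            velX = new float[capacity];
            velY = new float[capacity];
        }
    }

    // A single tight loop over contiguous arrays: no per-entity objects and
    // no virtual dispatch inside the loop body.
    static void integrate(Movement m, int count, float dt) {
        for (int i = 0; i < count; i++) {
            m.posX[i] += m.velX[i] * dt;
            m.posY[i] += m.velY[i] * dt;
        }
    }

    public static void main(String[] args) {
        int count = 1_000_000;
        Movement m = new Movement(count);
        java.util.Arrays.fill(m.velX, 1.0f);
        java.util.Arrays.fill(m.velY, 2.0f);
        integrate(m, count, 1.0f / 60.0f); // one simulated frame
        System.out.println("entity 0 position: " + m.posX[0] + ", " + m.posY[0]);
    }
}
```

Compared with a list of entity objects that each override an `update()` method, this layout trades flexibility for sequential memory access and a loop that is straightforward to split across threads by index range.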
💡 Key Takeaways
Megamorphic call sites (3 or more concrete targets) defeat JIT inlining, adding 5 to 15 percent CPU overhead and 10 to 20 milliseconds of p99 latency on 100 to 200 millisecond hot paths processing millions of requests per second
Keep inheritance depth less than or equal to 2 on critical paths and limit live subclass count; adding a third or fourth subclass to a bimorphic hierarchy causes measurable tail latency regressions in production
Composition delegation layers add dispatch overhead; excessive decorator stacking (more than 6 layers) can add 5 to 20 milliseconds on 100 to 300 millisecond p99 endpoints
Stabilize hot call sites by prebinding strategies per pool or tenant so code paths see one or two implementations, enabling Just In Time compiler speculation and inlining
Unity Entity Component System achieves over 1 million entities at 60 frames per second (16.7 milliseconds per frame) via composition with data oriented, cache friendly components that avoid deep virtual dispatch
📌 Examples
A payment processing service added a third payment provider subclass to a previously bimorphic call site; p99 latency increased from 120 milliseconds to 140 milliseconds due to lost inlining on a hot path handling 5 million transactions per second
Composing 8 decorator layers (deadline, bulkhead, retry, circuit breaker, rate limit, auth, metrics, tracing) added 18 milliseconds overhead on a 250 millisecond p99 endpoint; reducing to 5 coarse layers cut overhead to 8 milliseconds
Google services prebind timeout and retry strategies per tenant in a pool of workers; each worker sees monomorphic call sites, enabling inlining and keeping per hop latency under 10 milliseconds at millions of Queries Per Second
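To decide which decorator layers to merge or drop, as in the 8-layer example above, each layer's cost has to be visible. The following is a minimal sketch of per-layer timing. `Handler`, `Timed`, and the `auth` and `core` layers are hypothetical names, and the counters are not thread-safe. Each `Timed` wrapper records inclusive time (its layer plus everything beneath it), so a layer's own cost is its inclusive time minus that of the next wrapper down.

```java
// Hypothetical sketch: measure inclusive latency per decorator layer.
final class LayerTimingSketch {
    interface Handler {
        String handle(String request);
    }

    // Wraps a handler and accumulates the time spent in it and below it.
    static final class Timed implements Handler {
        final String name;
        final Handler next;
        long totalNanos; // not thread-safe; single-threaded sketch only
        long calls;

        Timed(String name, Handler next) {
            this.name = name;
            this.next = next;
        }

        @Override
        public String handle(String request) {
            long start = System.nanoTime();
            try {
                return next.handle(request);
            } finally {
                totalNanos += System.nanoTime() - start;
                calls++;
            }
        }

        double meanMicros() {
            return calls == 0 ? 0.0 : totalNanos / 1_000.0 / calls;
        }
    }

    public static void main(String[] args) {
        Timed core = new Timed("core", request -> "ok:" + request);
        // The "auth" layer here just trims the request before delegating.
        Timed auth = new Timed("auth", request -> core.handle(request.trim()));

        for (int i = 0; i < 100_000; i++) {
            auth.handle(" req-" + i + " ");
        }
        System.out.printf("%s inclusive mean: %.3f us%n", auth.name, auth.meanMicros());
        System.out.printf("%s inclusive mean: %.3f us%n", core.name, core.meanMicros());
        System.out.printf("auth own cost: %.3f us%n", auth.meanMicros() - core.meanMicros());
    }
}
```

Note that the timing wrapper is itself a delegation layer with a shared call site, so it is better suited to sampling or staging measurements than to permanent use on every hot-path layer.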