Production Graph Database Implementation Patterns
Data Modeling Patterns
Make relationships explicit with direction and type. Keep properties on edges when they are used for filtering or ranking, so a traversal avoids an extra node fetch (one lookup instead of two per hop). For supernodes, split adjacency into layers: RECENT_FOLLOW (the last 10,000 edges, held in memory) vs HISTORICAL_FOLLOW (cold storage). Precompute top-K neighborhoods offline using signals like recency or interaction frequency, and store them as separate relationship types.
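The layered split for supernodes can be sketched as follows. This is a minimal in-process illustration, not a storage engine: the class name LayeredAdjacency, its methods, and the plain list standing in for cold storage are all assumptions made for the example.

```python
from collections import deque


class LayeredAdjacency:
    """Follower edges for one supernode, split into a hot and a cold layer.

    Hypothetical sketch: `historical` is a plain list standing in for
    whatever cold storage a real deployment would use.
    """

    def __init__(self, recent_capacity=10_000):
        self.recent_capacity = recent_capacity
        self.recent = deque()    # RECENT_FOLLOW: newest edge on the right
        self.historical = []     # HISTORICAL_FOLLOW: cold-storage stand-in

    def add_follow(self, follower_id, timestamp):
        self.recent.append((follower_id, timestamp))
        # Demote the oldest edges once the hot layer exceeds its capacity.
        while len(self.recent) > self.recent_capacity:
            self.historical.append(self.recent.popleft())

    def recent_followers(self, limit):
        """Return up to `limit` follower ids, newest first, hot layer only."""
        out = []
        for follower_id, _ in reversed(self.recent):
            if len(out) >= limit:
                break
            out.append(follower_id)
        return out


adj = LayeredAdjacency(recent_capacity=3)
for i in range(1, 6):
    adj.add_follow(follower_id=i, timestamp=i)
```

Reads that only need recent followers never touch the cold layer, which is the point of the split; a query that genuinely needs the full history would have to merge both layers explicitly.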
Query Shaping
Enforce hard limits on hop depth (3-4 max) and on fan-out at each hop (100-1,000 before sampling). Apply degree-based pruning and time/recency filters early in the traversal to keep the visited set small. Use path uniqueness and cycle checks to prevent combinatorial blowups. Start from selective anchors: specific indexed nodes (user_id, product_id) rather than broad scans, because graph queries without a starting point degenerate into full scans.
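A bounded breadth-first traversal shows how these limits combine. The sketch below assumes the graph is an in-memory dict mapping each node to a list of neighbors; the function name bounded_traverse and its parameters are hypothetical. It uses a node-level visited set as the cycle check, which is stricter than per-path uniqueness.

```python
import random
from collections import deque


def bounded_traverse(graph, anchor, max_depth=3, max_fanout=100, seed=0):
    """Breadth-first traversal from one anchor with depth and fan-out caps.

    `graph` maps node -> list of neighbor nodes (an assumption of this sketch).
    Returns the set of nodes reached within the limits.
    """
    rng = random.Random(seed)
    visited = {anchor}                 # also serves as the cycle check
    frontier = deque([(anchor, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue                   # hard hop limit: do not expand further
        neighbors = graph.get(node, [])
        if len(neighbors) > max_fanout:
            # Over the fan-out cap: expand a uniform sample instead of all edges.
            neighbors = rng.sample(neighbors, max_fanout)
        for nxt in neighbors:
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, depth + 1))
    return visited


chain = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["e"]}
reached = bounded_traverse(chain, "a", max_depth=2)
```

Uniform sampling is only one option; a degree-based or recency-based filter could be applied at the same point before the fan-out check.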
Caching and Locality
Keep hot adjacency lists in memory using an LFU (Least Frequently Used: evicts the entries accessed least often), LRU (Least Recently Used: evicts the entries that have gone longest without being accessed), or hybrid policy tuned for power-law access, where most queries hit a small fraction of nodes repeatedly while the majority of nodes are rarely accessed. For distributed deployments, use per-shard caches to preserve locality rather than a global cache, which suffers from skew. Denormalize small aggregates like degree counts or last-N neighbors to avoid full adjacency scans.
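As an illustration of per-shard caching, the sketch below pairs a plain LRU cache with modulo routing on integer node ids. It shows only the LRU half of the policies above, and every name in it (LRUAdjacencyCache, ShardedAdjacencyCache, load_fn) is an assumption of the example rather than an API of any particular graph database.

```python
from collections import OrderedDict


class LRUAdjacencyCache:
    """LRU cache of adjacency lists for a single shard."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()   # least recently used entry first
        self.hits = 0
        self.misses = 0

    def get(self, node_id, load_fn):
        if node_id in self.entries:
            self.entries.move_to_end(node_id)   # mark as most recently used
            self.hits += 1
            return self.entries[node_id]
        self.misses += 1
        neighbors = load_fn(node_id)            # fall back to storage
        self.entries[node_id] = neighbors
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)    # evict least recently used
        return neighbors


class ShardedAdjacencyCache:
    """One LRU cache per shard; integer node ids are routed by modulo."""

    def __init__(self, num_shards, capacity_per_shard):
        self.shards = [LRUAdjacencyCache(capacity_per_shard)
                       for _ in range(num_shards)]

    def get(self, node_id, load_fn):
        shard = self.shards[node_id % len(self.shards)]
        return shard.get(node_id, load_fn)


loads = []

def load_from_storage(node_id):
    # Hypothetical storage read; records which nodes missed the cache.
    loads.append(node_id)
    return [node_id + 1]

cache = LRUAdjacencyCache(capacity=2)
for n in (1, 2, 1, 3, 2):
    cache.get(n, load_from_storage)
```

Because each shard evicts independently, a hot node on one shard cannot push out entries that another shard depends on. The hits and misses counters feed directly into a cache hit ratio.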
Monitoring
Track: traversal depth and fan-out distributions, visited node counts per query, cache hit ratio on adjacency lists (target 95%+), replication lag (target <1 second), and tail latencies broken down by hop count.
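Two of these metrics are simple to compute from raw samples. The following sketch assumes latency samples arrive as (hop_count, latency_ms) pairs and uses a nearest-rank percentile; the function names and the sample values are illustrative only.

```python
import math
from collections import defaultdict


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list; pct in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def tail_latency_by_hops(samples, pct=99):
    """Group (hop_count, latency_ms) samples by hop count and take a percentile."""
    by_hops = defaultdict(list)
    for hops, latency_ms in samples:
        by_hops[hops].append(latency_ms)
    return {hops: percentile(lat, pct) for hops, lat in by_hops.items()}


def cache_hit_ratio(hits, misses):
    """Fraction of lookups served from cache; 0.0 when there were no lookups."""
    total = hits + misses
    return hits / total if total else 0.0


# Made-up samples purely for illustration.
samples = [(1, 2.0), (1, 4.0), (2, 10.0), (2, 50.0), (2, 20.0)]
p99 = tail_latency_by_hops(samples, pct=99)
```

Breaking the percentile down by hop count makes it visible when a latency regression comes from deeper traversals rather than from the storage layer as a whole.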