Event Data Modeling

What is Event Data Modeling?

The Core Idea: Event data modeling represents what happens in your system as a stream of immutable records, rather than only storing the current state. Think of it like a bank ledger that shows every transaction versus just your current balance. Each event is a record of something that occurred at a specific time: user_signed_up, product_viewed, payment_authorized, driver_accepted_trip. Once written, an event never changes. You never overwrite an event; instead, you add new ones. If an order is cancelled, you don't delete the order_placed event. You append an order_cancelled event. This gives you a complete behavioral history that can be replayed to reconstruct state, power analytics, and debug production issues.

The Universal Structure: A well-modeled event has several critical components. First, a globally unique event ID for deduplication and traceability. Second, an event time that represents when the action actually occurred. Third, an actor such as a user ID or device ID. Fourth, an object or target such as a product ID or page URL. Fifth, context such as experiment assignment, app version, location, or marketing campaign. Finally, a schema version so consumers know how to interpret the fields.
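As a concrete illustration, here is a minimal Python sketch of such an event record. The field names, defaults, and example values are assumptions for illustration, not a prescribed schema.

```python
# A minimal sketch of the universal event structure described above.
# Field names (event_id, event_time, actor_id, ...) are illustrative, not a standard.
from dataclasses import dataclass, field
from datetime import datetime, timezone
from uuid import uuid4


@dataclass(frozen=True)  # frozen=True mirrors immutability: events are never updated in place
class Event:
    event_name: str                      # e.g. "order_cancelled"
    actor_id: str                        # user ID or device ID that performed the action
    target_id: str                       # object acted on, e.g. product ID or page URL
    context: dict = field(default_factory=dict)   # experiment, app version, campaign, ...
    schema_version: int = 1              # tells consumers how to interpret the fields
    event_id: str = field(default_factory=lambda: str(uuid4()))  # globally unique, for dedup
    event_time: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


# Cancelling an order appends a new event; the original order_placed event is untouched.
cancel = Event("order_cancelled", actor_id="user_42", target_id="order_9001",
               context={"reason": "changed_mind", "app_version": "3.2.1"})
```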
✓ In Practice: A consumer app with 10 million monthly active users generating 100 events per user per day produces roughly 1 billion events daily, or about 12,000 events per second on average, with peaks 5 to 10 times higher during busy hours.
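The arithmetic behind that estimate is straightforward. A quick back-of-envelope sketch, using the assumed inputs stated above:

```python
# Back-of-envelope check of the numbers quoted above (assumed inputs, not a benchmark).
monthly_active_users = 10_000_000
events_per_user_per_day = 100
peak_multiplier = 10                     # busy-hour traffic assumed to be 5-10x the average

events_per_day = monthly_active_users * events_per_user_per_day    # 1,000,000,000
events_per_second = events_per_day / 86_400                        # ~11,574 -> "about 12,000"
peak_events_per_second = events_per_second * peak_multiplier       # ~116,000 at the 10x end

print(f"{events_per_day:,} events/day, ~{events_per_second:,.0f}/s average, "
      f"~{peak_events_per_second:,.0f}/s at peak")
```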
This approach contrasts sharply with entity-based modeling, where you only store the latest state. In entity models, you might have a users table with a subscription_status column that gets updated. In event models, you have subscription_started, plan_changed, payment_failed, and subscription_cancelled events. The entity view becomes a derived state computed from events.

Why It Matters: Event models shift complexity from writes to reads. Writes are simple and append only, requiring no coordination. Reads must piece together many events to answer questions. This tradeoff makes sense when you need detailed behavioral analytics, growth experimentation, fraud detection, or auditability. It's less useful when you only need current account balances or inventory counts.
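To make the "derived state" idea concrete, here is a small sketch of replaying subscription events into a current status. The event names, payloads, and transition rules are illustrative assumptions.

```python
# A minimal sketch of the read side: replaying subscription events to derive
# the "current state" that an entity model would store directly.
events = [
    {"event_name": "subscription_started",   "event_time": "2024-01-05T10:00:00Z", "plan": "basic"},
    {"event_name": "plan_changed",            "event_time": "2024-03-12T08:30:00Z", "plan": "pro"},
    {"event_name": "payment_failed",          "event_time": "2024-06-01T02:15:00Z"},
    {"event_name": "subscription_cancelled",  "event_time": "2024-06-15T17:45:00Z"},
]

def derive_subscription_status(events):
    """Fold the event stream, in time order, into the latest subscription state."""
    state = {"status": "none", "plan": None}
    for e in sorted(events, key=lambda e: e["event_time"]):
        if e["event_name"] == "subscription_started":
            state = {"status": "active", "plan": e["plan"]}
        elif e["event_name"] == "plan_changed":
            state["plan"] = e["plan"]
        elif e["event_name"] == "payment_failed":
            state["status"] = "past_due"
        elif e["event_name"] == "subscription_cancelled":
            state["status"] = "cancelled"
    return state

print(derive_subscription_status(events))  # {'status': 'cancelled', 'plan': 'pro'}
```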
💡 Key Takeaways
Events are immutable and append only. You never overwrite or delete events; you only add new ones to represent state changes.
At scale, a 10 million user app generates roughly 1 billion events per day (12,000 per second average, 60,000 to 120,000 per second at peak).
Each event requires a unique ID, precise timestamp, actor identifier, target object, context fields, and schema version to be useful downstream.
Event models shift complexity from writes (simple appends) to reads (must aggregate many events to answer questions about current state).
This approach excels for behavioral analytics, experimentation, and forensics but generates far more data than entity models that only track current state.
📌 Examples
A ride sharing app emits driver_accepted_trip, trip_started, location_updated (every 5 seconds), trip_completed, and payment_processed events rather than just updating a trips table.
An ecommerce system tracks product_viewed, added_to_cart, cart_updated, checkout_initiated, payment_authorized, order_placed, and shipment_dispatched to understand the complete customer journey.
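To show what the read side of the ecommerce example looks like, here is a small sketch that aggregates such events into a simple conversion funnel. The event names follow the example above; the data and the set of funnel steps are made up for illustration, and ordering between steps is ignored.

```python
# Illustrative sketch: count distinct users who emitted each funnel event
# (a simplified funnel that ignores the ordering of steps).
from collections import defaultdict

events = [
    ("user_1", "product_viewed"), ("user_1", "added_to_cart"), ("user_1", "order_placed"),
    ("user_2", "product_viewed"), ("user_2", "added_to_cart"),
    ("user_3", "product_viewed"),
]

funnel_steps = ["product_viewed", "added_to_cart", "order_placed"]
users_per_step = defaultdict(set)
for user_id, event_name in events:
    users_per_step[event_name].add(user_id)

for step in funnel_steps:
    print(f"{step}: {len(users_per_step[step])} users")
# product_viewed: 3 users, added_to_cart: 2 users, order_placed: 1 user
```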