Document Database Core Model and Data Structure

Document databases store data as self-describing documents, typically in a JSON-like format, grouped into collections. Each document is the atomic unit: reads, updates, and transactions operate on one or a small set of documents. Unlike relational databases, where data is spread across normalized tables, document databases favor hierarchical nesting, with objects and arrays inside a single document.

This model maps naturally to application objects. A user profile might contain a nested address object, an array of preferences, and embedded metadata, all in one document. You can define indexes on nested fields (user.address.city) and array elements (user.tags[*]), and queries can return full documents or only selected fields. The schema is flexible: two documents in the same collection can have different fields without migration scripts.

The tradeoff is query complexity. Multi-dimensional queries require precise compound index design. For example, filtering by country and product while sorting by date needs a compound index covering all three fields in the right order (typically the equality fields first, then the sort field), or the database may fall back to scanning and sorting in memory. Schema flexibility therefore does not remove the need for careful index planning if performance is to stay predictable.

At companies like Airbnb and Uber, document databases store user profiles, booking records, and configuration data where the nested, denormalized structure reduces read latency by avoiding joins. A single document fetch returns everything needed to render a profile page, accepting write amplification (updating duplicated data) in exchange for read speed.
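To make the compound-index point concrete, here is a minimal sketch in plain JavaScript with no database involved. It assumes a hypothetical `orders` collection with `country`, `product`, and `date` fields (none of these names come from a real schema) and models the index as an array of keys kept sorted by (country, product, date). It illustrates why key order matters; it is not a description of how any particular engine stores its indexes.

```javascript
// Hypothetical order documents; all field names are illustrative assumptions.
const orders = [
  { _id: 1, country: "US", product: "book", date: "2024-03-01" },
  { _id: 2, country: "DE", product: "book", date: "2024-01-10" },
  { _id: 3, country: "US", product: "book", date: "2024-01-15" },
  { _id: 4, country: "US", product: "lamp", date: "2024-02-20" },
  { _id: 5, country: "US", product: "book", date: "2024-02-05" },
];

// No index: examine every document, then sort the matches in memory.
function collectionScan(docs, country, product) {
  let examined = 0;
  const matches = [];
  for (const doc of docs) {
    examined += 1;
    if (doc.country === country && doc.product === product) matches.push(doc);
  }
  matches.sort((a, b) => (a.date < b.date ? -1 : a.date > b.date ? 1 : 0));
  return { ids: matches.map((d) => d._id), examined };
}

// Compare two key tuples element by element.
function compareKeys(a, b) {
  for (let i = 0; i < a.length; i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return 0;
}

// Simulated compound index: entries sorted by (country, product, date).
const index = orders
  .map((d) => ({ key: [d.country, d.product, d.date], id: d._id }))
  .sort((a, b) => compareKeys(a.key, b.key));

// Binary search for the first entry whose key prefix is >= the given prefix.
function lowerBound(entries, prefix) {
  let lo = 0;
  let hi = entries.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (compareKeys(entries[mid].key.slice(0, prefix.length), prefix) < 0) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Index scan: jump to the (country, product) range and read it in date order.
function indexScan(entries, country, product) {
  let examined = 0; // counts matching index entries that were read
  const ids = [];
  for (let i = lowerBound(entries, [country, product]); i < entries.length; i++) {
    const entry = entries[i];
    if (entry.key[0] !== country || entry.key[1] !== product) break;
    examined += 1;
    ids.push(entry.id); // already ordered by date, so no sort step is needed
  }
  return { ids, examined };
}

const scanned = collectionScan(orders, "US", "book");
const indexed = indexScan(index, "US", "book");
console.log(scanned); // { ids: [ 3, 5, 1 ], examined: 5 }
console.log(indexed); // { ids: [ 3, 5, 1 ], examined: 3 }
```

Both paths return the same documents in the same order, but the simulated index reads only the matching range and skips the sort. If `date` came first in the key tuple, entries for one country and product would no longer be contiguous, which is the intuition behind placing the equality fields ahead of the sort field.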
💡 Key Takeaways
Documents are the atomic unit: single-document operations are strongly consistent, while cross-document operations may trade latency for consistency depending on the system
Flexible schema allows different fields per document in the same collection, enabling rapid iteration without migrations but requiring application-level validation
Indexes on nested fields and arrays enable complex queries, but poor index design forces collection scans whose latency grows with data size
Denormalization is encouraged: embedding related data in one document reduces read latency by avoiding joins but increases write amplification when duplicated data must be updated
Query performance depends heavily on index coverage: a compound index matching the equality filters plus the sort fields keeps scans bounded, while missing indexes trigger in-memory sorts
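The denormalization takeaway can be illustrated with a small in-memory sketch. The `listings` map, the `host` sub-document, and the `renderListing` and `renameHost` helpers are hypothetical names chosen for illustration; the point is only that a read touches one document while an update to duplicated data touches every document holding a copy.

```javascript
// Hypothetical in-memory "collection": each listing embeds a copy of its host.
const listings = new Map([
  ["l1", { title: "Loft", host: { hostId: "h1", name: "Dana" } }],
  ["l2", { title: "Cabin", host: { hostId: "h1", name: "Dana" } }],
  ["l3", { title: "Studio", host: { hostId: "h2", name: "Eli" } }],
]);

// Read path: a single lookup returns everything the page needs, with no join.
function renderListing(id) {
  const doc = listings.get(id);
  return `${doc.title} hosted by ${doc.host.name}`;
}

// Write path: renaming a host must update every document that embeds a copy.
function renameHost(hostId, newName) {
  let written = 0;
  for (const doc of listings.values()) {
    if (doc.host.hostId === hostId) {
      doc.host.name = newName;
      written += 1;
    }
  }
  return written; // number of documents rewritten for one logical change
}

const writes = renameHost("h1", "Dana R.");
console.log(writes); // 2
console.log(renderListing("l2")); // Cabin hosted by Dana R.
```

One logical change produced two document writes here; with thousands of embedding documents the same rename would cost thousands of writes, which is the write amplification the takeaway refers to.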
📌 Examples
User profile document: { "userId": "u789", "name": "Alice", "email": "[email protected]", "address": { "city": "Seattle", "zip": "98101" }, "preferences": ["email_notifications", "dark_mode"], "createdAt": "2024-01-15T10:30:00Z" }
Airbnb listing document embeds host details, an amenities array, pricing tiers, and an availability calendar in one document to serve the listing page with a single read
MongoDB query db.users.find({ "address.city": "Seattle", "preferences": "email_notifications" }).sort({ createdAt: -1 }) requires compound index on (address.city, preferences, createdAt) for optimal performance
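As a rough illustration of what the query above asks for, the following sketch evaluates the same filter and sort against a few in-memory documents. The `getPath` and `matches` helpers and the sample users other than `u789` are assumptions made for this example, and the matching rules are a simplification of real query semantics: a dotted path descends into nested objects, and a scalar filter value matches an array field when any element equals it.

```javascript
// Resolve a dotted path such as "address.city" against a nested document.
function getPath(doc, path) {
  return path
    .split(".")
    .reduce(
      (value, part) =>
        value !== null && typeof value === "object" ? value[part] : undefined,
      doc
    );
}

// Simplified equality match: a scalar filter value matches an array field
// when any element of the array equals it.
function matches(doc, filter) {
  return Object.entries(filter).every(([path, expected]) => {
    const actual = getPath(doc, path);
    return Array.isArray(actual) ? actual.includes(expected) : actual === expected;
  });
}

// Hypothetical sample data; only u789 mirrors the profile example above.
const users = [
  { userId: "u789", address: { city: "Seattle" }, preferences: ["email_notifications", "dark_mode"], createdAt: "2024-01-15T10:30:00Z" },
  { userId: "u101", address: { city: "Seattle" }, preferences: ["dark_mode"], createdAt: "2024-02-01T08:00:00Z" },
  { userId: "u202", address: { city: "Seattle" }, preferences: ["email_notifications"], createdAt: "2024-03-09T12:00:00Z" },
  { userId: "u303", address: { city: "Portland" }, preferences: ["email_notifications"], createdAt: "2024-04-01T09:00:00Z" },
];

const filter = { "address.city": "Seattle", preferences: "email_notifications" };

const result = users
  .filter((u) => matches(u, filter))
  // Equivalent of sort({ createdAt: -1 }): newest first. ISO-8601 timestamps
  // in the same format compare correctly as plain strings.
  .sort((a, b) => (a.createdAt < b.createdAt ? 1 : a.createdAt > b.createdAt ? -1 : 0))
  .map((u) => u.userId);

console.log(result); // [ 'u202', 'u789' ]
```

In the MongoDB shell, an index with the shape described above would typically be declared as `db.users.createIndex({ "address.city": 1, preferences: 1, createdAt: -1 })`. Because `preferences` is an array, that index would hold one entry per array element.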