What is a Time-Series Database (TSDB)?

A Time Series Database (TSDB) is a storage system optimized for data where each record is anchored by a timestamp and writes are almost always appends rather than updates. Unlike relational databases that optimize for complex joins and transactions, TSDBs prioritize high velocity sequential writes (often 100,000 to millions of samples per second) and efficient time window scans. The data model consists of measurement name, timestamp, fields (actual numeric values), and tags (dimensional metadata like server name or region).

The core insight is that time series data has predictable access patterns: you rarely update old data, you almost always query by time ranges, and you frequently aggregate across dimensions. For example, when Netflix ingests millions of metric updates per second for operational dashboards, queries typically ask for the last 5 minutes of data grouped by service name, not random point lookups across arbitrary time periods. This allows TSDBs to use specialized storage formats like columnar layouts with aggressive compression.

Production systems achieve order of magnitude space savings through time aware compression techniques. InfluxDB reports 10 to 100x compression when persisting to columnar object storage using delta of delta encoding for timestamps (storing differences between differences), XOR encoding for floating point values, and run length encoding for repeated dimension values. Facebook's Gorilla TSDB demonstrated similar compression ratios enabling full in memory retention of recent metrics with cross data center replication.

The fundamental tradeoff is sacrificing transactional guarantees and complex relational operations for write throughput and scan efficiency. Choose a TSDB when you need sustained high throughput ingest, low latency time window queries (sub second response for recent data), and built in retention policies with automatic downsampling. Avoid TSDBs when you need frequent row level updates, strict ACID transactions across records, or workloads dominated by complex multi table joins.

💡 Key Takeaways

•Data model: measurement + timestamp + fields (numeric values) + tags (dimensional metadata like region or host)

•Write pattern: append only with 100,000 to millions of samples per second sustained throughput versus frequent updates

•Compression: 10 to 100x space savings using delta of delta for timestamps, XOR for floats, run length for dimensions

•Query pattern: time window range scans and aggregations over recent data (last minutes to hours) not random point lookups

•Production scale: Netflix ingests millions of time series updates per second with dashboard queries returning in tens of milliseconds

•Core tradeoff: sacrifice ACID transactions and complex joins for high write throughput and efficient time based scans

📌 Examples

Facebook Gorilla TSDB: in memory monitoring system with delta of delta timestamps and XOR float compression achieving order of magnitude space reductions with cross data center replication for high availability

Prometheus: single node pull based TSDB ingesting 100,000 to 300,000 samples per second on commodity hardware with sub second queries over recent windows

IoT telemetry: connected car emitting 25 GB per hour requires aggressive compression, tiering to object storage, and downsampling to keep costs tractable

← Back to Time-Series Databases Overview