Database Design: Time-Series Databases (InfluxDB, TimescaleDB)

What Are Time-Series Databases and How Do They Differ from Traditional Databases?

Time-series databases (TSDBs) are optimized specifically for data points indexed by time: metrics, sensor readings, application logs. Unlike traditional databases that balance reads and writes, TSDBs assume an append-heavy workload where you write continuously but query specific time windows.

The core architecture follows a near-universal pattern. The write path buffers incoming events in memory, backed by a Write-Ahead Log (WAL) for crash recovery, then persists them in time-partitioned segments (hourly or daily chunks). When you query metrics from the last 6 hours, the system reads only those segments rather than scanning everything. This partition pruning is the secret weapon that lets queries over billions of data points finish in milliseconds.

Storage leverages columnar compression tuned for time-ordered sequences: timestamps compress via delta-of-delta encoding (storing the difference between successive differences), floating-point values use Gorilla-style XOR compression, and low-cardinality strings get dictionary encoding. Real-world results show storage 8 to 59 times smaller than naive row formats; one test with 100 devices each writing one metric achieved 59x compression, while 4,000 devices with 10 metrics each compressed 8x.

The read path emphasizes range scans over time windows, windowed aggregations such as per-minute averages, and downsampling, where full-resolution data is kept for days but rolled up to minute-level granularity for months.

Memory, not disk, becomes the first bottleneck: in-memory indices over high-cardinality tag combinations can explode, and memory costs 100 to 1000 times more per gigabyte than disk. Production systems therefore enforce strict limits on unique tag combinations and aggressively compress data at rest.
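To make the write path and partition pruning concrete, here is a minimal sketch. It is not any real engine's storage format: the class name, the hourly segment size, and the plain-text WAL layout are all invented for illustration.

```python
import os
from collections import defaultdict

SEGMENT_SECONDS = 3600  # hypothetical hourly segments

class TinyTSDB:
    """Toy write/read path: WAL append first, then in-memory,
    time-partitioned segments that the query path can prune."""

    def __init__(self, wal_path="wal.log"):
        self.wal = open(wal_path, "a")      # append-only log for crash recovery
        self.segments = defaultdict(list)   # segment start ts -> [(ts, value), ...]

    def write(self, ts, value):
        self.wal.write(f"{ts} {value}\n")   # durability before acknowledging
        self.wal.flush()
        os.fsync(self.wal.fileno())
        self.segments[ts - ts % SEGMENT_SECONDS].append((ts, value))

    def query(self, start, end):
        # Partition pruning: skip every segment outside [start, end)
        # without scanning any of its points.
        for seg_start in sorted(self.segments):
            if seg_start + SEGMENT_SECONDS <= start or seg_start >= end:
                continue
            for ts, value in self.segments[seg_start]:
                if start <= ts < end:
                    yield ts, value

db = TinyTSDB()
db.write(100, 0.5)                   # lands in segment 0
db.write(7300, 0.7)                  # lands in segment 7200
print(list(db.query(7000, 8000)))    # [(7300, 0.7)] -- segment 0 is pruned
```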
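The compression idea can be sketched the same way. Below is a toy delta-of-delta encoder for timestamps; the function names and the integer-list representation are illustrative, since real engines such as Gorilla pack these values at the bit level, where the long runs of zeros cost roughly one bit each.

```python
def delta_of_delta_encode(timestamps):
    """Encode a sorted timestamp list as (header, delta-of-deltas).
    Regularly sampled series produce mostly zeros, which compress well."""
    if len(timestamps) < 2:
        return timestamps[:], []
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    dods = [b - a for a, b in zip(deltas, deltas[1:])]
    return [timestamps[0], deltas[0]], dods

def delta_of_delta_decode(header, dods):
    first, delta = header
    out = [first]
    for dod in [0] + dods:     # replay each delta-of-delta to rebuild deltas
        delta += dod
        out.append(out[-1] + delta)
    return out

ts = [1000, 1010, 1020, 1030, 1041]        # 10s sampling, one late point
header, dods = delta_of_delta_encode(ts)
assert delta_of_delta_decode(header, dods) == ts
print(dods)                                 # [0, 0, 1] -- mostly zeros
```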
💡 Key Takeaways
The write path buffers data in memory with a WAL for durability, then persists to time-partitioned segments, enabling partition pruning where queries read only the relevant time windows
Columnar compression with delta-of-delta timestamps, XOR-encoded floats, and dictionary-encoded strings achieves storage 8 to 59x smaller than row formats, depending on cardinality
Memory becomes the bottleneck before disk, because in-memory indices over unique tag combinations grow rapidly and memory costs 100 to 1000x more per gigabyte than disk
Lifecycle management is built in, with retention policies that delete old data and continuous aggregates that downsample from second-level to minute-level granularity as data ages (see the sketch after this list)
Netflix Atlas ingests multiple millions of samples per second with sub-second query Service Level Objectives (SLOs) for dashboards by using push-based client aggregation and aggressive rollups
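A toy version of that downsampling rollup, assuming simple per-bucket averages over (timestamp, value) pairs; real continuous aggregates are materialized and updated incrementally, which this sketch is not:

```python
from collections import defaultdict

def downsample(points, bucket_seconds=60):
    """Roll full-resolution (ts, value) points up to per-bucket averages,
    the kind of rollup applied as data ages out of its full-resolution tier."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % bucket_seconds].append(value)
    return {b: sum(v) / len(v) for b, v in sorted(buckets.items())}

raw = [(0, 1.0), (15, 3.0), (30, 5.0), (60, 2.0), (75, 4.0)]
print(downsample(raw))   # {0: 3.0, 60: 3.0}
```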
📌 Examples
Datadog enforces strict per-metric tag limits and top-k rollups to prevent the cardinality explosion that would cause out-of-memory failures across millions of metrics (a toy top-k rollup appears after these examples)
TimescaleDB automatically partitions PostgreSQL tables into hypertable chunks by time and space, then uses constraint exclusion at query planning time to skip irrelevant chunks
InfluxDB 3.x routes writes through ingesters that validate, partition by day, deduplicate, and persist Parquet columnar files, achieving 10 to 100x compression over raw data
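To illustrate the top-k idea, here is a minimal sketch, not Datadog's actual mechanism: the function name, the overflow tag, and the input shape (tag value to sample count) are all assumptions made for the example.

```python
from collections import Counter

def topk_rollup(series_counts, k=3, overflow_tag="__other__"):
    """Keep the k highest-volume tag values and fold the long tail into one
    overflow series, bounding unique tag combinations (and index memory)."""
    keep = {tag for tag, _ in Counter(series_counts).most_common(k)}
    rolled = Counter()
    for tag, count in series_counts.items():
        rolled[tag if tag in keep else overflow_tag] += count
    return dict(rolled)

counts = {"host-a": 900, "host-b": 700, "host-c": 650, "host-d": 3, "host-e": 1}
print(topk_rollup(counts))
# {'host-a': 900, 'host-b': 700, 'host-c': 650, '__other__': 4}
```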