Retention Policies and Data Lifecycle: Balancing Cost, Compliance, and Query Needs
Why Retention Matters:
Time series data grows linearly with time and with the number of series. If you collect 1 billion points per day and keep them forever at 2 bytes per compressed point, you accumulate about 730 gigabytes per year, or 7.3 terabytes per decade. For a fleet of 10,000 servers with 1,000 metrics each at 10 second resolution, you face 86.4 billion points per day, roughly 6.3 petabytes per decade at about 20 bytes per raw point, or around 630 terabytes even with 10 times compression. Unbounded retention is financially and operationally impractical. Retention policies define how long data lives at each resolution, balancing cost, regulatory requirements, and query needs.
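As a quick sanity check, the arithmetic above can be reproduced in a few lines of Python; the fleet size, resolution, bytes per point, and compression ratio are simply the assumptions stated in this paragraph.

    # Back-of-the-envelope storage growth for the fleet described above.
    servers = 10_000
    metrics_per_server = 1_000
    resolution_seconds = 10
    bytes_per_raw_point = 20        # assumed raw size; 10x compression gives ~2 bytes/point
    compression_ratio = 10

    series = servers * metrics_per_server
    points_per_day = series * (86_400 // resolution_seconds)          # 86.4 billion
    raw_per_decade = points_per_day * bytes_per_raw_point * 3_650     # ~6.3 PB
    compressed_per_decade = raw_per_decade / compression_ratio        # ~630 TB

    print(f"{points_per_day:,} points/day")
    print(f"{raw_per_decade / 1e15:.1f} PB raw per decade")
    print(f"{compressed_per_decade / 1e12:.0f} TB compressed per decade")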
Designing Retention Policies:
Retention should match access patterns. Recent data for dashboards, alerting, and troubleshooting needs high resolution and fast access. Older data for capacity planning, compliance, and trend analysis can tolerate lower resolution and slower queries. A common pattern is: raw 10 second data for 7 to 14 days, 1 minute rollups for 90 days, hourly rollups for 1 year, daily rollups for 3 to 7 years. These numbers align with typical use cases: debugging requires recent detailed data, capacity planning uses monthly trends, compliance audits need years of daily aggregates.
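One way to keep such a policy explicit and reviewable is to write the tiers down as configuration. The Python sketch below just mirrors the durations listed above; a real system would express the same idea in its own retention or downsampling configuration.

    from dataclasses import dataclass

    @dataclass
    class RetentionTier:
        name: str
        resolution: str       # sampling or rollup interval
        retention_days: int   # how long this tier is kept

    # Tiers mirroring the common pattern described above.
    RETENTION_TIERS = [
        RetentionTier("raw",           "10s", 14),       # dashboards, alerting, debugging
        RetentionTier("minute_rollup", "1m",  90),       # trend analysis
        RetentionTier("hourly_rollup", "1h",  365),      # capacity planning
        RetentionTier("daily_rollup",  "1d",  7 * 365),  # compliance, long-term trends
    ]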
Retention also interacts with storage cost. Keeping raw data on SSD costs roughly 5 times more per gigabyte than keeping compressed rollups on object storage. If raw data is 100 gigabytes per day on SSD at $0.10 per gigabyte per month, 14 days costs $140 per month. Downsampling from 10 second to 1 minute resolution cuts volume by a factor of 6, to about 16.7 gigabytes per day; 90 days on cheaper object storage at $0.02 per gigabyte per month costs about $30 per month. The trade off is query granularity versus cost.
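The cost comparison is the same kind of quick arithmetic; the per-gigabyte prices below are the illustrative figures used above, not quotes from any particular provider.

    # Monthly cost: 14 days of raw data on SSD vs 90 days of 1-minute rollups on object storage.
    raw_gb_per_day = 100
    ssd_price_per_gb_month = 0.10
    rollup_gb_per_day = raw_gb_per_day / 6      # 10s -> 1m keeps 1 point in 6 (~16.7 GB/day)
    object_price_per_gb_month = 0.02

    raw_cost = raw_gb_per_day * 14 * ssd_price_per_gb_month              # $140/month
    rollup_cost = rollup_gb_per_day * 90 * object_price_per_gb_month     # $30/month
    print(f"raw on SSD: ${raw_cost:.0f}/mo, rollups on object storage: ${rollup_cost:.0f}/mo")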
✓ In Practice: Netflix keeps 7 days of raw metrics for debugging, 90 days of 1 minute rollups for trend analysis, and several years of daily aggregates for capacity planning and cost attribution. This multi tier retention reduces storage cost by approximately 95 percent compared to keeping all data at raw resolution.
Compliance and Legal Requirements:
Regulatory frameworks such as the Health Insurance Portability and Accountability Act (HIPAA) and financial auditing standards often mandate minimum retention periods for certain data, while the General Data Protection Regulation (GDPR) limits how long personal data may be kept. Financial transaction metrics may need 7 years of retention; healthcare telemetry may require 10 years. Data modeling must accommodate these requirements while avoiding over-retention, which increases storage cost and privacy risk. A best practice is to separate regulated metrics into dedicated retention tiers with automated deletion after the legal minimum, ensuring compliance without keeping unnecessary data.
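A minimal sketch of the dedicated-tier idea: tag each regulated metric category with its legal minimum and let the lifecycle job check deletion eligibility against it. The category names and periods below are illustrative assumptions, not legal guidance.

    from datetime import date, timedelta

    # Hypothetical mapping from regulated metric category to minimum retention.
    LEGAL_MINIMUM_DAYS = {
        "financial_transactions": 7 * 365,   # e.g. financial auditing standards
        "healthcare_telemetry":  10 * 365,   # e.g. HIPAA-style requirements
        "general_infrastructure": 90,        # no mandate beyond operational needs
    }

    def eligible_for_deletion(category: str, written_on: date, today: date) -> bool:
        """True once data has aged past its category's legal minimum retention."""
        return today - written_on >= timedelta(days=LEGAL_MINIMUM_DAYS[category])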
Automated Lifecycle Management:
Manual deletion is error prone and operationally expensive. Production systems automate lifecycle management using time based partitioning. Once a partition's time window closes and its retention period expires, the system automatically deletes or archives it. For example, partitions for raw data older than 14 days are deleted, partitions for 1 minute rollups older than 90 days are deleted, and so on. This automation runs continuously, keeping storage usage stable as new data arrives and old data expires.
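A sketch of such an expiry pass, assuming partitions are tracked as (tier, window end date) pairs; a real system would drop table partitions or delete files rather than filter an in-memory list.

    from datetime import date, timedelta

    # Retention per tier, in days, matching the policy described above.
    RETENTION_DAYS = {"raw_10s": 14, "rollup_1m": 90, "rollup_1h": 365, "rollup_1d": 7 * 365}

    def expired_partitions(partitions, today: date):
        """Yield (tier, window_end) pairs whose retention window has passed."""
        for tier, window_end in partitions:
            if today - window_end > timedelta(days=RETENTION_DAYS[tier]):
                yield tier, window_end

    # A scheduler would run this continuously (say hourly) and drop or archive
    # whatever it yields, keeping storage usage roughly stable.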
Cloud storage services like Amazon S3 or Google Cloud Storage offer lifecycle policies that automatically transition data to cheaper cold storage tiers or delete it after a specified age. A time series system can write daily rollups to object storage with a lifecycle policy that moves data older than 1 year to Glacier (very cheap, retrieval takes hours) and deletes data older than 7 years. This offloads lifecycle management to the storage layer.
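The S3 rule described here could be expressed with boto3 roughly as follows; the bucket name and key prefix are placeholders, and the day counts mirror the 1 year transition and 7 year deletion above.

    import boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket="example-metrics-rollups",                    # placeholder bucket name
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "daily-rollup-retention",
                    "Filter": {"Prefix": "rollups/daily/"},  # placeholder key prefix
                    "Status": "Enabled",
                    # Move to Glacier after 1 year: much cheaper, retrieval takes hours.
                    "Transitions": [{"Days": 365, "StorageClass": "GLACIER"}],
                    # Delete after 7 years (2555 days).
                    "Expiration": {"Days": 2555},
                },
            ]
        },
    )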
The Trade Off:
Aggressive retention policies reduce cost but limit historical analysis. If you delete raw data after 7 days and later discover a bug that requires detailed investigation of an incident from 10 days ago, the data is gone. Conversely, keeping everything forever is expensive and complicates privacy compliance. The solution is to document retention policies clearly, ensure stakeholders understand the trade offs, and design systems to answer most questions with downsampled data while accepting that some deep dives into distant history are impossible.
💡 Key Takeaways
• Unbounded retention at raw resolution for 10,000 servers with 1,000 metrics generates 86.4 billion points per day, roughly 6.3 petabytes over 10 years uncompressed and about 630 terabytes even with 10x compression, making indefinite raw retention cost prohibitive.
• Typical retention: raw 10 second data for 7 to 14 days (debugging), 1 minute rollups for 90 days (trends), hourly for 1 year, daily for 3 to 7 years (capacity planning, compliance).
• Storage cost scales with resolution and tier: raw data on SSD at $0.10 per gigabyte per month (14 days at 100 gigabytes per day = $140/month) versus 1 minute rollups on object storage at $0.02 per gigabyte per month (90 days at roughly 16.7 gigabytes per day = $30/month).
• Compliance frameworks such as HIPAA and financial auditing standards may mandate 7 to 10 years of retention for specific metrics, while GDPR discourages over-retention of personal data; both call for dedicated tiers with automated deletion after legal minimums.
• Automated lifecycle using time partitioned storage deletes expired partitions continuously; cloud lifecycle policies transition data older than 1 year to cold storage (Glacier) and delete after 7 years, offloading management.
📌 Examples
Netflix retains 7 days of raw data, 90 days of 1 minute rollups, and several years of daily aggregates, reducing storage cost by approximately 95 percent versus raw retention while supporting debugging, trend analysis, and capacity planning.
A financial services company keeps transaction metrics for 7 years per regulatory requirements, using daily rollups on cheap object storage with automated lifecycle deletion after 2555 days to ensure compliance.
A system writes daily rollups to Amazon S3 with lifecycle policy: transition to Glacier after 365 days (retrieval latency increases from milliseconds to hours), delete after 2555 days (7 years).