What is Data Federation?
Definition
Data Federation is a virtual data layer that allows you to query multiple physically separate data sources as if they were one unified database, without copying or moving the data first.
✓ In Practice: When an analyst writes SELECT * FROM customers JOIN orders, the federation engine might route the customers query to Salesforce via REST API and the orders query to PostgreSQL via SQL, then join the results locally before returning them.
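The flow described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular federation engine is implemented: `fetch_customers` is a hypothetical stand-in for a REST-backed SaaS source, an in-memory SQLite database stands in for PostgreSQL, and the join key `customer_id` and all sample rows are assumptions made for the example.

```python
import sqlite3

def fetch_customers():
    # Hypothetical stand-in for a SaaS source reached over a REST API.
    return [
        {"customer_id": 1, "name": "Ada"},
        {"customer_id": 2, "name": "Grace"},
        {"customer_id": 3, "name": "Edsger"},
    ]

def open_orders_db():
    # In-memory SQLite stands in for the PostgreSQL source.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, total REAL)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(100, 1, 25.0), (101, 1, 40.0), (102, 2, 15.5)],
    )
    return conn

def fetch_orders(conn):
    # The subquery sent to the relational source.
    cur = conn.execute(
        "SELECT order_id, customer_id, total FROM orders ORDER BY order_id"
    )
    return [{"order_id": o, "customer_id": c, "total": t} for o, c, t in cur]

def federated_join(customers, orders):
    # Local hash join: build a lookup on customers, probe it with each order.
    by_id = {c["customer_id"]: c for c in customers}
    joined = []
    for order in orders:
        customer = by_id.get(order["customer_id"])
        if customer is not None:  # inner join: drop orders with no customer
            joined.append({**customer, **order})
    return joined

conn = open_orders_db()
rows = federated_join(fetch_customers(), fetch_orders(conn))
for row in rows:
    print(row)
```

Neither source is copied into a shared store ahead of time. Each one is read when the query runs, and only the join happens in the engine's own process.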
The value proposition is simple: access fresh data from multiple sources without building and maintaining complex ETL pipelines. Because every query reads the current state of each source at execution time, results reflect what each system holds right now, with no batch-load delay.
💡 Key Takeaways
✓ Federation provides a virtual unified view over physically separate data sources without copying data
✓ Queries are decomposed into subqueries that execute against each source system at runtime
✓ Core components include a federation engine, source connectors, a metadata catalog, a query optimizer, and a security layer
✓ Data stays fresh because you always query the current state of each system, avoiding ETL latency
✓ The trade-off: you gain operational simplicity (no pipelines to maintain) but take on runtime dependencies on multiple upstream systems
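To make the component list and the trade-off concrete, here is a small sketch of a catalog that routes table scans to connectors. All class and method names (`InMemoryConnector`, `FederationEngine`, `SourceUnavailable`, `scan`) are hypothetical and invented for this illustration. Real engines add a query optimizer and a security layer, both omitted here.

```python
class SourceUnavailable(Exception):
    """Raised by a connector when its upstream system cannot be reached."""

class InMemoryConnector:
    # Hypothetical connector; a real one would talk to a database or an API.
    def __init__(self, name, rows, available=True):
        self.name = name
        self.rows = rows
        self.available = available

    def scan(self, predicate=None):
        if not self.available:
            raise SourceUnavailable(f"source {self.name!r} is down")
        # Filtering inside the connector mimics pushing a predicate to the source.
        return [r for r in self.rows if predicate is None or predicate(r)]

class FederationEngine:
    def __init__(self):
        self.catalog = {}  # metadata catalog: table name -> connector

    def register(self, table, connector):
        self.catalog[table] = connector

    def scan(self, table, predicate=None):
        # Route the subquery to whichever source owns the table.
        return self.catalog[table].scan(predicate)

engine = FederationEngine()
engine.register(
    "customers",
    InMemoryConnector(
        "crm",
        [{"customer_id": 1, "region": "EU"}, {"customer_id": 2, "region": "US"}],
    ),
)
engine.register(
    "orders",
    InMemoryConnector(
        "orders_db", [{"order_id": 100, "customer_id": 1}], available=False
    ),
)

eu_customers = engine.scan("customers", lambda r: r["region"] == "EU")

try:
    engine.scan("orders")
    orders_status = "ok"
except SourceUnavailable as exc:
    # An outage in one upstream system surfaces directly as a query failure.
    orders_status = str(exc)

print(eu_customers)
print(orders_status)
```

The failed `orders` scan shows the runtime dependency: with no local copy to fall back on, a query is only as available as the least available source it touches.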
📌 Examples
1. Amazon Athena federated queries allow joining S3 data with RDS databases and SaaS systems through a single SQL interface
2. Presto, which originated at Meta, and its fork Trino provide connectors for HDFS, object storage, and operational databases for cross-system analytics
3. Google BigQuery Omni uses federation to query data across multiple clouds without data movement