
ETL vs ELT: Where to Transform Data

The choice between Extract Transform Load (ETL) and Extract Load Transform (ELT) determines where transformation compute runs, and it shapes architecture, cost, and operational complexity. ETL transforms data before loading it into the warehouse, using external engines such as Apache Spark or dedicated ETL tools, typically scheduled by an orchestrator such as Airflow. ELT loads raw data first, then uses the warehouse's own Massively Parallel Processing (MPP) engine to transform it with SQL.

ETL made sense historically, when warehouses were expensive, tightly coupled appliances with limited compute capacity. By offloading transformations to cheaper external clusters, organizations protected scarce warehouse resources for analytical queries. ETL also centralizes business logic in a separate layer, making it portable across warehouse platforms. This matters when governance requires transformations to happen outside the data warehouse for regulatory isolation, or when the same logic must feed multiple downstream systems.

ELT has become dominant with modern cloud warehouses that separate storage and compute. Amazon Redshift, Google BigQuery, and Snowflake can elastically scale compute to handle both transformation and query workloads. Loading raw data into bronze tables is simple: bulk copy from object storage, which costs pennies per gigabyte. SQL transformations then run directly on the MPP engine, which is optimized for set-based operations on columnar data. For those workloads, this removes the need to maintain separate Spark clusters and to orchestrate data movement between systems.

The trade-off is clear. ETL keeps transformation load off the warehouse and centralizes logic, but it increases operational complexity, with more systems to manage and potential lock-in to external engines. ELT simplifies the architecture to one system and uses purpose-built MPP performance, but it can balloon warehouse costs unless it is managed with workload isolation and query optimization.

Many production systems use hybrid patterns: ELT for heavy set-based aggregations and joins, and ETL only where external compute is mandatory, such as privacy transformations or cross-system replication.
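To make the difference concrete, here is a minimal sketch that computes the same per-user revenue both ways. It uses Python's built-in `sqlite3` as a stand-in for the warehouse (an assumption for illustration only; SQLite is not an MPP engine), and the table names, columns, and sample rows are hypothetical.

```python
import sqlite3

# Hypothetical raw events: (user_id, event, amount_cents)
RAW_EVENTS = [
    ("u1", "purchase", 1200),
    ("u1", "view", 0),
    ("u2", "purchase", 800),
    ("u1", "purchase", 300),
]


def run_etl(conn):
    """ETL: aggregate outside the warehouse, then load only the result."""
    totals = {}
    for user_id, event, amount in RAW_EVENTS:
        if event == "purchase":
            totals[user_id] = totals.get(user_id, 0) + amount
    conn.execute("CREATE TABLE etl_revenue (user_id TEXT, revenue_cents INTEGER)")
    conn.executemany("INSERT INTO etl_revenue VALUES (?, ?)", sorted(totals.items()))
    return conn.execute(
        "SELECT user_id, revenue_cents FROM etl_revenue ORDER BY user_id"
    ).fetchall()


def run_elt(conn):
    """ELT: load raw rows untouched, then transform with SQL inside the engine."""
    conn.execute(
        "CREATE TABLE bronze_events (user_id TEXT, event TEXT, amount_cents INTEGER)"
    )
    conn.executemany("INSERT INTO bronze_events VALUES (?, ?, ?)", RAW_EVENTS)
    conn.execute(
        """
        CREATE TABLE elt_revenue AS
        SELECT user_id, SUM(amount_cents) AS revenue_cents
        FROM bronze_events
        WHERE event = 'purchase'
        GROUP BY user_id
        """
    )
    return conn.execute(
        "SELECT user_id, revenue_cents FROM elt_revenue ORDER BY user_id"
    ).fetchall()


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")
    print("ETL:", run_etl(warehouse))
    print("ELT:", run_elt(warehouse))
```

Both paths produce the same table. What differs is where the aggregation runs and whether the raw rows are kept in the warehouse for later reprocessing.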
💡 Key Takeaways
ETL offloads transformation compute to external engines such as Spark (often scheduled by an orchestrator such as Airflow), protecting warehouse capacity but requiring orchestration across multiple systems and risking lock-in to those engines
ELT uses the warehouse's Massively Parallel Processing (MPP) engine, which is purpose-built for set-based operations on columnar data, simplifying the architecture to a single platform
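Scan-based pricing, such as BigQuery on-demand queries or Amazon Redshift Spectrum, has been billed at roughly 5 dollars per terabyte scanned (provisioned Redshift clusters bill for compute time instead), so poorly optimized ELT transforms that repeatedly scan full tables become expensive at scale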
Workload isolation is critical for ELT: separate compute pools or virtual warehouses for transformation jobs prevent them from starving interactive Business Intelligence (BI) queries
ETL is required when privacy regulations demand that transformations (tokenization, masking) happen outside the data warehouse, and it is the natural fit when the same logic must feed multiple target systems
Hybrid patterns are common in production: use ELT for heavy aggregations and joins, and reserve ETL for cross-system orchestration or compliance-required external processing
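The scan-pricing takeaway is easy to check with back-of-the-envelope arithmetic. The sketch below assumes the 5 dollars per terabyte rate quoted above; the table size, run frequency, and the `scan_cost_usd` helper are hypothetical values chosen for illustration.

```python
def scan_cost_usd(tb_scanned_per_run, runs_per_day, days=30, usd_per_tb=5.0):
    """Estimated bill for a job that scans a fixed amount of data on every run."""
    return tb_scanned_per_run * runs_per_day * days * usd_per_tb


# Hypothetical hourly transform that rescans a full 2 TB table each time.
full_scan = scan_cost_usd(tb_scanned_per_run=2.0, runs_per_day=24)

# Same job reading only the newest partition (assumed to be 1% of the table).
pruned_scan = scan_cost_usd(tb_scanned_per_run=0.02, runs_per_day=24)

print(f"full scans:   ${full_scan:,.2f} per 30 days")
print(f"pruned scans: ${pruned_scan:,.2f} per 30 days")
```

Under these assumptions the full-scan job costs about 7,200 dollars over 30 days and the partition-pruned version about 72 dollars, which is why incremental models and partition filters matter so much for ELT.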
📌 Examples
E-commerce company using ELT: raw clickstream lands in BigQuery bronze tables, and dbt SQL models transform it to silver (sessionization, user stitching) and gold (conversion funnels) entirely within the BigQuery MPP engine
Financial services using ETL: a Spark job masks Social Security Numbers and tokenizes account numbers in an external cluster before loading to Redshift, satisfying an audit requirement that PII never enters the warehouse in clear text
Media company hybrid: bulk aggregations (100 billion rows to 1 million summary rows) run as ELT SQL in Snowflake using its MPP engine, while cross-platform sync to an operational PostgreSQL database uses Airflow ETL orchestration
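The financial services example depends on a per-record transform that runs before anything reaches the warehouse. A minimal sketch of that step in plain Python follows. In practice it would run inside the external engine, and the field names, the key handling, and the choice of HMAC-based tokens are all assumptions made for illustration, not a description of any particular compliance scheme.

```python
import hashlib
import hmac

# Hypothetical key; a real pipeline would fetch this from a secrets manager.
SECRET_KEY = b"demo-key-not-for-production"


def mask_ssn(ssn: str) -> str:
    """Keep only the last four digits of a 9-digit SSN."""
    digits = [c for c in ssn if c.isdigit()]
    if len(digits) != 9:
        raise ValueError("expected a 9-digit SSN")
    return "***-**-" + "".join(digits[-4:])


def tokenize_account(account_number: str) -> str:
    """Deterministic keyed token, so downstream joins on the account still work."""
    digest = hmac.new(SECRET_KEY, account_number.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def transform(record: dict) -> dict:
    """Build the row that is allowed to be loaded; clear-text PII is dropped."""
    return {
        "ssn_masked": mask_ssn(record["ssn"]),
        "account_token": tokenize_account(record["account_number"]),
        "balance_cents": record["balance_cents"],
    }


if __name__ == "__main__":
    raw = {"ssn": "123-45-6789", "account_number": "ACCT-0001", "balance_cents": 52000}
    print(transform(raw))
```

Because the token is deterministic for a given key, two loads of the same account map to the same value, so the warehouse can still join and aggregate without ever seeing the original account number.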