What is ELT?

ELT stands for Extract, Load, Transform. It is a data integration method where you first pull (extract) data from its original source, then move (load) that raw data directly into a destination system such as a data warehouse or data lake, and finally reshape or clean (transform) the data inside that destination. The key difference from the traditional ETL approach is that the heavy lifting of transformation happens after the data is already stored, using the processing power of the target system itself.

Let's break it down

  • Extract - Connect to the source (databases, APIs, files, etc.) and copy the needed data. This step is about getting the data out, not changing it.
  • Load - Send the extracted raw data into the target storage. In ELT the target is often a modern cloud warehouse (e.g., Snowflake, BigQuery, Redshift) that can hold massive amounts of unprocessed data.
  • Transform - Once the data lives in the warehouse, use SQL, scripting, or built‑in processing engines to clean, aggregate, join, or otherwise reshape the data so it’s ready for analysis. A minimal sketch of all three steps follows this list.
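
To make the three steps concrete, here is a minimal sketch in Python. It is an illustration only: sqlite3 stands in for a cloud warehouse, the hard-coded rows returned by extract() stand in for an API or operational database, and the table and column names (raw_orders, revenue_by_customer) are invented for the example.

```python
# Minimal ELT sketch. Assumptions: sqlite3 stands in for a cloud warehouse,
# and the hard-coded rows returned by extract() stand in for an API or
# operational database. Table and column names are invented for the example.
import sqlite3

def extract():
    # Extract: copy the raw records out of the source without changing them.
    return [
        {"order_id": 1, "customer": "acme",   "amount": "120.50", "status": "paid"},
        {"order_id": 2, "customer": "acme",   "amount": "80.00",  "status": "refunded"},
        {"order_id": 3, "customer": "globex", "amount": "42.10",  "status": "paid"},
    ]

def load(conn, rows):
    # Load: land the data as-is in a staging table inside the destination.
    conn.execute("CREATE TABLE IF NOT EXISTS raw_orders (order_id, customer, amount, status)")
    conn.executemany(
        "INSERT INTO raw_orders VALUES (:order_id, :customer, :amount, :status)", rows
    )

def transform(conn):
    # Transform: reshape the raw data with SQL, running inside the destination.
    conn.execute("DROP TABLE IF EXISTS revenue_by_customer")
    conn.execute("""
        CREATE TABLE revenue_by_customer AS
        SELECT customer, SUM(CAST(amount AS REAL)) AS revenue
        FROM raw_orders
        WHERE status = 'paid'
        GROUP BY customer
    """)

conn = sqlite3.connect(":memory:")
load(conn, extract())
transform(conn)
print(conn.execute("SELECT * FROM revenue_by_customer ORDER BY customer").fetchall())
# [('acme', 120.5), ('globex', 42.1)]
```

The defining trait is that transform() is just SQL executed by the destination itself; in a real pipeline the connection would point at the warehouse rather than an in-memory database.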

Why does it matter?

  • Speed - Loading raw data is faster than trying to transform it on the way in.
  • Scalability - Cloud warehouses can automatically allocate more compute power for transformations, handling huge data volumes.
  • Flexibility - Analysts can re‑run or tweak transformations without re‑extracting data, enabling rapid experimentation (a transform‑only re‑run is sketched after this list).
  • Cost efficiency - You pay only for the compute you actually use for transformations, not for a separate ETL server.
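
Continuing the sketch above, the flexibility point amounts to re-running only the transform step against raw data that is already loaded. The revised rule below (treating refunds as negative revenue) is an invented example; nothing is re-extracted or re-loaded.

```python
# Continuing the sketch above: change the business rule and re-run only the
# transform step. raw_orders is untouched and nothing is re-extracted.
def transform_v2(conn):
    conn.execute("DROP TABLE IF EXISTS revenue_by_customer")
    conn.execute("""
        CREATE TABLE revenue_by_customer AS
        SELECT customer,
               SUM(CASE WHEN status = 'refunded'
                        THEN -CAST(amount AS REAL)
                        ELSE  CAST(amount AS REAL) END) AS revenue
        FROM raw_orders
        GROUP BY customer
    """)

transform_v2(conn)  # reuses the raw data already sitting in the warehouse
```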

Where is it used?

  • Cloud data warehouses (Snowflake, Amazon Redshift, Google BigQuery, Azure Synapse)
  • Data lakes and lakehouse platforms queried with SQL engines (e.g., Databricks Lakehouse, Amazon Athena)
  • Business intelligence pipelines where analysts need quick access to raw data for ad‑hoc queries
  • Real‑time or near‑real‑time analytics platforms that ingest streaming data and then transform it in place

Good things about it

  • Leverages the massively parallel processing (MPP) power of modern warehouses.
  • Simplifies architecture: fewer moving parts because you don’t need a separate transformation engine.
  • Enables a “single source of truth” by keeping raw data intact for future re‑use; the sketch after this list builds several views from one raw table.
  • Supports agile development; data scientists can write transformations directly in SQL or notebooks.
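
Building on the earlier sketch, keeping the raw table intact means several derived views can be defined over the same single source of truth, each written directly in SQL. The view names below are invented for the example.

```python
# Continuing the sketch: several derived views over the same intact raw table.
# View names are invented; each is plain SQL executed inside the destination.
conn.execute("""
    CREATE VIEW IF NOT EXISTS paid_orders AS
    SELECT order_id, customer, CAST(amount AS REAL) AS amount
    FROM raw_orders
    WHERE status = 'paid'
""")
conn.execute("""
    CREATE VIEW IF NOT EXISTS refund_rate AS
    SELECT customer,
           AVG(CASE WHEN status = 'refunded' THEN 1.0 ELSE 0.0 END) AS refund_rate
    FROM raw_orders
    GROUP BY customer
""")
```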

Not-so-good things

  • Requires a powerful, often expensive, target system to handle heavy transformations.
  • Raw data stored in the warehouse can increase storage costs if it is not managed, for example by archiving or expiring old data.
  • Security and governance become more complex because raw, possibly sensitive, data is retained in the destination; one common mitigation, masked views over the raw tables, is sketched after this list.
  • Transformations may be slower for very large datasets if the warehouse’s compute resources are not properly sized or tuned.
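
One common mitigation for the governance concern, sketched below as a continuation of the earlier example, is to let analysts query only views that hide sensitive columns while access to the raw table stays restricted. Real warehouses offer native features for this (for example, column masking or row access policies); the plain view here is only an approximation of the idea, and the names are invented.

```python
# Continuing the sketch: a plain masked view as a rough stand-in for the
# column-masking or row-access policies a real warehouse would provide.
# Analysts query orders_masked; access to raw_orders stays restricted.
conn.execute("""
    CREATE VIEW IF NOT EXISTS orders_masked AS
    SELECT order_id,
           'customer_' || order_id AS customer_alias,  -- hide the real customer name
           CAST(amount AS REAL)    AS amount,
           status
    FROM raw_orders
""")
```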