ETL vs ELT vs Streaming ETL


Exploring batch and real-time design paradigms for data processing

Photo by Compare Fibre on Unsplash

Extract, Transform, Load (ETL) and Extract, Load, Transform (ELT) are two fundamental design paradigms for data ingestion and transformation. Although the terms are often used interchangeably, they describe different orderings of the same steps, suit different use cases, and lead to different pipeline designs.

In this article, we will explore the differences and similarities between ETL and ELT and discuss how developments in cloud computing and data engineering have affected data processing design patterns. We will also outline the main advantages and disadvantages of each for modern data teams. Lastly, we will discuss Streaming ETL, an emerging data processing pattern that aims to address several disadvantages of the more traditional batch approaches.

Ingesting and persisting data from external sources into a destination system involves three distinct steps.

Extract
The ‘Extract’ step involves all the processes required to pull data from a source system. Such sources include Application Programming Interfaces (APIs), database systems, files, and Internet of Things (IoT) devices, and the data can take any form: structured, semi-structured, or unstructured. Data pulled during this step is usually referred to as ‘raw data’.
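As a rough illustration of the ‘Extract’ step, the sketch below pulls records from two kinds of sources using only the Python standard library. The payloads `CSV_EXPORT` and `API_RESPONSE` and the helper names are hypothetical stand-ins for a real file export and a real API response; they are not part of any particular tool.

```python
import csv
import io
import json

# Hypothetical raw payloads standing in for a file export and an API response.
CSV_EXPORT = "id,country\n1,United States\n2,Greece\n"
API_RESPONSE = '[{"id": 3, "country": "United States"}]'


def extract_from_csv(text: str) -> list:
    """Parse CSV text into one dict per row; every value stays a string."""
    return list(csv.DictReader(io.StringIO(text)))


def extract_from_json(text: str) -> list:
    """Parse a JSON array of objects, keeping the types JSON provides."""
    return json.loads(text)


# Nothing is cleaned or reshaped here: the result is the 'raw data'.
raw_data = extract_from_csv(CSV_EXPORT) + extract_from_json(API_RESPONSE)
```

Note that the two sources already disagree on types (the CSV rows carry `id` as a string, the JSON rows as an integer). Reconciling that kind of inconsistency is left to the ‘Transform’ step.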

Transform
During the ‘Transform’ step, the pipeline applies transformations to the raw data in order to meet business or technical requirements. Commonly applied transformations include data modification (e.g. mapping United States to US), record or attribute selection, joins with other data sources, and data validation.

Applying transformation on raw data to achieve a certain goal as part of the ‘Transform’ step in ETL/ELT pipelines — Source: Author
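The transformations listed above can be sketched in a few lines of plain Python. This is a minimal example under stated assumptions rather than a reference implementation: the `COUNTRY_CODES` mapping, the `transform` function, and the sample records (including the `debug_flag` attribute) are all made up for illustration.

```python
from typing import Optional

# Hypothetical lookup used for the "United States" -> "US" style of modification.
COUNTRY_CODES = {"United States": "US", "Greece": "GR"}


def transform(record: dict) -> Optional[dict]:
    """Validate one raw record, then select and modify its attributes."""
    # Validation: reject records that have no usable id.
    if record.get("id") in (None, ""):
        return None
    country = record.get("country", "")
    # Attribute selection: only 'id' and 'country' are kept.
    return {
        "id": int(record["id"]),
        # Data modification: map known country names to codes,
        # leaving unknown values unchanged.
        "country": COUNTRY_CODES.get(country, country),
    }


# Hypothetical raw records, as they might arrive from the 'Extract' step.
raw_data = [
    {"id": "1", "country": "United States", "debug_flag": "x"},
    {"id": "", "country": "Greece", "debug_flag": "y"},
    {"id": "3", "country": "Greece", "debug_flag": "z"},
]

# Record selection: rows that fail validation are dropped.
transformed = [row for row in map(transform, raw_data) if row is not None]
```

Here the second record is dropped because its `id` is empty, and the `debug_flag` attribute is discarded from the records that remain.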

Load
During the ‘Load’ step, the data (either raw or transformed) is loaded into a destination system. Usually, the destination is an OLAP system (i.e. a Data Warehouse or…


