Beware, for this is a geeky blog article! We’ll talk about the different types of workflows found in various data pipeline tools and in the modern data stack. Why? Because seeing the differences will help you understand when to use which type of tool.
Workflows, or graphs, are visual representations of a set of “tasks” or “steps” that need to be executed to get some work done. Workflows are used in BPM solutions, in iPaaS platforms, in ETL tools and in orchestration tools.
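To make this concrete, here is a minimal Python sketch, not tied to any particular tool, that models a workflow as a graph of tasks with dependencies and executes them in order:

```python
# A toy model of a workflow: each task maps to the set of tasks that
# must run before it. This is our own sketch, not any specific tool's API.
from graphlib import TopologicalSorter

def extract():
    print("extracting data")

def transform():
    print("transforming data")

def load():
    print("loading data")

workflow = {
    extract: set(),          # no dependencies
    transform: {extract},    # runs after extract
    load: {transform},       # runs after transform
}

# Execute the tasks in dependency order.
for step in TopologicalSorter(workflow).static_order():
    step()
```

Every tool below uses some variation of this idea; what differs is how long the tasks run, who triggers them, and whether the tool itself touches the data.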
Let’s talk about these different types of applications briefly:
BPM (Business Process Management)
Business Process Management tools are used to model and automate business processes, for example how an incoming invoice from a vendor is approved and paid. Workflows in BPM tools are typically long running because it can take days or weeks to complete the flow, and they often require manual intervention, such as a person who has to approve an invoice.
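As a rough illustration, a long-running approval flow like this boils down to a state machine that can sit idle for days until a human acts. The states and function below are our own sketch, not a real BPM engine’s API:

```python
# A toy sketch of a BPM-style invoice approval flow (not a real BPM engine).
# Because the flow can wait days for a human, its state would be persisted
# between steps rather than kept in memory.
from enum import Enum

class InvoiceState(Enum):
    RECEIVED = "received"
    AWAITING_APPROVAL = "awaiting_approval"
    APPROVED = "approved"
    PAID = "paid"

def advance(state: InvoiceState, approved_by_human: bool = False) -> InvoiceState:
    """Move the invoice one step forward; approval requires a person."""
    if state is InvoiceState.RECEIVED:
        return InvoiceState.AWAITING_APPROVAL
    if state is InvoiceState.AWAITING_APPROVAL and approved_by_human:
        return InvoiceState.APPROVED
    if state is InvoiceState.APPROVED:
        return InvoiceState.PAID
    return state  # still waiting on manual intervention
```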
Examples: Camunda, Oracle BPM and many others.
iPaaS (Integration Platform as a Service)
iPaaS software, or “integration platforms” for short, focuses on automating steps between SaaS business applications, for example making sure that new customers added in a CRM are also added in the accounting system. iPaaS workflows typically run in the background without human intervention and are often short-running processes that take seconds, minutes or hours to complete.
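A toy version of such a sync step might look like the Python below. Both API endpoints and payload fields are hypothetical placeholders, not any vendor’s real API:

```python
# Hypothetical sketch of an iPaaS-style sync: copy new CRM customers into an
# accounting system. Endpoints and fields are invented for illustration.
import requests

CRM_API = "https://crm.example.com/api/customers"                # hypothetical
ACCOUNTING_API = "https://accounting.example.com/api/customers"  # hypothetical

def sync_new_customers(created_after: str) -> None:
    # Fetch customers created since the last run (a short, unattended job).
    new_customers = requests.get(CRM_API, params={"created_after": created_after}).json()

    # Push each one into the accounting system.
    for customer in new_customers:
        response = requests.post(ACCOUNTING_API, json={
            "name": customer["name"],
            "email": customer["email"],
        })
        response.raise_for_status()
```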
Examples: Workato, Make.com (Integromat), Tray, Qlik Application Automation and many others.
ETL (Extract, Transform, Load)
ETL tools focus on building data flows from various sources into a central data warehouse, with the goal of performing analytics on the data using BI tools or ML/AI. Workflows in ETL tools are often referred to as “graphs” and they run in the background without human intervention.
These flows are often a visual representation of data transformations: they abstract away the complexity of, for example, writing SQL queries, and some tools help with the logical modelling of data. Modelling data is a complex process in which raw data gets transformed into golden tables that can be consumed by business teams. One of the many challenges here is how to handle historical data, potentially by using immutable data models. Read our other blog article on this topic!
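As a small illustration of the kind of transformation these tools abstract away, here is a sketch using SQLite as a stand-in for a data warehouse, turning raw order rows into a “golden” per-customer table (table and column names are made up):

```python
# Toy ETL transformation: raw events become one curated row per customer.
# SQLite stands in for a real warehouse; the schema is invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_orders (customer_id INTEGER, amount REAL, order_date TEXT);
    INSERT INTO raw_orders VALUES
        (1, 10.0, '2024-01-05'),
        (1, 25.0, '2024-01-09'),
        (2, 40.0, '2024-01-07');
""")

# The transformation step: aggregate raw rows into a golden table.
conn.execute("""
    CREATE TABLE golden_customer_revenue AS
    SELECT customer_id,
           SUM(amount)     AS total_revenue,
           MAX(order_date) AS last_order_date
    FROM raw_orders
    GROUP BY customer_id
""")

for row in conn.execute("SELECT * FROM golden_customer_revenue"):
    print(row)  # e.g. (1, 35.0, '2024-01-09')
```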
Examples: Fivetran, Stitch.
Orchestration
Orchestration tools are used to automate processes between different systems in order to implement an end-to-end process. In the context of this article we’ll talk about orchestration tools for data pipelines.
Examples: Airflow, Dagster and others.
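For a flavour of what an orchestrated data pipeline looks like, here is a minimal Airflow sketch. The task bodies are placeholders; in a real pipeline each step would typically tell an external system (a database, a warehouse, an ETL tool) to do the actual work:

```python
# Minimal Airflow sketch of a daily extract -> transform -> load pipeline.
# Task logic is a placeholder for calls to external systems.
from datetime import datetime
from airflow.decorators import dag, task

@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def daily_pipeline():
    @task
    def extract() -> str:
        return "path/to/raw/data"  # placeholder

    @task
    def transform(raw_path: str) -> str:
        return "path/to/clean/data"  # placeholder

    @task
    def load(clean_path: str) -> None:
        print(f"loading {clean_path}")  # placeholder

    # Wire the dependencies: extract -> transform -> load.
    load(transform(extract()))

daily_pipeline()
```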
Control plane versus Data plane
Each of the above platforms uses workflows, but the way these workflows are executed differs. First of all, we need to make a distinction between tools that operate on the Control plane and tools that operate on the Data plane.
The Control plane is the level at which tasks are being orchestrated. Within the context of a data pipeline this means, for example, starting a task that extracts data from a database or a task that transforms data. However, workflows operating on the Control plane do not handle the actual data; they do not “see” the data. Instead, they tell a specific system to do something.
The Data plane is the level at which the actual data is handled: the rows themselves flow through the tool that does the work.
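To make the contrast concrete, here is a toy Python sketch in which SQLite stands in for a remote warehouse. The first statement is control-plane work: we send a command and the warehouse transforms the data internally, so our process never sees a row. The second is data-plane work: the rows flow through our process.

```python
# Toy contrast between the Control plane and the Data plane.
# SQLite is a hypothetical stand-in for a remote data warehouse.
import sqlite3

warehouse = sqlite3.connect(":memory:")
warehouse.executescript("""
    CREATE TABLE raw_orders (order_date TEXT, amount REAL);
    INSERT INTO raw_orders VALUES ('2024-01-05', 10.0), ('2024-01-05', 25.0);
""")

# Control plane: tell the warehouse to do the work. The data never leaves
# the warehouse; this process only issues the command.
warehouse.execute("""
    CREATE TABLE daily_totals AS
    SELECT order_date, SUM(amount) AS total
    FROM raw_orders
    GROUP BY order_date
""")

# Data plane: the rows themselves flow through this process.
rows = warehouse.execute("SELECT * FROM raw_orders").fetchall()
print(rows)  # this process now holds the actual data in memory
```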