Answering the question ‘what is data orchestration’ requires placing it in the context of data integration and taking into account the role of open source in transforming data.
Data integration is the process of combining and transforming data from multiple sources and data domains to support a business outcome. Common use cases for data integration include data warehousing, receiving data from partners or suppliers, creating a single customer view, and creating dashboards. Data integration normally includes cleansing, mapping, and a transformation of the data, which may be simple or complex.
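To make the cleansing, mapping, and transformation steps concrete, here is a minimal sketch in Python that builds a single customer view from two sources. The source names, field names, and sample records are all hypothetical and chosen only for illustration. A real integration would read from actual systems and handle far more edge cases.

```python
# Hypothetical records from two sources that use different field names.
crm_rows = [
    {"Email": " Ada@Example.com ", "FullName": "Ada Lovelace"},
    {"Email": "", "FullName": "No Email"},
]
billing_rows = [
    {"email_address": "ada@example.com", "total_spend": "120.50"},
]

# Mapping: source field name -> unified field name (assumed schema).
CRM_MAP = {"Email": "email", "FullName": "name"}
BILLING_MAP = {"email_address": "email", "total_spend": "spend"}


def map_fields(row, mapping):
    """Rename source fields to the unified schema, dropping unmapped ones."""
    return {mapping[key]: value for key, value in row.items() if key in mapping}


def cleanse(row):
    """Trim and lowercase the email; return None when the email is missing."""
    email = row.get("email", "").strip().lower()
    if not email:
        return None
    return {**row, "email": email}


def integrate(crm, billing):
    """Combine both sources into one record per customer email."""
    customers = {}
    for source, mapping in ((crm, CRM_MAP), (billing, BILLING_MAP)):
        for raw in source:
            row = cleanse(map_fields(raw, mapping))
            if row is None:
                continue  # skip records that cannot be matched to a customer
            customers.setdefault(row["email"], {}).update(row)
    # Transformation: convert spend from text to a number.
    for record in customers.values():
        if "spend" in record:
            record["spend"] = float(record["spend"])
    return customers


single_view = integrate(crm_rows, billing_rows)
print(single_view)
```

Here the mapping is a plain dictionary per source, so adding a third source only means adding another mapping, which is one simple way to keep the cleansing and combining logic shared.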
Data orchestration goes further than data integration. It combines data discovery, data preparation, data integration, data processing, and the connection of enriched data across complex landscapes. While data integration typically focuses on data in one place (often for the purposes of reporting), data orchestration focuses on processing data and combining it in a flexible manner for the purposes of enabling new or improved business processes.
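One way to picture the difference is to treat orchestration as coordinating several stages in dependency order. The sketch below is an assumption-laden illustration, not how any particular orchestration product works: the step functions, the dependency table, and the placeholder data are all hypothetical. It uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later) to decide the run order.

```python
from graphlib import TopologicalSorter


# Hypothetical stages; each reads from and writes to a shared context dict.
def discover(ctx):
    ctx["sources"] = ["crm", "billing"]


def prepare(ctx):
    ctx["prepared"] = [f"{name}:clean" for name in ctx["sources"]]


def integrate(ctx):
    ctx["integrated"] = "+".join(ctx["prepared"])


def process(ctx):
    ctx["result"] = ctx["integrated"].upper()


STEPS = {
    "discover": discover,
    "prepare": prepare,
    "integrate": integrate,
    "process": process,
}

# Each step maps to the set of steps that must finish before it runs.
DEPENDS_ON = {
    "discover": set(),
    "prepare": {"discover"},
    "integrate": {"prepare"},
    "process": {"integrate"},
}


def run_pipeline():
    """Run every step once, in an order that respects the dependencies."""
    ctx = {}
    order = list(TopologicalSorter(DEPENDS_ON).static_order())
    for name in order:
        STEPS[name](ctx)
    return order, ctx


order, ctx = run_pipeline()
print(order)
print(ctx["result"])
```

Declaring dependencies separately from the step functions is what gives the flexibility described above: a new stage or a different processing engine can be slotted in by editing the dependency table rather than rewriting the flow.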
The table below is from a recent O’Reilly report on “Managing Data Orchestration and Integration at Scale”. Notice the simple comparison across multiple characteristics such as data types, data stores, data processing, and processing patterns.
Data orchestration extends the use case of data integration, taking into account diverse data types, diverse processing engines, and complex landscapes.