SAP DWC will soon incorporate Data Flow functionality, enabling the definition of more advanced ETL flows that complement existing data federation and replication services.
In data flows we will be able to use a series of standard transformations graphically, with no programming knowledge required, and we will also be able to create script-based transformations.
What is the difference between Data Views and Data Flows? Mainly that views transform the data at the moment it is read, without persistence (although this will change in the future), and produce a single output structure, whereas data flows transform the data and persist the results in one or more structures.
In a Data Flow we can use views or tables that already exist in our DWC, or use connections to get data from other systems. In the latter case, we must first create all the necessary connections in our space.
Creating Data Flows
In Data Builder, where we create the tables, views and E/Rs, we can now find a new “Data Flow” object, which has its own Data Flow Builder editor.
Here we will have access to the different tables and views of DWC or data sources that we have connected in our space. We will be able to add these sources or destinations to the Data Flow with drag and drop.
We connect these sources to standard transformations such as the following:
With projections you can choose which fields to pass to the next step, apply filters, and create calculated fields with the help of 84 functions grouped into categories such as conversion, date and mathematics.
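To make the three things a projection does more concrete, here is a rough pandas analogy. It is only a sketch of the idea (select fields, filter rows, add a calculated field), not how DWC implements projections; the function name `project` and the column names are hypothetical.

```python
import pandas as pd


def project(orders: pd.DataFrame) -> pd.DataFrame:
    """Mimic a projection step: filter rows, keep some fields, add a calculated field."""
    # Filter: keep only open orders (hypothetical column and value).
    filtered = orders[orders["status"] == "OPEN"]
    # Field selection: only these columns move on to the next step.
    selected = filtered[["order_id", "quantity", "unit_price"]].copy()
    # Calculated field derived from the selected columns.
    selected["total"] = selected["quantity"] * selected["unit_price"]
    return selected.reset_index(drop=True)


# Small made-up input to try the sketch locally.
orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "status": ["OPEN", "CLOSED", "OPEN"],
    "quantity": [2, 1, 5],
    "unit_price": [10.0, 99.0, 3.0],
    "customer": ["A", "B", "C"],
})
projected = project(orders)
```

In the Data Flow Builder the same result would be configured graphically, without writing any code.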
The script transformation allows the use of a scripting language to perform any custom transformations required. The first supported language is Python 3 (more will be supported in the future), and it includes the well-known Pandas and NumPy libraries, which will make it easier to apply some data science techniques in our data flows. We can't use all of Python, however: for example, we can't import other libraries, perform I/O actions such as saving to a file or a database, or use an HTTP connection to download or send data.
Basically, the data that enters the transformation is a dataframe, to which we can apply all the transformation possibilities that Python 3 (currently 3.6) and Pandas give us, such as pivoting, stacking, cross tabulations, etc.
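As an illustration of the kind of logic a script transformation could hold, here is a minimal pivot sketch. I am assuming the entry point is a function that receives the incoming dataframe and returns the transformed one; the name `transform`, the column names and the sample data are assumptions for illustration, and the import is only there so the sketch can be run locally.

```python
import pandas as pd


def transform(data: pd.DataFrame) -> pd.DataFrame:
    """Pivot long-format rows into one row per region and one column per quarter."""
    pivoted = data.pivot_table(
        index="region",      # one output row per region
        columns="quarter",   # one output column per quarter
        values="amount",
        aggfunc="sum",       # sum amounts that share region and quarter
        fill_value=0,
    )
    # Flatten the column index so the output is a plain table structure.
    pivoted.columns = [str(col) for col in pivoted.columns]
    return pivoted.reset_index()


# Made-up input, standing in for the data arriving from the previous step.
sample = pd.DataFrame({
    "region": ["EMEA", "EMEA", "APJ", "APJ", "EMEA"],
    "quarter": ["Q1", "Q2", "Q1", "Q2", "Q1"],
    "amount": [100, 150, 80, 120, 50],
})
result = transform(sample)
```

Note that the output structure (`region`, `Q1`, `Q2`) differs from the input structure, so the target table would need columns that match the output.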
The destination of the transformations must be an existing table. Unlike SAP Data Services or HANA SDI, Data Flow in the beta cannot create tables directly from the data flow, so they have to be created manually in the Data Builder beforehand.
Then, in the Data Flow, you can add the table, select it as the target and configure the data insertion mode (APPEND or TRUNCATE).
With that, the Data Flow has sources, transformations and a target, so it can be executed.
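The difference between the two insertion modes can be sketched with dataframes standing in for tables. This assumes the usual meaning of the terms (APPEND adds the new rows to what is already there, TRUNCATE empties the table before loading); the helper `load_into_target` is hypothetical and is not part of DWC.

```python
import pandas as pd


def load_into_target(target: pd.DataFrame, incoming: pd.DataFrame, mode: str) -> pd.DataFrame:
    """Return the target table contents after loading `incoming` with the given mode."""
    if mode == "APPEND":
        # Existing rows are kept and the new rows are added after them.
        return pd.concat([target, incoming], ignore_index=True)
    if mode == "TRUNCATE":
        # Existing rows are discarded; only the new rows remain.
        return incoming.reset_index(drop=True)
    raise ValueError("mode must be 'APPEND' or 'TRUNCATE'")


existing = pd.DataFrame({"id": [1, 2], "value": ["a", "b"]})
new_rows = pd.DataFrame({"id": [3], "value": ["c"]})

appended = load_into_target(existing, new_rows, "APPEND")      # 3 rows
truncated = load_into_target(existing, new_rows, "TRUNCATE")   # 1 row
```

With APPEND, running the same Data Flow twice would load the same rows twice, which is worth keeping in mind when choosing the mode.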
Running Data Flows
When a Data Flow is executed, its status can be monitored in the Data Flow Monitor, within the Data Integration Monitor.
It is possible to see the history of the executions, as well as more information if an error occurs.
In the beta it is not yet possible to schedule Data Flows to run at a later time, but this option will be available when the product reaches GA.
The version tested is the beta, and although things may change in the final version, it gives us an idea of how things will work, how it will look and which components we will find.
Many things are still missing for it to be a complete ETL tool, but this is the first version, and I would dare to say that it brings the ETL world closer to business users, who will be able to build some basic transformations.
I have been pleasantly surprised by the inclusion of Python as a scripting language, as this will allow complex transformations to be made.
I think that in the near future Data Flow will be able to help us in the SAP DWC project that we are currently developing, so we hope that it will soon reach all SAP DWC customers and that new functionalities will be added.
This post first appeared on LinkedIn.