SAP Data Warehouse Cloud Data Flows (Beta)

SAP Data Warehouse Cloud (DWC) will soon incorporate Data Flow functionality, enabling the definition of more advanced ETL flows that complement the existing data federation and replication services.

Within a data flow we can apply a series of standard transformations graphically, without any programming knowledge, but we can also create script-based transformations.

So what is the difference between Data Views and Data Flows? Mainly that the former create views that transform the data at the moment it is read, without persistence (although this will change in the future) and with a single output structure, while the latter transform the data and persist the result in one or more structures.

A Data Flow can use views or tables that already exist in our DWC, or pull data from other systems through connections; in the latter case, we must first create all the necessary connections in our space.

Create connection

 

Creating Data Flows

In the Data Builder, where we create tables, views and E/R models, we can now find a new “Data Flow” object, which has its own Data Flow Builder editor.

Create Data Flow

 

Here we have access to the DWC tables and views, as well as the data sources connected in our space, and we can add these sources and targets to the Data Flow via drag and drop.

Sample Data Flow

 

We then connect these sources to standard transformations such as:

  • Join
  • Union
  • Projection
  • Aggregation

With projections you can choose which fields pass to the next step, apply filters, and create calculated fields with the help of 84 functions grouped into categories such as conversion, date and mathematics; a rough code analogy of these standard transformations is sketched below.
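For readers who think in code, these standard transformations map naturally onto Pandas, the library used later in the script transformation. The following sketch is only an analogy with invented sample data and column names, not DWC's own engine:

```python
import pandas as pd

# Hypothetical sample data; table and column names are illustrative only.
sales = pd.DataFrame({"order_id": [1, 2, 3],
                      "customer_id": [10, 10, 20],
                      "amount": [100.0, 250.0, 75.0]})
customers = pd.DataFrame({"customer_id": [10, 20], "country": ["ES", "DE"]})

# Join: enrich sales with customer attributes.
joined = sales.merge(customers, on="customer_id", how="inner")

# Union: stack two datasets that share the same structure.
unioned = pd.concat([sales, sales], ignore_index=True)

# Projection: keep selected fields, filter rows, add a calculated field.
projected = (joined.loc[joined["amount"] > 80, ["order_id", "country", "amount"]]
             .assign(amount_k=lambda d: d["amount"] / 1000))

# Aggregation: group and aggregate.
aggregated = joined.groupby("country", as_index=False)["amount"].sum()
```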

Expressions

 

Script Transformation

This transformation allows the use of a scripting language to perform custom transformations. The first supported language is Python 3 (more will be supported in the future), including the popular Pandas and NumPy libraries, which makes it easier to apply data-science techniques within our data flows. However, we cannot use all of Python: for example, we cannot import other libraries or perform I/O actions such as saving to a file or a database, or use an HTTP connection to download or send data.
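To give an idea of what such a script can look like, here is a minimal sketch. It assumes the operator passes the incoming records to a function as a Pandas DataFrame and expects a DataFrame back; the entry-point name and the column names are illustrative assumptions, not taken from the product documentation:

```python
import numpy as np   # NumPy and Pandas are the libraries available in the beta
import pandas as pd

def transform(data: pd.DataFrame) -> pd.DataFrame:
    # Allowed: pure in-memory transformations with Pandas/NumPy.
    data["revenue"] = data["price"] * data["quantity"]  # hypothetical columns
    data["log_revenue"] = np.log1p(data["revenue"])
    # Not allowed in the sandbox: importing other libraries, file or database
    # I/O, and HTTP connections to download or send data.
    return data
```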

 

Script Transformation

 

Essentially, the data entering the transformation is a DataFrame to which we can apply all the transformation possibilities that Python 3 (currently 3.6) and Pandas give us, such as pivoting data, stacking, cross tabulations, and so on.
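To make that concrete, a pivot and its inverse stack could look like this in plain Pandas (the sample data is invented for illustration):

```python
import pandas as pd

# Hypothetical monthly sales per region, in long format.
df = pd.DataFrame({"region": ["North", "North", "South", "South"],
                   "month":  ["Jan", "Feb", "Jan", "Feb"],
                   "sales":  [100, 120, 80, 95]})

# Pivot: months become columns, one row per region.
pivoted = df.pivot(index="region", columns="month", values="sales")

# Stack: fold the month columns back into rows (the inverse reshape).
stacked = pivoted.stack()
```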

 

Pandas Pivot

Pandas Stack

 

An existing table must be indicated as the destination of the transformations: unlike SAP Data Services or HANA SDI, Data Flow in the beta does not offer the option of creating tables directly from the data flow, although they can be created manually in the Data Builder.

 

Target Table

 

Back in the Data Flow, you can then add the table, select it as the target, and configure the data insertion mode (APPEND or TRUNCATE).

Append
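As a rough analogy for the two modes (plain Pandas against SQLite, not the DWC mechanism): APPEND keeps the existing rows and adds the new batch, while TRUNCATE empties the table before inserting.

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")
batch = pd.DataFrame({"id": [1, 2], "value": ["a", "b"]})

# APPEND: keep the existing rows and add the new batch.
batch.to_sql("target", conn, if_exists="append", index=False)

# TRUNCATE: discard the previous contents, then insert the new batch
# (if_exists="replace" approximates this by recreating the table).
batch.to_sql("target", conn, if_exists="replace", index=False)
```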

 

With that, the Data Flow has sources, transformations and a destination, so it can be executed.

 

Running Data Flows

When a Data Flow is executed, its status can be monitored in the Data Flow Monitor, within the Data Integrator Monitor.

 

Data Flow Monitor

 

It is possible to see the execution history, as well as further details when an error occurs.

 

Monitor Errors

 

In the beta it is not yet possible to schedule Data Flows for later execution, but this option will be available when the product reaches GA.

 

Conclusions

The version tested is the beta, and although things may change in the final release, it gives us an idea of how the product will work, how it will look, and which components we will find.

Many things are still missing for it to be a complete ETL tool, but this is a first version, and I would dare to say that it brings the ETL world closer to business users for basic transformations.

I have been pleasantly surprised by the inclusion of Python as a scripting language, as this will allow complex transformations to be made.

I think that in the near future Data Flow will be able to help us in the SAP DWC project we are currently developing, so we hope it will soon reach all SAP DWC customers and that new functionality will be added.

 

This post first appeared on LinkedIn.

Comments
  • Really nice blog, Xavi; you can edit the target table creation comment. 🙂 Now you can create them on the fly inside the data flow. Step by step we will be adding more and more functionality so that DWC gets closer to a tool like Data Services.

  • With data flows, is there functionality to indicate whether or not a field in a table should be transferred, thus reducing the records sent from a data source? For example, if an extractor data source has 150 columns but I really only need 20, that would reduce the data volume for remote queries or replication. In BW, the data source currently has a checkbox on each field indicating whether or not to transfer it; I have not found this functionality documented in the DWC notes or on the roadmap.

  • Hi Xavier,

    I was doing a test exercise with the data flow and I can't find how to get update behavior on tables.
    For example, when loading a master data table, if I send a duplicate record for an ID I get an error and nothing is inserted, so I have no way to apply update behavior to the data.
    I tried both the APPEND and TRUNCATE behaviors and the same thing happens.

    I also have not found how to do a DELETE FROM ... WHERE on a specific table.

    Thanks a lot.

    • If your data source has duplicates and the ID field is set as a Key in DWC, it will give you an error regardless of whether or not you delete the target data, and the entire transaction will be rolled back.

      If the duplicate occurs because the ID already exists in the target table, setting the TRUNCATE option will delete the data from the table before loading the new data, so you should not get an error.

      Make sure your case is not the first one.

       

      • Hi Xavier,

        Yes, as you say, I have an ID key in the target table and transactional data in the source table. I am trying to derive master data from a transactional table.

        If I remove the key from the target table, the records are inserted successfully, but then, for example, the same customer is inserted 4 times.

        The behavior I am trying to replicate is that of traditional BW, where you can specify overwrite behavior.

        • What comes to mind is to use a view as the source instead of the table, since in a view you can wrap the table and activate the DISTINCT option, something that is not available when you add the table directly to a Data Flow.

          • Hi Xavier,

            OK, I'm going to run this test and I'll let you know how it goes.

            Today I was asked a question about surrogate keys. This is a concept from traditional databases used when building a data warehouse. But how is this concept understood here in DWC? In SAP BW they were created automatically as the SIDs.