Skip to Content
Product Information
Author's profile photo Xavier Polo

SAP Data Warehouse Data Flows (Beta)

SAP DWC will soon incorporate Data Flow functionality, enabling the definition of more advanced ETL flows that complement existing data federation and replication services.

In the data flows we will be able to use a series of standard transformations without the need of programming knowledge and in a graphic way, but we also have the possibility of creating transformations based on scripts.

And what is the difference between Data Views and Data Flows? mainly that the first ones are oriented to create views that transform the data at the moment they are read without having persistence (although this will change in the future) obtaining a single output structure, while the second ones transform and persist the changes in one or multiple structures.

In a Data Flow we will use views or tables that we may already have in our DWC or use the connections to get data from other systems, in that case we should first create all the necessary connections in our space.

Create%20connection

Create connection

 

Creating Data Flows

In Data Builder, where we create the tables, views and E/Rs, we can now find a new “Data Flow” object, which has its own Data Flow Builder editor.

Create%20Data%20Flow

Create Data Flow

 

Here we will have access to the different tables and views of DWC or data sources that we have connected in our space. We will be able to add these sources or destinations to the Data Flow with drag and drop.

Sample%20Data%20Flow

Sample Data Flow

 

To these origins we will connect with standard transformations like:

  • Join
  • Union
  • Projection
  • Aggregation

With projections you can choose which fields to move to the next step, apply filters and create calculated fields with the help of 84 functions grouped in the categories of conversion, date, mathematics, etc.

Expressions

Expressions

 

Script Transformation

This transformation allows the use of a scripting language to perform the required custom transformations. The first supported language is Python3 (more will be supported in the future) and allows the use of the famous Pandas and NumPy libraries, which will make it easier for us to apply some techniques of the data scientists to our data flows. Although we can’t use all of Python, for example we can’t import other libraries or perform I/O actions such as saving to file or to a DB, or use an http connection to download or send data.

 

Script%20Transformation

Script Transformation

 

Basically the data that enters into the transformation is a dataframe to which we can apply all the transformation possibilities that Pyhton3 (currently 3.6) and Pandas gives us, such as pivoting data, stacks, cross tabulations, etc.

 

Pandas%20Pivot%20and%20Stack

Pandas Pivot

Pandas%20Stack

Pandas Stack

 

An existing table has to be indicated as the destination of the transformations, so unlike SAP Data Services or HANA SDI, Data Flow in the beta does not have the option of creating the tables directly from the data flow, although they can be created manually in the Data Builder.

 

Target%20Table

Target Table

 

Later in the Data Flow you can add the table, select it as target and configure the data insertion mode (APPEND or TRUNCATE)

Append

Append

 

With that, the Data Flow has origins, transformations and destiny, so it can be executed.

 

Running Data Flows

When a Data Flow is executed, its status can be monitored in the Data Flow Monitor, within the Data Integrator Monitor.

 

Data%20Flow%20Monitor

Data Flow Monitor

 

It is possible to see the history of the executions, as well as more information if an error occurs.

 

Monitor%20Errors

Monitor Errors

 

In the beta it is not yet possible to program the Data Flows to be executed on a delayed basis, but this option will be available when the product is in GA.

 

Conclusions

The tested version is the Beta, and although it may change things in the final version it gives us an idea of how things will work, the look and the components we will be able to find.

Many things are missing to be a complete ETL, but we are talking about the first version, and I would dare to say that it brings the ETL world closer to business users to make some basic transformations.

I have been pleasantly surprised by the inclusion of Python as a scripting language, as this will allow complex transformations to be made.

I think that in the near future Data Flow will be able to help us in the SAP DWC project that we are currently developing, so we hope that it will soon reach all SAP DWC customers and that new functionalities will be added.

 

This post first appeared on Linkedin

Assigned Tags

      14 Comments
      You must be Logged on to comment or reply to a post.
      Author's profile photo Enio Terra
      Enio Terra

      Hi Xavier, excellent content! Thanks for sharing. Are you using the partner tenant?

      BR

      Enio Terra

      Author's profile photo Xavier Polo
      Xavier Polo
      Blog Post Author

      No, this is a specific beta tenant to test Data flow functionality for a DWC project in one of our customers.

      Author's profile photo Juan Carlos Lazaro
      Juan Carlos Lazaro

      Really nice blog Xavi, you can edit the target table creation comment. 🙂 Now you can create them on the fly inside the dataflow. Step by step we will be adding more and more functionality so DWC becomes closer to a tool like Data Services.

      Author's profile photo Karen Avendano
      Karen Avendano

      With data flows, do you have the functionality of indicating whether a field in a table wants to be transferred or not and thus perform reduction of sending records to a data source? For example if an extractor data source has 150 columns but I really only need 20 which will reduce the volume of data to remote query or replication. In BW currently the data source has a check in the fields where one tells you whether or not to transfer information, I have not been able to check this functionality documented in the DWC issues or in the roadmap.

      Author's profile photo Xavier Polo
      Xavier Polo
      Blog Post Author

      In DF there is a component named “Projection” that allows to exclude fields from source, and select only those that you need.

      Basically it converts the “select *” into “select <selected fields>”

      Author's profile photo Karen Avendano
      Karen Avendano

      HI Xavier Polo, thank you very much for your clarification.

      Can I also ask you a question about modeling and data source changes?

      Author's profile photo Xavier Polo
      Xavier Polo
      Blog Post Author

      Of course, but I recommend that you fill out the question form, this way you will have more visibility, and others will be able to respond, and perhaps the question and its answers will help others.

      Ask a Question - SAP Q&A

      You can always mention with @

      Author's profile photo Karen Avendano
      Karen Avendano

      ok, I'm going to write the question as you indicate and I'll mention it to you.

      Author's profile photo Karen Avendano
      Karen Avendano

      Hi Xavier, I asked the question in the community but I can't find you to mention you.
      This is is link

      https://answers.sap.com/questions/13199276/opcion-replace-table-and-datasource-on-views-of-dw.html

      Thank you so much.

      Author's profile photo Karen Avendano
      Karen Avendano

      Hi Xavier

      I was doing a test exercise with the data flow, I can't find how to do the Update behavior in tables.
      For example I am loading a table of master data, when I send a double record by id I get an error and do not insert anything. so I have no way to do update behavior to the data.
      try the behavior of APPEND or TRUNCATE the same thing happens.

      I have also not found how to do a delete * from where some specific table.

      Thanks a lot.

      Author's profile photo Xavier Polo
      Xavier Polo
      Blog Post Author

      If your data source has duplicates and the ID field is set to Key in DWC, it will give you an error regardless of whether or not you delete the target data, and the entire transaction will be reversed.

      If the duplicate occurs because there is already an ID in the target table, setting the TRUNCATE option will delete the data from the table before loading the new data so you should not get an error.

      Make sure it is not the first one.

       

      Author's profile photo Karen Avendano
      Karen Avendano

      Hi Xaver.

      Yes, as you say, I have an id in the table and I have transactional data in the source table. I am trying to obtain a master data based on a transactional table.

      If I remove the key from the target table, it automatically inserts records successfully, but for example, the same client inserts me 4 times.

      The behavior that I am trying to replicate is that of a traditional BW where it can indicate overwrite behavior.

      Author's profile photo Xavier Polo
      Xavier Polo
      Blog Post Author

      What comes to mind is that you use a view as a source instead of the table, since in the view you can put a table and activate the DISTINCT option. Something that is not available when you add the table to a DF.

      Author's profile photo Karen Avendano
      Karen Avendano

      Hi Xavier.

      Ok I'm going to take this test and I'll tell you how it goes.

      Today they asked me a question about the surrogate keys? this is a concept in traditional databases that were trying to turn into a data warehouse. But here in DWC how is this concept understood? because in SAP BW he created them automatically and they are the SIDs.