SAP Data Warehouse Cloud’s Data Flow feature allows you to model your ETL (Extract, transform, load) processes for moving data physically into your SAP Data Warehouse Cloud space. However, you cannot consume each and every source connected to your tenant in your data flows out-of-the-box. In this blog post I explain how to integrate any source easily into your data flows.
SAP Data Warehouse Cloud & Data Flows
SAP Data Warehouse Cloud is SAP’s newest data warehouse offering in the public cloud domain available as a Software-as-a-Service (SaaS) offering. It can be used as a standalone public cloud data warehouse solution or combined with existing enterprise data warehouse applications in your company like SAP BW/4HANA, SAP BW or SAP HANA for SQL data warehousing.
If you’re not yet familiar with SAP Data Warehouse Cloud check out my further readings at the end of this post.
Besides other things, SAP Data Warehouse Cloud comes with a feature called Data Flows – a functionality typically used to model ETL processes to load data from a remote source, running in the cloud or on-premise, into your SAP Data Warehouse Cloud space.
Combining the ETL capabilities with the rich connectivity options of SAP Data Warehouse Cloud gives you great possibilities to integrate many different data sources in a single, unique and easy-to-understand way into your space.
However, not all remote sources SAP Data Warehouse Cloud can connect to are directly available in your Data Flows. But there’s a small trick how to convince SAP Data Warehouse Cloud to connect to any source you integrated into your space from within your Data Flows, too! 🙂
Creating Connections in SAP Data Warehouse Cloud
Within your space (what the heck is a space?! Well, check it out here!) creating connections to remote sources, cloud or on-premise, is done with easy. The connection dialog offers you many different tiles to choose from.
For each connection type the dialog tells you whether the connection can be used in either the available view builders (“Remote Tables” means usable in SQL View Builder or Graphical View Builder) and Data Flows. Let’s take Google BigQuery as an example. When creating a new connection to one of your Google BigQuery instances, you can directly consume this instance when creating SQL or Graphical Views in SAP Data Warehouse Cloud as Remote Tables is shown as OK (green checkmark). However, you cannot use any connections of this type in your Data Flows.
Validating the connection delivers the same result.
The connection is enabled for Remote Tables, but not supported in Data Flows.
But how in the world can I use it in my Data Flows? I’d really love to apply some scripting capabilities using Python to the Google BigQuery data! And of course I do not want to first load the data using a graphical view to only then use it my Data Flow.
E/R Models to the Rescue!
What we need as a source in our Data Flows is a Remote Table. But, how to create one?! There are different ways how to do this where one is using the E/R model feature in SAP Data Warehouse Cloud.
E/R models allow you to import data assets from a connection you created in your space.
You can browse the connection and select the data sets you are interested in.
After successful import the data sets are available in your space as Remote Tables – exactly what we needed!
Create Data Flows with Remote Tables
Within your Data Flow the imported Remote Table is now available from the Repository tab. Since it is a Remote Table the data access is virtual by default, meaning until now no data is physically stored in your SAP Data Warehouse Cloud space but always accessed live directly in the source.
We can now go ahead and model our Data Flow as usual and add more data sources, join or union the data, apply some Python scrips and finally store the result in our SAP Data Warehouse Cloud space, all done by simply drag & drop activities.
And there we go – data loaded from a source initially not supported by Data Flows, now loaded, transformed and stored in your SAP Data Warehouse Cloud space actually using Data Flows!
Using the E/R models is one way of achieving this result. To make example complete you can also use SQL Views or Graphical Views to create the remote tables. Also you can use any SQL View or Graphical View you create in your SAP Data Warehouse Cloud space as a source in your Data Flows. However, if you do not want to spend much time building views but want to consume a remote source as quickly as possible in your Data Flow or you want to consume multiple entities, using the E/R model editor for importing these data sets as Remote Tables is the easiest and most convenient way.
Try it out yourself! Our free 30-days trial offering is here to help.
You can get yourself a free 30-days SAP Data Warehouse Cloud trial tenant with all the features enabled. Check out our free trial page here.
Let me know in the comments or ask your question in the Q&A area.