Integrate any data source into your SAP Data Warehouse Cloud Data Flow!
SAP Data Warehouse Cloud’s Data Flow feature allows you to model your ETL (Extract, Transform, Load) processes for physically moving data into your SAP Data Warehouse Cloud space. However, not each and every source connected to your tenant can be consumed in your data flows out of the box. In this blog post I explain how to easily integrate any source into your data flows.
SAP Data Warehouse Cloud & Data Flows
SAP Data Warehouse Cloud is SAP’s newest data warehouse offering in the public cloud domain, available as Software as a Service (SaaS). It can be used as a standalone public cloud data warehouse solution or combined with existing enterprise data warehouse applications in your company, such as SAP BW/4HANA, SAP BW or SAP HANA for SQL data warehousing.
If you’re not yet familiar with SAP Data Warehouse Cloud check out my further readings at the end of this post.
Among other things, SAP Data Warehouse Cloud comes with a feature called Data Flows – functionality typically used to model ETL processes that load data from a remote source, running in the cloud or on-premise, into your SAP Data Warehouse Cloud space.
Combining the ETL capabilities with the rich connectivity options of SAP Data Warehouse Cloud gives you great possibilities to integrate many different data sources in a single, unique and easy-to-understand way into your space.
However, not all remote sources SAP Data Warehouse Cloud can connect to are directly available in your Data Flows. But there’s a small trick to convince SAP Data Warehouse Cloud to connect to any source you have integrated into your space from within your Data Flows, too! 🙂
Creating Connections in SAP Data Warehouse Cloud
Within your space (what the heck is a space?! Well, check it out here!), creating connections to remote sources, cloud or on-premise, is done with ease. The connection dialog offers you many different tiles to choose from.
Choosing the connection tile
For each connection type the dialog tells you whether the connection can be used in the available view builders (“Remote Tables” means usable in the SQL View Builder or the Graphical View Builder) and/or in Data Flows. Let’s take Google BigQuery as an example. When creating a new connection to one of your Google BigQuery instances, you can directly consume this instance when creating SQL or Graphical Views in SAP Data Warehouse Cloud, as Remote Tables is shown as OK (green checkmark). However, you cannot use connections of this type in your Data Flows.
Validating the connection delivers the same result.
Validating a Google BigQuery connection
The connection is enabled for Remote Tables, but not supported in Data Flows.
But how in the world can I use it in my Data Flows? I’d really love to apply some scripting capabilities using Python to the Google BigQuery data! And of course I do not want to first load the data using a graphical view only to then use it in my Data Flow.
E/R Models to the Rescue!
What we need as a source in our Data Flows is a Remote Table. But how do we create one?! There are different ways to do this; one of them is using the E/R model feature in SAP Data Warehouse Cloud.
Using an E/R model to create a Remote Table
E/R models allow you to import data assets from a connection you created in your space.
Importing data assets from connections
You can browse the connection and select the data sets you are interested in.
Importing data assets from connections
After a successful import, the data sets are available in your space as Remote Tables – exactly what we needed!
Data assets available as Remote Tables in our space
Create Data Flows with Remote Tables
Within your Data Flow the imported Remote Table is now available from the Repository tab. Since it is a Remote Table, data access is virtual by default, meaning that so far no data is physically stored in your SAP Data Warehouse Cloud space; it is always accessed live, directly in the source.
Use Remote Table in Data Flows
We can now go ahead and model our Data Flow as usual: add more data sources, join or union the data, apply some Python scripts and finally store the result in our SAP Data Warehouse Cloud space – all done simply by drag & drop.
Load book reviews grouped by year
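To give you an idea of what such a scripting step could look like, here is a minimal sketch of a script operator body that groups book reviews by year, as in the screenshot above. It assumes the operator hands the incoming records to a transform function as a pandas DataFrame and writes the returned DataFrame to the target table; the column names review_date, review_id and rating are made up for illustration and need to be replaced with the columns of your own source.

```python
import pandas as pd  # the script operator typically makes pandas available already


def transform(data):
    # 'data' arrives as a pandas DataFrame holding the book reviews.
    # Hypothetical columns: review_date, review_id, rating.
    data["review_year"] = pd.to_datetime(data["review_date"]).dt.year

    # Group the reviews by year and aggregate.
    grouped = (
        data.groupby("review_year", as_index=False)
        .agg(review_count=("review_id", "count"), avg_rating=("rating", "mean"))
    )

    # The returned DataFrame is what the Data Flow writes to the target table.
    return grouped
```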
Conclusion
And there we go – data from a source initially not supported by Data Flows, now loaded, transformed and stored in your SAP Data Warehouse Cloud space, actually using Data Flows!
Data preview from local table
Using E/R models is one way of achieving this result. To make the picture complete: you can also use SQL Views or Graphical Views to create the Remote Tables, and any SQL View or Graphical View you create in your SAP Data Warehouse Cloud space can be used as a source in your Data Flows. However, if you do not want to spend much time building views but want to consume a remote source as quickly as possible in your Data Flow, or you want to consume multiple entities, using the E/R model editor to import these data sets as Remote Tables is the easiest and most convenient way.
Try it out yourself! Our free 30-day trial offering is here to help.
You can get yourself a free 30-day SAP Data Warehouse Cloud trial tenant with all the features enabled. Check out our free trial page here.
Any questions?
Let me know in the comments or ask your question in the Q&A area.
Hi Jascha Kanngiesser,
Thanks for this excellent blog! This is exactly what I need with regards to Data Flows. However, I am unable to use an existing table created in SAP Data Warehouse Cloud as a target using Data Flows. Have you encountered this, or is this by design or just a “feature”?
Regards
Afshin
Hi Afshin Irani,
glad you like it! 🙂
Is the table a local or a remote table? Only local tables manually created in SAP Data Warehouse Cloud can be used as target tables in your Data Flows. You cannot use Remote Tables as target tables, as Data Flows do not allow writing data to external (remote) targets, but only from remote sources into your SAP Data Warehouse Cloud space (local tables).
Best,
Jascha
That works very nicely!
Would love to see a similar capability of E/R modelling and data flows towards SAC (planning) models & datasets.
Hi Jef Baeyens,
thanks for your comment! Concerning the interoperability of SAP Data Warehouse Cloud & Analytics Cloud, especially in the context of planning, we are working on a tight integration. We are not planning to enable E/R models and Data Flows in SAC itself, but due to the tight integration of the two applications you would benefit implicitly from these features in SAC (planning).
Thanks,
Jascha
On the SAC side there are still major data ingestion overhauls ongoing (data frames etc.), so I was just hoping to see some more synergy of some sort.
Hi Jascha,
Thank you for the very informative blog post. However, on my 30-day trial tenant there's no "data flows" category in the Data Builder. From what I've read, this functionality currently isn't available on trial tenants – could you please confirm this?
Kind regards,
Chris
Hi Krzysztof Korościk,
that is true, but it is only a temporary limitation. Can you let me know your tenant URL via mail?
Thanks,
Jascha
I've sent you the link via email. Thank you for the assistance.
Hi Jascha,
I also can't find the Data Flow tab. Are there any plans to remove this limitation in the trial tenant?
Br,
Pawel.
Hi Paweł Wiejkut, unfortunately it is not supported for trial tenants.
Hi Jascha,
is the import of the source object as a Remote Table mandatory, or should the Data Flow also work when pointing directly to a source object “exposed” by the connection?!
Thanks,
Andrea
Hi Andrea Toniazzi, it's only mandatory in case the connection doesn't expose the object in the Data Flow Builder directly.