Get Started with Data Modeling in SAP Data Warehouse Cloud
Before we start our tour of the SAP Data Warehouse Cloud world, I’d like to mention that all content here was written based on the Beta version of SAP Data Warehouse Cloud. Software settings and code snippets are examples; they are not for productive use. This blog post is only intended to better explain and visualize new functionalities in SAP Data Warehouse Cloud.
Also, consider that experimental features are not part of the officially delivered scope that SAP guarantees for future releases. This means that experimental features may be changed by SAP at any time for any reason without notice.
All set then?
As you already know, SAP Data Warehouse Cloud is an end-to-end warehouse in the cloud that combines data management processes with advanced analytics.
This blog post will be divided into 6 steps to give you a feeling of what is available there for Data Modeling:
- Step 1 || Agent Settings
- Step 2 || Create your Connections
- Step 3 || Create your Space
- Step 4 || Data Modeling
- Step 5 || Business Catalog
- Step 6 || Create Stories
STEP 1 || AGENT SETTINGS
To get started with a more “real-life” scenario in the SAP Data Warehouse Cloud Beta version, we will use an on-premise SAP HANA system and an SAP BW/4HANA system as sources. For this, we need to execute a few steps.
In the Administration Console, we have to register a data provisioning agent. It is needed because SAP Data Warehouse Cloud connects to on-premise systems through Smart Data Integration and its agent.
You can install the Data Provisioning Agent (DPA) wherever you want; I have installed mine on my own machine, for example.
If you need some help with the DPA installation, check out Configure Smart Data Integration with the SAP HANA Service.
When you click on [Create Agent Communication Credentials], you have to choose a name for your agent (this is the name you’re going to use in dpagentconfig.ini) and take note of all the settings, as follows:
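As a rough illustration only (the property names can vary by agent version, and the host/port values below are invented), the dpagentconfig.ini file carries the agent name you registered, for example:

```ini
; Illustrative fragment of dpagentconfig.ini -- not a complete file.
; agent.name must match the name registered in the Administration Console.
agent.name=MyDWCAgent
; Hypothetical connection settings noted down during registration:
agent.hostname=myhost.example.com
agent.port=5050
```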
The complete setting for getting it done is available on Connecting to On-Premise Sources.
When the DPA is set up and the agent is started, give the agent status some time to change.
When it is connected, it’s done! You can create your connections.
STEP 2 || CREATE YOUR CONNECTIONS
In the Beta version, only three (3) connection types are available at the moment:
For our case, let’s use the ABAP and HANA connections. They are very simple to set up.
Once you choose your Data Provisioning Agent, you just have to insert your environment settings.
On the connection icon, you can check whether the connection is OK (watch out for your browser cache and autofill settings; sometimes they can get in the way of making the connection work fine).
STEP 3 || CREATE YOUR SPACE
Spaces allow you to manage the artifacts you create in them. Spaces are decoupled, yet open for flexible access, so your users can collaborate without having to worry about sharing their data. To model your data and create stories, you need to start off with a space.
Let’s do it then, by selecting the + button on the tile or on the menu bar of Space Management.
In the first tab, you can set the total storage to assign to the space, as well as how much of that disk storage can be leveraged as in-memory storage.
In the Users tab, you have to assign all the users you want to have access to the space (remember to do it; if you don’t, you won’t see anything in the Data Builder when creating your models).
Now, you can assign the connections we created in the previous steps. Thus, the users you assigned to this space can later work in the Data Builder with the data that originates from these connections.
Remember that the first step to combining data from different systems is to connect the sources you wish to work with to your space.
Once the connections are assigned to your space, you are able to combine them using any of the modeling tools in the Data Builder, such as the Graphical View or the SQL View. The Graphical View is great for visual, intuitive data modeling, whereas the SQL View is great for IT or analyst users who have previous knowledge of SQL.
To combine tables from different sources, simply select the source you want and drag the table onto the canvas. This works with any combination of data sources. For example, you can combine SAP HANA and OData sources, or local CSV tables and SAP ABAP sources.
For our example, I’ll assign the SAP ABAP and SAP HANA connections to the space.
The main use case of creating an Open SQL schema is to give third-party tools access to Data Warehouse Cloud, for example via JDBC. Anyway, I’ve created one.
This is it. You’ve created your space. Let’s go modeling now. 🙂
STEP 4 || DATA MODELING
If you are here and can’t see any spaces available, go back to Space Management and assign your user to one. If you already did, you should be able to see your space.
You may be interested in reading this good article about Data Modeling: https://saphanacloudservices.com/data-warehouse-cloud/resources/model-data/
You may see these options and get confused, but let’s get a better understanding of each feature for data modeling and data preparation:
This artifact is really excellent: there you can model (or even design) your entities, attributes and relationships in a conceptual data model. By doing this, you get your data sets associated. An association is a high-level relationship between your data elements and metadata.
As new views and tables don’t have any associations to each other, defining associations makes it easier to model new graphical views, since all previously defined associations will be suggested in the graphical view builder, for example through the join recommendations feature.
You can also have your sources automatically imported and deployed (if you click ‘yes’) by dragging and dropping them onto the canvas.
You can freely create tables (in your repository by saving them, and in SAP HANA by deploying them).
IMPORT CSV FILE
Importing data from files is a common task in data warehouse scenarios. Normally, flat files only serve as a bare means of storing table information, but do not hold any relations between the tables included within them. This feature can be used for data preparation tasks as well (in most cases).
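Conceptually, the import does little more than read each CSV row and insert it into a new table. A minimal sketch of that idea in plain Python (with sqlite3 standing in for the database, and a made-up sales file) looks like this:

```python
import csv
import io
import sqlite3

# Hypothetical CSV content; in Data Warehouse Cloud you would upload the file instead.
csv_data = io.StringIO("product,region,amount\nLaptop,EMEA,1200\nPhone,APJ,800\n")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, region TEXT, amount REAL)")

# Each CSV row becomes one table row; no relations between tables are imported.
reader = csv.DictReader(csv_data)
rows = [(r["product"], r["region"], float(r["amount"])) for r in reader]
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

print(conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0])  # 2
```

Note that, exactly as the text says, only the table content arrives; any relations between tables have to be modeled afterwards.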
Models are used as a basis for stories. You can use tables and views, joins and unions in a graphical environment. This is a very important artifact, because it is responsible for combining tables and even other views into a single output from a graphical perspective.
There you have a bunch of resources to model your data, combine data from many sources and assign business semantics (Output Properties). Also, the graphical view needs to be defined as a fact and has to include measures in order to be consumed in an SAC story. Don’t forget it! 🙂
If you are familiar with SQL, you can opt for the SQL environment to design your views instead of the graphical one. If you already have SQL statements handy from another solution, this is a great way to get a new view quickly.
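To illustrate why this is quick: an existing SQL statement can usually be wrapped into a view almost unchanged. A small sketch using Python’s sqlite3 (table and column names are invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
INSERT INTO orders VALUES (1, 10, 100.0), (2, 10, 50.0), (3, 20, 10.0);

-- A SELECT brought over from another solution, wrapped into a view as-is:
CREATE VIEW customer_totals AS
SELECT customer_id, SUM(amount) AS total_amount
FROM orders
GROUP BY customer_id;
""")

rows = conn.execute("SELECT * FROM customer_totals ORDER BY customer_id").fetchall()
print(rows)  # [(10, 150.0), (20, 10.0)]
```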
Well, now that you know the options, let’s get our model started.
As mentioned, we can benefit a lot from E/R models in further modeling steps, so this can be our first step.
Source data here.
You can also upload the CSV files via the IMPORT CSV FILE option. But, as we wanted a more realistic scenario to study, I’ve imported this data into the HANA and BW/4HANA systems, to use the on-premise connections for getting the data.
Behind the scenes, when we associate one table with another, Data Warehouse Cloud recognizes that there is a relation between them and then recommends this relation in other artifacts. Note that the tables on the left-hand side of the picture were generated by saving/deploying the E/R model. No further tasks were needed.
Back in the 80s, designing an E/R model was nothing but getting your database started, right? In this case, besides serving as a conceptual model “designer”, E/R models also speed up development by importing the tables used into your repository and, consequently, into SAP HANA (DWC).
Of course, table relationship knowledge is required at this moment.
Now it is time to model; let’s use the graphical view for this.
Take a look at this figure for a moment.
On the left-hand side we have the [Repository] and [Sources]: these are the resources we have for modeling. You can reuse artifacts from the repository and/or even get them directly from your (remote) sources. In the case of SAP HANA, bottom-up scenarios are also possible (EDW perspective).
In the middle we have the canvas where we will model the data. As already mentioned, here you can create joins/unions, project your data, apply formulas and business semantics, etc.
Each node has the following options:
The plus (+) button is the association button; this is where you can set the joins and get join recommendations from the system.
Remember, we set it in the E/R model:
It is great, isn’t it? 😀
The filter button gives you the options for filtering your nodes.
If your expression contains errors, you can check the message as follows:
When you press the formula button, you may find it a little strange, because there’s nothing saying “put your formula here”… 😛 Never mind, let’s get through it.
When you select the column you want to create the formula for, a > button becomes available.
Then you get the options.
But what if you want to create a new calculated column?
Click on the + button and the expression options will be available.
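Under the hood, a calculated column is just an expression over existing columns, exactly as in SQL. A tiny illustration with Python’s sqlite3 (the price/quantity columns are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (product TEXT, price REAL, quantity INTEGER)")
conn.executemany("INSERT INTO items VALUES (?, ?, ?)",
                 [("Laptop", 1000.0, 2), ("Phone", 500.0, 3)])

# The calculated column 'revenue' is simply an expression over existing columns.
rows = conn.execute(
    "SELECT product, price * quantity AS revenue FROM items ORDER BY product"
).fetchall()
print(rows)  # [('Laptop', 2000.0), ('Phone', 1500.0)]
```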
Finally, we have the Data Preview option.
This is a great option: when your model gets complex, it is very handy to be able to check your data node by node.
We can’t forget about the JOIN and UNION nodes, right?
There are several options for joins here:
Wondering about the “Natural” option? It’s just a join whose conditions are based on columns with the same name. 🙂
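In plain SQL terms (shown here via Python’s sqlite3, with invented tables), a natural join picks the shared column names as the join condition, so no ON clause is written:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_id INTEGER, name TEXT);
CREATE TABLE sales (sale_id INTEGER, product_id INTEGER, amount REAL);
INSERT INTO products VALUES (1, 'Laptop'), (2, 'Phone');
INSERT INTO sales VALUES (100, 1, 1200.0), (101, 2, 800.0);
""")

# NATURAL JOIN uses the shared column name (product_id) as the join condition.
rows = conn.execute(
    "SELECT name, amount FROM sales NATURAL JOIN products ORDER BY name"
).fetchall()
print(rows)  # [('Laptop', 1200.0), ('Phone', 800.0)]
```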
So let me show you what the UNION node looks like:
Pretty simple, huh?
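In SQL terms, the UNION node simply stacks result sets with compatible columns. A quick sketch with Python’s sqlite3 and two made-up regional tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_emea (product TEXT, amount REAL);
CREATE TABLE sales_apj  (product TEXT, amount REAL);
INSERT INTO sales_emea VALUES ('Laptop', 1200.0);
INSERT INTO sales_apj  VALUES ('Phone', 800.0);
""")

# UNION ALL keeps duplicates; plain UNION would remove them.
rows = conn.execute("""
SELECT product, amount FROM sales_emea
UNION ALL
SELECT product, amount FROM sales_apj
ORDER BY product
""").fetchall()
print(rows)  # [('Laptop', 1200.0), ('Phone', 800.0)]
```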
We have done our model using Graphical environment:
We could do the same by using the SQL View but, for learning purposes, it makes more sense to get started with the graphical view. Don’t you think?
Before we deploy it, there is one more thing that I believe is very important. As mentioned, to use our models in SAC (embedded), the view should be typed as “FACT”. Let me describe the options you have.
It’s important to highlight these definitions, because SAP Data Warehouse Cloud implements them a little bit differently.
A fact is the part of your data that indicates a specific occurrence or transaction. Facts are composed of multiple measures (or details about them), which can be qualitative or quantitative.
The qualitative ones can be linked to specific characteristics of the fact, which are called dimensions; they represent categories that provide perspective on your data.
Measures are the quantitative ones, to which you can apply calculations or aggregations; they represent quantities that give meaning to your data.
For more information, read the Facts, Measures and Dimensions article.
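In relational terms, these definitions map to a classic star layout: the fact table holds the measures, the dimension provides perspective, and aggregating a measure by a dimension attribute is a GROUP BY. A sketch with Python’s sqlite3 (all names invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Fact table: one row per transaction; 'amount' is the measure.
CREATE TABLE fact_sales (product_id INTEGER, amount REAL);
-- Dimension table: descriptive categories that give perspective.
CREATE TABLE dim_product (product_id INTEGER, category TEXT);
INSERT INTO fact_sales VALUES (1, 100.0), (1, 50.0), (2, 10.0);
INSERT INTO dim_product VALUES (1, 'Hardware'), (2, 'Accessories');
""")

# Aggregating the measure by a dimension attribute.
rows = conn.execute("""
SELECT d.category, SUM(f.amount) AS total
FROM fact_sales f JOIN dim_product d ON f.product_id = d.product_id
GROUP BY d.category ORDER BY d.category
""").fetchall()
print(rows)  # [('Accessories', 10.0), ('Hardware', 150.0)]
```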
Now we can set our output as FACT.
And assign the measures:
Save it and deploy it.
STEP 5 || BUSINESS CATALOG
Before we go ahead and create our story, let’s check the Business Catalog.
This is really a catalog with all your developments.
To get a slightly deeper understanding of it, consider the Business Catalog the entrance door to the semantic layer of your whole SAP Data Warehouse Cloud. From a centralized perspective, it helps business users better understand what is available in SAP Data Warehouse Cloud through the semantics you applied to your models.
The artifacts there will be available according to the spaces you are assigned to.
If you click on an artifact, you get a complete overview of it, containing dependencies, aggregation types, columns used, the person responsible, etc.
And finally, a great resource to help you understand and speed up your development/enhancements is the Lineage.
There you can find the full lineage of your views and also jump into them by using the [Edit in Data Builder] option.
STEP 6 || CREATING STORIES
Stories are an organized perspective on the analyzed results of combining your different data sources and models.
You can find more information here.
Basically, from now on you just have to apply your SAP Analytics Cloud skills 🙂
This blog post tried to give you a better understanding of the data modeling flow in SAP Data Warehouse Cloud.
It can be tough to sort out the possibilities for modeling your data flows in a new tool when you see it for the first time. That’s why I’ve written this.
Hopefully, it helps you speed up your learning of SAP Data Warehouse Cloud.
Also, try the following links for more information:
Thank you for reading!