My First Experience with SAP Data Warehouse Cloud – Introduction to Data Builder
This blog aims to assist and complement the understanding of the Data Builder on a high level and is based on my personal learnings.
SAP Data Warehouse Cloud’s Data Builder allows for technical personas to model underlying relationships of data from various sources and levels of granularity. It provides the ability of leveraging graphical interfaces to model relationships through ‘drag and drop’ tools whilst still offering traditional SQL scripting capabilities.
Data Builder has the following options to create and manage a:
- Graphical View
- SQL View
- Entity-Relationship (ER) Model
- Data Flow
- Intelligent Lookup
- Task Chain
This blog delves further into the fundamental concepts of Tables, Graphical Views, SQL Views, ER Models and Data Flows.
The capabilities listed above have distinct functionality so must be chosen wisely when completing a project. However, as a beginner, the differences between some of the capabilities may seem indistinguishable. I have addressed some of these differences below to allow for beginners to acquaint themselves with SAP Data Warehouse Cloud easily.
Table vs (Graphical / SQL) Views
A table is a set of related data that is organized by columns and rows. Its definition is based on the metadata of the table from a source system (e.g. SAP S/4HANA). Although it is uncommon, tables do not necessarily need to hold any data. Tables also occupy physical space on systems and exist in some database.
In SAP Data Warehouse Cloud, when we click ‘New Table’, we define the metadata of the table rather than the data it holds. Some examples of defined features include table name, table columns and column datatypes. However, SAP Data Warehouse Cloud does allow users to ingest data after the table’s definition is deployed by clicking on ‘Data Editor’. (Note: If you can’t see data editor pop-up, you have not yet deployed your table). Usually, most of the tables are not defined in SAP Data Warehouse Cloud, but rather in source systems such as SAP S/4HANA, SAP BW/4HANA or from other non-SAP systems.
Contrastingly, views combine data from tables in a meaningful way and is virtualized. These views are formed as a result of a query on one or more tables. Views do not consume physical space on our system. However, they do require temporary computational space during the time it is used. They assist in combining and manipulating information from multiple tables. They also assist in maintaining data security by revealing just the necessary information from the table rather than exposing all of the table’s data.
In SAP Data Warehouse Cloud, views are created via a graphical interface or SQL. A Graphical View is the view which is created using the graphical interface whereas a SQL View is the view which is created using SQL script. These views allow for joins, filtering conditions, hiding columns, adding calculated columns and other operations on the data stored in the table. This ability in creating views proves advantageous in extracting data from databases and only exposing necessary data for consumption by various personas while still maintaining security. This is also one of the many reasons why SAP Data Warehouse Cloud can be a powerful option for ‘Data Federation’ scenarios.
Entity-Relationship (ER) Model vs Graphical View
ER models are used to describe the relationship between tables and views. A relationship can be between a table and another table, a view and another view or a table and a view. ER models are data models that define the overall structure in a database.
The necessity of ER models lies in a common scenario where multiple tables are involved and the relationships between these entities are much more complex. It would be difficult and too complex to associate all the entities via a linear Graphical View. ER models solve these complex scenarios by associating the entities in a similar fashion to a database model.
When one of the tables whose association was defined in an ER model is dragged and dropped, the SAP Data Warehouse Cloud user can easily call upon its associated entities through the + icon and perform the join. This makes the process of constructing a view significantly easier when numerous tables are involved.
Data Flow vs Graphical View
Data flows move and transform data from one or more tables to a target table and is key to performing extract, transfer, load (ETL) operations. In most cases, significant amounts of data are required to be consolidated from multiple source systems and transformed to a specific format in its own separate table. Data flows are a good option for this process as it extracts the data from the source systems, transforms it to the specified format and loads it into a new separate table. This resulting output of the ETL process is termed as ‘data persistence’ whereas views, in contrast, is termed as ‘data virtualization’.
Data flows also assist in partitioning large, required transformations that would exceed the computational capacity into smaller tasks which can be achieved through multiple independent data flows.
As discussed above, views exist as a result of queries on tables. By default, they are not ‘persisted’ as when creating a new table.
To provide an analogy for better understanding, views are like seeing the table through glasses. These glasses contain lens which are analogous to queries applied to data. Hence, we see the resulted queried data through the lens of the glasses.
As an example, we construct a view to find data which is a cat (based on the filter query) from a table containing many animals. This occurs through wearing and seeing through the lens of the glasses.
However, leveraging data flows, we also write the data to a target table rather than running a simple query. Continuing with the analogy, it is similar to viewing through the glasses and drawing a table of the data we see, which contains only cats, next to the table of many animals.
An example of a use case for data flows would be to take snapshots of data across time to construct a time series. Views cannot retrieve the previous state of the table so would not be an option once the data is updated. However, data flows can be utilized to create multiple tables containing snapshots of the table across time as new data flows in.
The writing of data into a new separate target table or ‘persistency’ is a key differentiator between views and data flows.
I hope this blog has differentiated some of Data Builder’s tools within SAP Data Warehouse Cloud. If you are looking for more information, refer to the SAP Data Warehouse Cloud community page which provides the latest release updates, customer success stories and more. Additionally, refer to a beginner friendly tutorial to familiarize yourself with SAP Data Warehouse Cloud’s interface. Furthermore, for similar beginner-friendly content, please subscribe to my profile: SAP Community Profile.
Please don’t hesitate to comment any thoughts you may have on the topics discussed or general feedback on the blog. I look forward to engaging with you over SAP Data Warehouse Cloud!
As always, these blogs could not have been written without the support of my colleagues. I would like to acknowledge Marianne van Loenan for acting as my mentor in acquiring the knowledge pertaining to SAP Data Warehouse Cloud. I also would like to acknowledge Balakumar Ganesan for reviewing this article and sharing his expertise.