Technical Articles
Building a Covid-19 Chatbot powered by SAP BTP (Part 4/4): Data Modeling and Advanced Analytics with SAP Datasphere and SAP Analytics Cloud
Introduction
This article is the final of a four-part blog post series by Sebastian Schuermann and me. We met in the 2021 Career Starter Program and together we took the opportunity to discover and work with the latest technologies as part of the SAP Innovator Challenge 2021.
Together, we developed a COVID-19 chatbot (Vyri) as a solution to keep users updated regarding regulations, news, and statistics about the current pandemic. In this series, we will present our chatbot solution, which we implemented based on SAP Business Technology Platform (SAP BTP) components. The goal of the blogposts is to share our personal experience with the SAP BTP solution portfolio and its integration.
This final blog post, “Building a Covid-19 Chatbot powered by SAP BTP (Part 4/4): Data Modeling and Advanced Analytics with SAP Data Warehouse Cloud and SAP Analytics Cloud”, introduces the data warehouse and analytics capabilities enabled by the SAP Data Warehouse Cloud and SAP Analytics Cloud software as a service solutions. Note: We only touch on the functionalities of SAP Data Warehouse Cloud and SAP Analytics Cloud in this blogpost, as we focus on our chatbot use case. Further sources of information are linked throughout the blogpost.
The other articles in this series can be accessed here:
- Setting the Stage (Part 1/4)
- Accelerating Data Transformation and Governance with SAP Data Intelligence (Part 2/4)
- Creating a Chatbot with SAP Conversational AI (Part 3/4)
Agenda
- What is SAP Data Warehouse Cloud?
- What is SAP Analytics Cloud?
- Data Modeling (Data Layer and Business Layer)
- Geo Enrichment of Data Set
- Implementing a Hierarchy in a Graphical View
- Data Flows
- Analytics Cloud Story with Vyri Chatbot
- Conclusion
1. What is SAP Data Warehouse Cloud?
SAP Data Warehouse Cloud is a data warehouse as a service solution tailored for both business and IT users. SAP Data Warehouse Cloud enables data integration, database, data warehouse and analytics capabilities. The solution is based on the SAP HANA Cloud in-memory database, enabling real-time insights on cloud and on-premise data, both structured data as well as unstructured and geospatial data. For more information, click here.
2. What is SAP Analytics Cloud?
SAP Analytics Cloud is a software as a service solution that enables enterprise-wide advanced analytics end-to-end. The solution includes business intelligence, predictive analytics as well as enterprise planning capabilities. More information about the SAP Analytics Cloud can be obtained here.
3. Data Modeling (Data Builder and Business Builder)
Successful data modeling is a crucial step to generate the best possible insights out of data. In SAP Data Warehouse Cloud, data modeling is divided into two levels, between the Data Builder and the Business Builder, as IT users and business users should be enabled for data analysis.
The Data Builder is focused on IT users and enables the acquisition of data from a variety of sources. Tables (local tables as well as remote tables) and views can be created in the data layer to harmonize the data. Furthermore, entity relationship models and data flows (see chapter 6) can be created. Consumable views are created in the data layer, which can be passed to the Business Builder or consumed directly by SAP Analytics Cloud and other analytics clients. You can get more information about the Data Layer here.
As described in the second blog post, the Covid-19 data is ingested into SAP Data Warehouse Cloud via SAP Data Intelligence. For this purpose, the data is written to local tables of the data layer. The following image shows an example of the local table for the Covid-19 statistics data, which is created and populated via SAP Data Intelligence.
Local Table for Covid-19 Statistics Data (Image Source: Own Image)
In our scenario, Graphical Views are built on top of the local tables to extend and combine the data (including joins, filters and projections). The “Expose for Consumption” flag (see following image) specifies that the data can be consumed directly via an analytic client (such as SAP Analytics Cloud) without using the business layer.
Graphical View for Covid-19 Statistics (Image Source: Own Image)
The Business Layer receives artifacts from the Data Layer as source and outputs perspectives that can be consumed by analytical clients like SAP Analytics Cloud. The Business Builder has been designed for business users and puts the data into a business understandable language. In the Business Builder, data entities are consumed as analytical datasets and dimensions and combined into topic-specific consumption models. In this use case scenario, the Business Builder was not used because graphical views of the data layer are passed directly to the SAP Analytics Cloud. More information about the business layer can be found here.
4. Geo Enrichment of Data Set
In our use case, the Covid-19 statistical data provided by the Robert Koch Institute (see blog post 1) was to be enhanced with geospatial data in order to display it on a geomap in the SAP Analytics Cloud. At the time the use case was developed, there was no default way in the SAP Data Warehouse Cloud to create geospatial data that can be consumed in the SAP Analytics Cloud. Therefore, the steps we had to take to achieve a workaround are presented below. This workaround was presented by Boman Hwang in a blog post. In the meantime, there is a feature to simplify geo-enrichment in the SAP Data Warehouse Cloud.
Step 1: A local table is created which contains fields for latitude and longitude of data type Double (see the following graphic). The OBJECTID is an ID for the respective county (from the Covid-19 statistics data) for which the geodata is to be enriched. The data is loaded into the table via CSV upload.
Local Table with Columns for Latitude and Longitude (Image Source: Own Image)
Step 2: A table function is developed that generates a field of type “ST_POINT” (geospatial point) and fills it with the longitudes and latitudes of the first table.
Table Function for Generation of a Column with Geospatial Points (Image Source: Own Image)
Step 3: The table function is joined to the other data (Statistical Covid-19 data) in a Graphical View.
Graphical View enriched with Geo Data (Image Source: Own Image)
In the Graphical View of type “Analytical Dataset” this view is added in the Associations to get access to the field of type “ST_POINT“.
Association to “Geo Data” View in Graphical View of Type “Analytical Dataset” (Image Source: Own Image)
Done! In a few steps a dataset can be enriched with geodata!
5. Implementing a Hierarchy in a Graphical View
In SAP Data Warehouse Cloud, there is the ability to develop both level-based and parent-child hierarchies to enable enhanced data modeling and data visualization. For our use case, we implemented a level-based hierarchy in the data model to enable a drill-down from the state level to the county level within SAP Analytics Cloud. Here, we took inspiration from Lale Yakut Valusek’s blog post.
Step 1: First, a graphical view of type “Dimension” is created for the hierarchy. In this view a hierarchy can be added via the “Hierarchy” button.
Graphical View of Type „Dimension“ (Image Source: Own Image)
Step 2: In the Hierarchy dialog we define which levels this hierarchy should contain with fields that are included in the graphical view. In our use case we define that the State is the first level of the hierarchy and the County is the second level.
Hierarchy Dialog in Graphical View (Image Source: Own Image)
Step 3: Now an association to this Dimensions View is created in the Graphical View in which the hierarchy should be usable.
Graphical View of Type “Analytical Dataset” with Association to Graphical View of Type “Dimension” (Image Source: Own Image)
Et voilà! You can now use the hierarchy in the data model and later in an analytic client. More information can be found here.
6. Data Flows
With Data Flows, SAP Data Warehouse Cloud enables data integration and data transformation of various structured and semi-structured data from different data sources (SAP and non-SAP). More information about Data Flows can be found here.
Data flows are modeled flow-based and start with a source (e.g. tables, CDS views, remote files or OData services) and end with a target. In between, different operators can be combined. Currently there are operators for projections, aggregations, joins, filters and unions to achieve standard transformation capabilites. Advanced transformation requirements can be implemented via the scripting operator. Up to now, Python 3 can be used for programming. The two libraries Pandas and NumPy are included.
The data flows can be started directly or scheduled for permanent use. The pipelines can be monitored via the Data Flow Monitor.
Data flows were designed in our use case to easily transform the data of the data model. For example, a field of the data type Timestamp was transformed into another field of the type Date (see picture). This use case could also have been implemented with SAP Data Intelligence. However, we wanted to transform the data, which is already persisted in the SAP Data Warehouse Cloud, quickly and easily within the environment. Furthermore, we honestly also wanted to test the possibilities of the Data Flow artifact. The transformation of the data type and the scheduling of the data flow could be implemented in about half an hour.
Data Flow with Python Code (Image Source: Own Image)
Conclusion: There are powerful transformation options within the SAP Data Warehouse Cloud. However, these are less extensive than the use in combination with SAP Data Intelligence. For example, only scripting with Python is currently possible. The usable libraries cannot be extended, unlike in Data Intelligence.
7. Analytics Cloud Story with Vyri Chatbot
A demo of our Vyri Dashboard was already presented in the first blog post. The dashboard was implemented using a Story in SAP Analytics Cloud (dashboards are called “stories” in SAP Analytics Cloud). For this purpose, the SAP Data Warehouse was connected to the SAP Analytics Cloud via a live connection (see here how to set up the connection). This means that the data is not redundantly replicated in the SAP Analytics Cloud, but loaded directly via the Data Warehouse Cloud in the browser. The data models connected in the SAP Analytics Cloud Story are Graphical Views of the SAP Data Warehouse Cloud, which have been directly released for consumption (see Chapter 3).
The Vyri dashboard is structured as follows (see the following figure): On the left side, statistical data on Covid-19 is displayed (including Number of Cases, Number of Deaths, Incidence Count, etc.). This data can be filtered by date and hierarchy (State and County). On a geomap, statistical key figures can be visualized (via a Measure Input Control) with bubbles. On the right side, data on current Covid-19 tweets and Covid-19 news in Germany are displayed, as well as the analyzed sentiment of these news and tweets. On the lower right side of the dashboard, the chatbot “Vyri” is also embedded (see blogpost 3).
Vyri Dashboard in SAP Analytics Cloud (Image Source: Own Image)
The SAP Conversational AI chatbot was embedded in the story via the Web Page widget. For this purpose, the chatbot was hosted on a website that is referenced in the widget. To allow the widget of the chatbot to overlap with the graphics related to the latest Covid news, a Canvas page was used for the dashboard, as this requirement could not be achieved with Responsive pages. If you are interested in comprehensive ways to embed a Conversational AI chatbot in SAP Analytics Cloud, I can recommend the following blog post.
8. Conclusion
We took the opportunity to explore different capabilites of SAP Data Warehouse Cloud and SAP Analytics Cloud. In this blog post, we presented how data modeling is designed in the SAP Data Warehouse Cloud, how level-based hierarchies and geo-enrichment can be achieved, and what options the data flows enable. Furthermore, we introduced how a dashboard for our Vyri solution including embedded chatbots was developed using SAP Analytics Cloud.
It should be noted again that many options of the SAP Data Warehouse Cloud (e.g. space management and data replication) and the SAP Analytics Cloud (e.g. predictive analytics and enterprise planning) were not used at all for this Covid-19 use case. These are certainly exciting for further development of the use case.
This was the final blog post of our series, thank you for reading! We hope you find this post helpful and will also read the other posts in our blogpost series. For any questions or feedback just leave a comment below this post.
Best wishes,
Tim & Sebastian
Find more information and related blog posts on the topic page for SAP Data Warehouse Cloud.
If you have questions about SAP Data Intelligence you can submit them in the Q&A area for SAP Data Warehouse Cloud in the SAP Community.