Data ingestion from SAP HANA cubes into SAP Analytics Cloud. Instalment 1.
In this article I would like to share some insight into techniques for ingesting data from SAP HANA cubes into SAP Analytics Cloud [SAC] data wranglers.
When it comes to SAP HANA, the focus will be on the different SAP HANA cloud services available on SAP Cloud Platform [SCP] Cloud Foundry [CF].
In this instalment, I shall start with the SAP HANA service based on PSA (Persistence Service) on CF on Azure.
Next will come HaaS on CF, and eventually the series will culminate with the latest SAP HANA Cloud service on CF.
When it comes to SAP Analytics Cloud, the product features I will be making use of are described in its Feature Scope Description (FSD) document.
The live SAP HANA connection has been a distinct advantage that SAC has had over its competition.
Live (direct) connectivity not only means there is no data replication, it also implies semantic knowledge of the data source.
However, it has been brought to my attention that there are use cases where live connectivity is not desirable, especially with business applications that require the data to be fully vetted and curated at all times, and where content providers need to impose stricter controls over the data and content delivered to consumers.
What is perhaps less commonly known is that with SAC one can also create queries and acquire data directly from SAP HANA cubes, with no ETL tool required.
(This closely resembles the universe/query paradigm found in Web Intelligence, for example.)
How is this possible?
SAC sports a long list of so-called acquired data sources, and SAP HANA is one of them too.
SAC is a cloud appliance sitting in its own SCP sub-account (in either the Cloud Foundry or the Neo environment).
This is where the SAP Cloud Connector [SCC] comes into the mix, as it lets SAC leverage cloud-to-on-premise connectivity through the SCP connectivity service.
Question. But SCC only provides the secure communication tunnel. So what about the data itself? How does one connect to a data source?
Answer. That’s where the SAC Cloud Agent [C4A] comes into play.
The C4A is essentially a connectivity broker (“the connection server”), a clever piece of middleware that understands the semantics of the underlying data sources.
C4A is provided as a ready-to-deploy servlet. It is best deployed on the on-premise side together with SCC.
Let’s have a closer look at how to acquire data from the SAP HANA Service (PSA) available on SCP Cloud Foundry on Azure.
As a quick reminder, none of the SAP HANA services on Cloud Foundry come with an application runtime (no XSA). And, with the exception of the PSA-based HANA service, they do not offer XSC either.
That makes live connectivity a little more cumbersome, because it requires deploying the SAP HANA Analytics Adapter [HAA], a dedicated Java application that implements the SAP Information Access (InA) REST protocol required to establish the live connectivity.
But if data acquisition is a viable option, it may help simplify the data ingestion process and keep costs low (by eliminating the need to use the CF application runtime for the HAA deployment).
In a nutshell, all that has to be done is to expose the JDBC SQL endpoint of the SAP HANA service.
With the PSA-based HANA service, the JDBC SQL endpoint is already exposed on CF. However, in order to use it, one would first need to create a CF application and bind it to the service.
This is beyond the scope of this article, but if readers are interested in this approach I might cover it in one of the next episodes, so please do vote :).
Alternatively, one can use SCC to connect to the CF sub-account where the PSA-based HANA service has been provisioned, and then create a service channel to the HANA instance.
As we have already used SCC to enable cloud-to-on-premise connectivity on the SAC side, we shall reuse the same SCC for the service channel.
We can create several service channels to the same [tenant] database as depicted below.
For instance, this is how it looks in SAP HANA Studio.
There are three connection definitions using two different service channel ports (30515 and 30615).
It is worth noting that all three connections point to the same tenant database (the system database is managed by SAP and cannot be accessed via the service).
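The relationship between the service channel ports and the connection definitions can be sketched in a few lines of Python. This is a minimal illustration, not an SCC or SAC API. It assumes the SCC convention for HANA service channels, where the local port is derived from the local instance number as 3<NN>15 (instance 05 gives 30515, instance 06 gives 30615), and it assumes the client runs on the SCC host, hence `localhost`. The helper names `service_channel_port` and `jdbc_url` are hypothetical; `encrypt` is a standard SAP HANA JDBC property, though whether you need it depends on your setup.

```python
def service_channel_port(local_instance_number: int) -> int:
    """Local SCC service-channel port for a HANA instance number (assumed 3<NN>15 scheme)."""
    if not 0 <= local_instance_number <= 99:
        raise ValueError("HANA instance numbers range from 00 to 99")
    return int(f"3{local_instance_number:02d}15")


def jdbc_url(host: str, port: int, **properties: str) -> str:
    """Build a SAP HANA JDBC URL of the form jdbc:sap://host:port/?key=value&..."""
    url = f"jdbc:sap://{host}:{port}/"
    if properties:
        url += "?" + "&".join(f"{k}={v}" for k, v in properties.items())
    return url


# Two service channels on the SCC host (hypothetical instance numbers 05 and 06),
# both forwarding to the same tenant database.
for instance in (5, 6):
    port = service_channel_port(instance)
    print(port, jdbc_url("localhost", port, encrypt="true"))
```

Both URLs reach the same tenant database; the two ports simply correspond to two service channels defined in SCC.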
OK. From now on, one can create an SAP HANA import connection in SAC.
We shall focus on the two following options:
- connect to the _SYS_BIC schema and access the cubes there [with a “classic” database user]
- connect to an HDI container and acquire data from the container’s cubes [using the user access details from the service key of the HDI container service instance]
The first option may be very convenient when migrating from on-premise HANA development into the Cloud Foundry universe.
The latter (and preferred) option allows one to fully unleash the power of the HANA Deployment Infrastructure [HDI], with its focus on container shipment as opposed to package delivery.
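To make the difference between the two options concrete, here is a minimal Python sketch. For option 1, the cubes are addressed as repository views activated into _SYS_BIC and quoted as "_SYS_BIC"."<package>/<view>"; for option 2, the login comes from the HDI container's service key. The service key field names used below (`host`, `port`, `user`, `password`, `schema`) are an assumption based on typical HDI container service keys, so check them against your own key; the package, view and credential values are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass
class HanaLogin:
    host: str
    port: int
    user: str
    password: str
    schema: str


def login_from_service_key(service_key_json: str) -> HanaLogin:
    """Option 2: read login details from an HDI container service key (field names assumed)."""
    key = json.loads(service_key_json)
    return HanaLogin(
        host=key["host"],
        port=int(key["port"]),      # the port may arrive as a string, so normalise it
        user=key["user"],           # container runtime user
        password=key["password"],
        schema=key["schema"],       # container schema holding the container's cubes
    )


def sys_bic_view(package: str, view: str) -> str:
    """Option 1: fully qualified name of a repository view activated into _SYS_BIC."""
    return f'"_SYS_BIC"."{package}/{view}"'


# Hypothetical values for illustration only.
sample_key = ('{"host": "hana.example.com", "port": "30015", "user": "CONTAINER_RT", '
              '"password": "***", "schema": "CONTAINER"}')
print(login_from_service_key(sample_key))
print(sys_bic_view("my.demo.package", "SALES_CUBE"))
```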
In order to demonstrate the first approach, I uploaded the well-known SAP HANA SHINE package into the database.
This mimics a typical SAP HANA development paradigm, where developers work on packages that may each contain a number of cubes.
(On a side note: the data would be loaded into HANA tables either via the SAP HANA SDI engine or another ETL tool (SAP SLT, SAP Data Services, SAP Data Hub, etc.); this is not in the scope of this article.)
Let’s try to connect the dots.
The next step is in SAC, where all that has to be done is to create a new SAP HANA connection. (This may require your SAC user to have administration rights.)
You may pick any cube available through this connection and then build a query to further refine the data you are about to bring into the SAC model.
Once the query is built, execute it.
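Conceptually, a query built on a cube boils down to selecting some dimensions, aggregating some measures and filtering. The sketch below only illustrates that idea as plain SQL; it is not the SQL that SAC or C4A actually generates. The cube, column and filter names are hypothetical, and SUM is assumed as the aggregation for every measure. Filter values are returned separately as `?` parameters, the placeholder style accepted by SAP HANA client drivers.

```python
from typing import Dict, List, Tuple


def quote(identifier: str) -> str:
    """Double-quote a HANA identifier, doubling any embedded quotes."""
    return '"' + identifier.replace('"', '""') + '"'


def build_cube_query(cube: str, dimensions: List[str], measures: List[str],
                     filters: Dict[str, str]) -> Tuple[str, List[str]]:
    """Aggregate measures (assumed SUM) by dimensions; equality filters become ? parameters.

    `cube` is expected to be an already-quoted object name.
    """
    columns = [quote(d) for d in dimensions]
    columns += [f"SUM({quote(m)}) AS {quote(m)}" for m in measures]
    sql = f"SELECT {', '.join(columns)} FROM {cube}"
    params = list(filters.values())
    if filters:
        sql += " WHERE " + " AND ".join(f"{quote(c)} = ?" for c in filters)
    if dimensions:
        sql += " GROUP BY " + ", ".join(quote(d) for d in dimensions)
    return sql, params


# Hypothetical cube and column names.
sql, params = build_cube_query('"_SYS_BIC"."my.demo.package/SALES_CUBE"',
                               dimensions=["COMPANY_NAME"],
                               measures=["NETAMOUNT"],
                               filters={"CURRENCY": "EUR"})
print(sql)
print(params)
```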
Executing the query starts the data acquisition into a new data wrangler, which SAC keeps securely for a period of up to 7 days.
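Because the data wrangler is only kept for a limited time, it helps to know when it may be gone. A minimal sketch, assuming the 7-day figure is an upper bound counted from the moment of acquisition (how SAC actually counts retention is an assumption here); the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Upper bound taken from the article: SAC keeps a data wrangler for up to 7 days.
MAX_WRANGLER_RETENTION = timedelta(days=7)


def latest_expiry(acquired_at: datetime) -> datetime:
    """Latest moment at which the wrangler can still be expected to exist."""
    return acquired_at + MAX_WRANGLER_RETENTION


def time_left(acquired_at: datetime, now: datetime) -> timedelta:
    """Remaining time before the retention upper bound is reached (never negative)."""
    return max(latest_expiry(acquired_at) - now, timedelta(0))


acquired = datetime(2020, 3, 2, 9, 0, tzinfo=timezone.utc)  # hypothetical acquisition time
print(latest_expiry(acquired).isoformat())  # 2020-03-09T09:00:00+00:00
```

Since the wrangler may disappear earlier than this bound, it is safer to turn it into a model well within the window.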
The data wrangler is an important concept: it holds the acquired data.
All subsequent operations are performed on the SAC tenant itself.
Data model and smart discovery.
From now on, the data can be further curated, and eventually a model is created.
The data in the model can be periodically refreshed through scheduling.
The model can be used to build stories, either manually or automatically by leveraging the smart discovery functionality.
For instance, you may decide to run a smart discovery based on the Netamount measure.
SAC will then generate a story with 4 pages containing the overview, the influencers, the outliers and an interactive simulation page.
OK. That concludes part 1. I hope you have enjoyed reading it, and I look forward to your comments.
In the second instalment, I shall demonstrate how to import data from an HDI container (option 2 above).