SAP Data Warehouse Cloud – new opportunities through task automation
SAP Data Warehouse Cloud offers a very easy-to-use interface for developing data models. In implementation projects with a very large number of data models, however, the interface quickly reaches its limits because the development process takes a lot of time. The Command Line Interface (CLI) of SAP Data Warehouse Cloud provides an alternative way to configure Spaces, including data models. Using the example of an Automation Framework, this post shows which opportunities the CLI opens up and how parts of the data modeling in SAP Data Warehouse Cloud can be automated.
Challenge (real project)
SAP Data Warehouse Cloud was to be introduced. Initially, tables from the source systems were to be provided in central ingest spaces, one per source system. Of course, the necessary remote tables etc. can be developed manually. As the number of tables increases, however, the amount of repetitive manual work grows considerably. In addition, there is the requirement that changes to the structures of the source tables should also be adopted in DWC. This, too, can be handled through development processes and manual activities.
However, depending on the number of tables, the process of creating all the necessary Spaces including remote tables would be very lengthy. Departments would have to wait longer for results and could only benefit from DWC later. This raises the question of whether there is a better way than performing recurring tasks manually.
This was the initial situation in a customer project in which almost 1,000 tables were to be integrated into DWC.
The Command Line Interface (CLI)
The interfaces of SAP Data Warehouse Cloud are easy and intuitive to use. This is very good for smaller work packages and for developments by the business. On the other hand, it would be ideal if DWC also offered an alternative, programmatic way to carry out developments.
This is where the Command Line Interface (CLI) comes in. The CLI is a standalone Node.js module and is available on npmjs.com. It allows access to DWC tenants via the command line. “Bulk work” or recurring tasks that would otherwise require a lot of manual effort can be accelerated this way. Examples include:
- Reading the Space definition (tables, members, etc.)
- Creation of Spaces, tables, views, etc.
- Assignment of users to Spaces
Jascha Kanngiesser has also described the possibilities in detail here. JSON is used as the exchange format, i.e. the definitions are provided as JSON files. Below are a few examples of use cases for the CLI.
Integration in Git
SAP Data Warehouse Cloud currently offers no version management of development objects as developers know it from software development, e.g. with Git. However, the definition of Spaces can be exported to a JSON file via the CLI. If the JSON files are stored in Git, you obtain proven version management. Automating the versioning (export JSON, commit to Git) is conceivable through third-party solutions (e.g. the ISR Automation Framework), but is not available out of the box. A Git integration in the DWC interface is currently not foreseeable.
Developer Spaces / One Tenant Strategy
This scenario builds on a working Git integration. Currently, nothing is locked in DWC while a developer is editing, for example, a view. This means that a second developer can edit the same view at the same time, which imposes restrictions or requires clear development processes.
Developers pull the definition of the productive Space (JSON) from Git, and a Space is created under a different name (e.g. DEV1_&lt;SPACE_NAME&gt;) for development. The development is carried out and accepted. Afterwards, a merge is performed in Git with any other developments of a release. The agreed Space definition is then imported into the productive Space via the CLI. Some steps would have to be automated here to work efficiently (e.g. renaming and creating the developer Space).
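The renaming step of this workflow could be sketched as follows; the JSON layout shown here is simplified and illustrative, not the exact DWC export format:

```javascript
// Sketch: derive a developer copy (e.g. DEV1_<SPACE_NAME>) from a
// productive space definition pulled from Git. The JSON structure is
// simplified for illustration.
function cloneForDeveloper(definition, devPrefix) {
  const [spaceName] = Object.keys(definition);
  const devName = `${devPrefix}_${spaceName}`;
  // Deep copy so the productive definition stays untouched.
  const copy = JSON.parse(JSON.stringify(definition[spaceName]));
  return { [devName]: copy };
}

const prod = { SALES: { spaceDefinition: { label: 'Sales' } } };
const dev = cloneForDeveloper(prod, 'DEV1');
console.log(Object.keys(dev)); // [ 'DEV1_SALES' ]
```

The resulting file would then be imported via the CLI to create the developer Space.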
One can follow a “one tenant” strategy or work with multiple tenants. I find this scenario very exciting because it allows large developer teams to work in parallel without ending in chaos.
Unfortunately, the scenario currently fails because merging the JSON files is not yet possible. The described procedure therefore only shows the potential of the solution. It will be interesting to see whether the merge becomes possible in the future; then even larger development teams could be coordinated well.
Alternative transport management
The transport system in SAP Data Warehouse Cloud is very easy to use. One thing to note about the naming conventions: the technical names on the source and target tenant must be identical. This means the Connections and Spaces must have the same names as on the production tenant; there is no renaming or mapping. Often, however, there is a requirement to use system abbreviations in the names of connections as well as ingest spaces. It can then be confusing that Connection and Space names with production system abbreviations exist on the Dev/QA tenant.
With the CLI, a “renaming” can be realized. On the Dev/QA tenant, the definition is exported via the CLI. In the JSON file(s), the objects are renamed accordingly and then imported into the productive DWC via the CLI. This is certainly a very simple example, but it shows that the CLI provides a high degree of flexibility for more complex scenarios.
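Such a renaming could be scripted instead of being done by hand. The following is a minimal sketch (the JSON layout and the names are illustrative assumptions) that walks the exported definition and rewrites both keys and string values according to a mapping table:

```javascript
// Sketch: rewrite technical names (e.g. connection names carrying a
// system abbreviation) inside an exported definition before importing
// it into the production tenant. Renames object keys and string
// values wherever they match the mapping; layout is illustrative.
function renameReferences(node, mapping) {
  if (typeof node === 'string') return mapping[node] ?? node;
  if (Array.isArray(node)) return node.map(n => renameReferences(n, mapping));
  if (node && typeof node === 'object') {
    const out = {};
    for (const [key, value] of Object.entries(node)) {
      out[mapping[key] ?? key] = renameReferences(value, mapping);
    }
    return out;
  }
  return node;
}

const exported = { DEV_ERP: { connection: 'ERP_DEV', label: 'Ingest' } };
const renamed = renameReferences(exported, { DEV_ERP: 'PRD_ERP', ERP_DEV: 'ERP_PRD' });
console.log(renamed); // { PRD_ERP: { connection: 'ERP_PRD', label: 'Ingest' } }
```

A blunt global replace like this assumes the mapped names do not occur as ordinary data values; in a real project the mapping would be restricted to the relevant fields.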
In general, this process should also be automated to avoid manual effort as well as errors. At the same time, it is conceivable to integrate the deployment into a CI/CD process to automate such adjustments and the deployment itself.
Automation of modeling and quality checks
Let’s return to the initial challenge of the project, where a mass of objects needs to be created in Data Warehouse Cloud. Not surprisingly, this is a very good use case for the CLI. All that is needed are the appropriate JSON files to import into DWC. Below is an excerpt of such a JSON file.
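The original excerpt is a screenshot; purely as an illustration (field names are simplified and not guaranteed to match the current DWC export format), a space definition looks roughly like this:

```json
{
  "SALES_INGEST": {
    "spaceDefinition": {
      "label": "Sales Ingest",
      "members": [
        { "name": "DWCUSER", "type": "user" }
      ],
      "objects": [
        { "name": "MARA", "type": "remoteTable" }
      ]
    }
  }
}
```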
Manually creating a JSON file including the names of the tables etc. does not speed up anything at first, especially if further views based on the remote tables are to be created and all columns are to be defined. This only makes sense if the creation of the JSON files and, if possible, the import into DWC are automated. The automation of the process can be handled by an Automation Framework.
The Automation Framework is a Node.js application that runs locally or in a virtual environment. Using the framework and the CLI, data models in SAP Data Warehouse Cloud can be created in a few minutes. In addition to the initial creation, a regular automatic reconciliation of source and target structures can also be implemented.
Below is an outline of how the framework works together with SAP Data Warehouse Cloud to generate the data models:
1) Admin Space
An Excel file is used to define which source tables are required from which system. This file is saved as control table(s) in a dedicated space. The control tables specify which remote tables / views are to be created in which spaces in SAP Data Warehouse Cloud.
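Conceptually, the control tables could be read as rows like the following; column names and values are illustrative assumptions, not the actual table layout:

```javascript
// Sketch: control-table entries as the Automation Framework might read
// them from the Admin Space. Column names are illustrative.
const controlTable = [
  { sourceSystem: 'ERP', sourceTable: 'MARA', targetSpace: 'INGEST_ERP', objectType: 'remoteTable' },
  { sourceSystem: 'ERP', sourceTable: 'VBAK', targetSpace: 'INGEST_ERP', objectType: 'remoteTable' },
  { sourceSystem: 'CRM', sourceTable: 'BUT000', targetSpace: 'INGEST_CRM', objectType: 'remoteTable' },
];

// Group the requirement by target space — later, one JSON file per space.
function groupBySpace(rows) {
  return rows.reduce((acc, row) => {
    acc[row.targetSpace] = acc[row.targetSpace] || [];
    acc[row.targetSpace].push(row.sourceTable);
    return acc;
  }, {});
}

console.log(groupBySpace(controlTable));
// { INGEST_ERP: [ 'MARA', 'VBAK' ], INGEST_CRM: [ 'BUT000' ] }
```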
2) Determine requirement
The control tables represent the requirement for the Automation Framework. Via Open SQL access to the Admin Space, the framework queries which objects are to be generated in SAP Data Warehouse Cloud.
3) Determine Metadata
The framework is also connected to the source systems. This makes it possible to query the metadata of the source tables. For example, it is validated whether a table exists at all and which columns it contains.
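The validation step could look roughly like this; the metadata shape is an assumption for illustration (in practice it would come from a database client connected to the source system):

```javascript
// Sketch: validate a requested table against metadata read from the
// source system. The metadata structure is an illustrative assumption.
function validateTable(requested, sourceMetadata) {
  const meta = sourceMetadata[requested];
  if (!meta) return { ok: false, reason: `table ${requested} does not exist` };
  if (meta.columns.length === 0) return { ok: false, reason: 'no columns found' };
  return { ok: true, columns: meta.columns };
}

const metadata = { MARA: { columns: ['MATNR', 'MTART', 'MATKL'] } };
console.log(validateTable('MARA', metadata).ok); // true
console.log(validateTable('VBAK', metadata).ok); // false
```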
4) Creation of JSON file for Command Line Interface
With the results of the previous steps, the necessary JSON files can be generated automatically. For this purpose, a script runs that knows the JSON syntax required by Data Warehouse Cloud and generates the files.
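The generation principle can be sketched as follows. The real DWC JSON syntax contains considerably more fields; the structure below is deliberately simplified and only shows how metadata is turned into a definition:

```javascript
// Sketch: generate a (simplified, illustrative) space definition with
// remote tables from the validated metadata of the previous steps.
function buildSpaceDefinition(spaceName, tables) {
  return {
    [spaceName]: {
      spaceDefinition: {
        label: spaceName,
        objects: tables.map(t => ({
          name: t.name,
          type: 'remoteTable',
          columns: t.columns.map(c => ({ name: c })),
        })),
      },
    },
  };
}

const def = buildSpaceDefinition('INGEST_ERP', [
  { name: 'MARA', columns: ['MATNR', 'MTART'] },
]);
console.log(JSON.stringify(def, null, 2));
```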
5) Generation of data models in SAP Data Warehouse Cloud
Finally, the JSON file is imported into SAP Data Warehouse Cloud via the Command Line Interface and all objects are created. Dependencies between Spaces can also be created, e.g. when views in a consumption space should point to the remote tables in ingest spaces.
Below is a screenshot from the interface where the necessary information is stored.
Automated processes should nevertheless be monitored. Via the CLI, the Space definition can be queried after the import into DWC and compared with the expected result (see screenshot below). In the case of replications, the quality checks can be extended to a comparison of the source with the target table and similar tests.
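The core of such a check is a simple diff between the expected object list and the exported definition; field names here are again illustrative:

```javascript
// Sketch: compare the expected object list with the space definition
// exported after the import, and report what is missing.
function missingObjects(expected, actual) {
  const actualNames = new Set(actual.map(o => o.name));
  return expected.filter(name => !actualNames.has(name));
}

const expected = ['MARA', 'VBAK', 'BUT000'];
const actual = [{ name: 'MARA' }, { name: 'VBAK' }];
console.log(missingObjects(expected, actual)); // [ 'BUT000' ]
```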
With the Automation Framework proven in the project, it is possible to generate a large data model within hours.
The Command Line Interface offers high potential for complex scenarios and the automation of activities. It will be exciting to see how the functional scope of the Command Line Interface develops. Many automation scenarios and complex requirements can be approached this way. The example of the Automation Framework clearly shows that considerable savings are possible by using the CLI:
- Generation of spaces
- Generation of data models
- Generation of users
- Alignment of source systems with the data models of Data Warehouse Cloud
- Execution of quality checks
I hope that the blog gives a good overview of the potential of the Command Line Interface.
Hello Christopher Kampmann,
Can you provide more information on what the ISR Automation Framework is? Any websites for reference?
Hi Anto Madonna Shirani V,
we showed the tool in a webinar. Unfortunately, the webinar was in German, but you can access the replay on our DWC landing page.
Thank you, Christopher, for sharing your insightful examination of the automation potential within DWC using CLI. I'm curious, could you please elaborate on some of the advantages of using ISR over Jenkins for CI/CD pipelines as explained in https://blogs.sap.com/2022/10/11/sap-data-warehouse-cloud-sap-hana-cloud-hdi-automation-ci-cd-pipelines-details/?
Also, I noticed that you mentioned JSON merge is not supported in Git. While it's true that conflicts may occur during the merge process if multiple users have made changes to the same part of a JSON file, Git provides tools to help users resolve conflicts manually. These tools could also be integrated into the pipeline. Could you please clarify this point?
The approach of Vishwa Gopalkrishna is very interesting, and it is a cool blog.
First of all, our approach was not to replace Jenkins. In fact, we are using Jenkins under the hood. However, I guess the two blogs can't really be compared in terms of their scope.
In the project, we needed an approach to generate a lot of Spaces, remote tables, and views without using HDI or BAS. Therefore we used the CLI only for the generation. We are using the DWC standard only.
The focus of the ISR Automation Framework is a UI and configuration options for doing this, together with the scripts for generating the JSON files etc. The UI is easy to maintain, and everything can be used without deep CI/CD knowledge.
The ISR AF generates several dependent Spaces and all needed objects based on control tables maintained by business users. So in the initial phase of the project it takes over a lot of manual work, because it generates all the remote tables and views that are needed.
In addition, there are options to perform quality checks against the source systems, verifying whether all columns of the source tables are also visible in DWC and, if not, enhancing DWC automatically.
So the main focus is the generation of DWC standard remote tables and views, speeding up development with an easy-to-use UI.
Concerning the Git Merge:
We had issues with the @DataWarehouse.querybuilder.model part, since it is a large string value.
Thanks, Christopher, all clear. I will keep dreaming about version control and GIT support in DWC 😉