Tools of HANA SQL Data Warehouse
In my last blog post I gave an overview, which properties characterize the HANA SQL Data Warehouse approach, what it makes special, What characterises the HANA SQL Data Warehouse? | SAP Blogs.
Now with some weeks delay I want to continue and give an overview which tools are necessary or helpful to build a SAP HANA Data Warehouse with in this way.
SAP HANA SQL Data Warehousing offers a high degree of flexibility with regard to the use of different tools. The SQL approach makes it possible to use many well-known (open source) software development tools, especially in the context of the applied DevOps philosophy. But also the SAP HANA platform brings some important own tools for the development of the native SAP HANA SQL Data Warehouse.
The SAP HANA platform has various tools for managing and developing the platform. For building the native SAP HANA SQL Data Warehouse, these are of importance accordingly.
1.1 Data Integration
For a data warehouse, the integration of data from different source systems is a core task. After many years of classic extract, transform and load (ETL) dominating this area of data warehousing, the current trend is to transfer data only virtually into the data warehouse. In addition, new NoSQL database technologies and Big Data are playing a larger role and need to interact with an enterprise data warehouse solution. The SAP HANA platform as the basis of SAP HANA SQL Data Warehousing, which is native in this sense, has many solutions that enable on-demand and scalable data integration. For simple loading processes, the SAP Web IDE development environment (see 1 SAP HANA Tools below) is already sufficient, with which tables from source systems can be filled graphically on the basis of SQL commands. For more complex scenarios, the Enterprise Information Management (EIM) package solution and the SAP HANA Smart Data Access (SDA) service are available for integrating data.
1.1.1 Enterprise Information Management
For some years now, Enterprise Information Management has been a separate branch of information technology that deals with the consolidation and the management of structured and unstructured information in the corporate context. SAP has combined specific solutions for data management and data integration under this name. For data integration, these are the tools SAP HANA Smart Data Integration (SDI) and SAP HANA Smart Data Quality (SDQ). SDQ is an extension of SDI that cannot be used on its own. SDI is the universal ETL tool of the SAP HANA platform that can be used to connect a wide range of data sources and load data into the SAP HANA database, after transformation if necessary. Standard connectors include:
– SAP Business Warehouse (SAP BW)
– Database management systems (including IBM DB2, Oracle, Microsoft SQL Server, Teradata)
– SOAP and OData
– Appache Hive
– Microsoft Office
– Files (e.g. Microsoft Excel)
– Facebook and Twitter
SDI is fully integrated into the SAP HANA platform and can be managed through the SAP HANA Cockpit. To use it, all that is required is to install a Data Provisioning Agent on the source system and connect it with the corresponding adapter (Figure 1.1). The SAP Web IDE can then be used to create so-called flow graphs for data replication tasks, for example. SDQ helps in this process with data cleansing. In this way, data formats can be defined in a simple manner, duplicates can be identified and further specific regulations can be made to improve data quality.
SAP HANA Smart Data Access (SDA) is the tool that enables data virtualization and makes persistent replications on the SAP HANA database obsolete. In the context of SDA, SAP therefore also speaks of a data federation technology, since the data remains in the source systems and a distributed data landscape with central access can thus be set up. SDA is essentially based on the ODBC (Open Database Connectivity) database standard, which means that all databases that support this standard can be used for SDA connections. In addition to all common database management systems, these currently include Google BigQuery, Apache Hadoop and Apache Spark. Once a connection has been created in this way, the data can be retrieved in the SAP Web IDE development environment in virtual tables that adopt the metadata of the source tables. This provides the same functionality as persistent tables, with the possibility of creating synonyms, functions and procedures.
1.2 SAP HANA XSA
SAP HANA Extended Application Services Advanced Model – XSA for short – is the integrated application server of the SAP HANA platform that provides both the development and runtime environment for HANA applications. As the name suggests, it is an evolution of the original variant, SAP HANA XS, which was added to the product just two years after the SAP HANA database was released and began the transformation to a platform. The basic idea has not changed since then. On the SAP HANA platform, the XSA application server and the SAP HANA database are located directly next to each other. This not only creates a development environment, but also brings data processing closer to the data in this architecture. This avoids the mass transfer of data between the database and application layers, which could result in speed disadvantages. In this respect, the SAP HANA platform represents a paradigm shift from the prevailing three-layer architecture to a two-layer architecture in which even complex calculations are taken over by the database layer, thereby increasing performance and efficiency. SAP HANA XSA is therefore of crucial importance for the tools described below, since the structural prerequisites of the tool functions are created with the application server. Compared to its predecessor, SAP HANA XS, the key difference is that XSA has seamless integration with SAP Cloud Platform’s Cloud Foundry architecture. This makes it possible to develop on-premise and immediately run the developments in the cloud.
SAP Web IDE is a browser-based development environment with which applications can be created full-stack, i.e. including user interface (UI), application logic and database artifacts. In this end-to-end development process, the entire lifecycle of an application can be accompanied with the SAP Web IDE. In addition to the ability to develop, the environment also supports the debugging, delivery and testing of software. SAP Web IDE runs on-premise on the SAP HANA XSA application server or is part of SAP Cloud Platform. Therefore, no additional installations are required to use it. For building the SAP HANA SQL Data Warehouse, SAP Web IDE is the central tool for developing the database artifacts. The development environment includes a number of graphical editors that can be used to easily prepare models, data logics and data streams, such as the SAP HANA-typical Calculation Views (Figure 1.2), Flowgraphs or Core Data Services. In addition to the graphical preparation, there is also the option of direct coding in the SAP HANA database language SQLScript.
SAP HANA Cockpit is the web-based administration tool for the SAP HANA platform, which can be used to perform classic administration tasks for the SAP HANA platform. Among other things, the entire SAP HANA system landscape as well as individual systems such as Tenant databases or HDI containers can be configured and monitored. System configuration includes the management of resources, services and performance. Another important aspect is the management of security measures. Settings for encryption, system maintenance and backups can be made via the SAP HANA Cockpit. In addition, a sophisticated role and user management is available.
The SAP HANA Database Explorer links elements that are of interest from a developer and administrator perspective. It is therefore integrated into both the SAP HANA Cockpit and the SAP Web IDE. The Explorer contains the database catalog in the form of a tree structure through which the contents of the various schema or HDI containers on the database can be explored and contents can be imported and exported (Figure 1.3). In addition, an SQL Console is available for making data queries and there are options for debugging and checking log files.
SAP PowerDesigner is a tool from the field of computer-aided software engineering (CASE), in which IT-supported tools are used to design software applications and implement them as automatically as possible. SAP Powerdesigner creates solutions for the graphical analysis and modeling of business processes and their conversion into various data models. SAP HANA SQL Data Warehousing relies heavily on corresponding modeling processes and places them at the center of the construction of data warehouse structures. SAP PowerDesigner is therefore an important tool in the context of model-driven SAP HANA SQL data warehousing, although it is not part of the scope of the SAP HANA platform and must be licensed separately. The functional scope of SAP PowerDesigner is basically divided into data architecture, information architecture and enterprise architecture. However, for the SAP HANA SQL DWH, the area of data architecture with various data modeling techniques, such as conceptual, logical and physical data modeling, is particularly relevant (Figure 2.1). Of particular importance is that different models and model types can be fully integrated by the so-called link-and-synch technology. This feature, which is summarized under the keyword data lineage, makes it possible to perform impact assessments of changes across the different models and to make the entire data modeling process transparent. Updates to the database architecture are thus directly reflected in the corresponding upstream data models and vice versa. Accordingly, both a forward-looking development process, leading from conceptual considerations to the physical data model of the database, and a backward-looking process are possible.
SAP PowerDesigner ist ein Werkzeug aus dem Bereich computer-aided-software-engineering (CASE), in dem es darum geht durch den Einsatz IT-gestützter Werkzeuge Softwareanwendungen zu entwerfen und möglichst automatisiert zu implementieren. SAP Powerdesigner schafft hier Lösungen für die graphische Analyse und Modellierung von Geschäftsprozessen und ihre Umsetzung in verschiedene Datenmodelle. Das SAP HANA SQL Data Warehousing setzt stark auf entsprechende Modellierungsprozesse und stellt sie in den Mittelpunkt des Aufbaus der Data-Warehouse-Strukturen. SAP PowerDesigner ist daher ein wichtiges Tool im Rahmen des modellgetriebenen SAP HANA SQL Data Warehousing, obwohl es nicht zum Umfang der SAP-HANA-Plattform gehört und separat lizensiert werden muss. Der Funktionsumfang des SAP PowerDesigners teilt sich grundsätzlich in die Bereiche Daten-, Informations- und Unternehmensarchitektur. Für das SAP HANA SQL DWH ist jedoch vor allem der Bereich der Datenarchitektur mit verschiedenen Datenmodellierungstechniken, wie konzeptionelle, logische und physische Datenmodellierung relevant (Abbildung 2.1). Von besonderer Bedeutung ist dabei, dass sich verschiedene Modelle und Modelltypen durch die sogenannte Link-and-Synch-Technologie voll integrieren lassen. Diese Eigenschaft, die unter dem Stichwort Data-Lineage zusammengefasst wird, macht es möglich, Folgenabschätzungen von Veränderungen über die verschiedenen Modelle hinweg durchzuführen und den gesamten Prozess der Datenmodellierung transparent zu machen. Aktualisierung der Datenbankarchitektur spiegeln sich so unmittelbar in den entsprechend vorgelagerten Datenmodellen wider und andersherum. Dementsprechend ist sowohl ein vorwärts gerichtetes Entwickeln, das von konzeptionellen Erwägungen hin zum physischen Datenmodell der Datenbank führt, als auch ein rückwärts gerichteter Ablauf möglich.
SAP HANA SQL Data Warehousing relies on DevOps in terms of processes and makes the advantages of this philosophy available for the development of the data warehouse. Figure 3.1 shows the typical phases of a DevOps cycle with the connection of the areas Development/Designtime and Operation/Runtime, as well as the higher-level continuity processes. These work in particular toward the highest possible degree of automation of the processes. This is supported by specific DevOps tools, which we present to you in an overview specifically for SAP HANA SQL Data Warehousing.
Git is a distributed version control system that can be regarded today in the field of agile software development as an industry standard for the joint management of source code by development teams. Such a repository is also of central importance for the DevOps philosophy with the processes of Continuous Integration, Continuous Testing and Continuous Delivery. Git technology is offered by various commercial providers, such as GitHub, GitLab or BitBucket. All these variants are very easy to integrate into the SAP Web IDE and can be used to build the SAP HANA SQL Data Warehouse. Pure cloud but also on-premise solutions are possible.
Continuous integration, continuous testing and continuous delivery (CI/CT/CD) are among the core processes of the DevOps philosophy, which merge seamlessly into one another. In the DevOps cosmos, there are a number of tools for these processes that support the development of software, or in our case a data warehouse, on the way to a higher degree of automation. These mostly interact strongly with the already mentioned Git repository. The availability of Git in combination with the universal SQL standard of the SAP HANA platform lead to the fact that there is basically no restriction regarding the selection of specific tools. In our projects, we have had good experience with common tools such as Jenkins or Bamboo when setting up an automated deployment pipeline.
1.3 Issue Tracking
Another category that provides tools for the successful practice of DevOps is issue tracking. Appropriate tools are used to track issues in the software development lifecycle. An issue or problem can be anything of relevance, such as bugs, errors or even specific functions and features that are not yet fully developed. With the issue tracking tools, these problems can be recorded, evaluated and assigned to persons. In some cases, this is even possible automatically in conjunction with the tools in the CI/CT/CD area. In DevOps, this work takes on a high priority under the keyword Con-tinuous Feedback. User feedback in particular must be incorporated into the development lifecycle again and again and processed cleanly. Here, too, there are a number of tools that come into question. And here, too, there is basically no restriction.
In this blog, i have provided you with an overview of the essential SAP HANA SQL Data Warehousing tools. In addition to some native tools of the SAP HANA platform, there is a selection of other third-party tools available. And this is one of the great strengths of SAP HANA. With this platform we can integration non-SAP tools, e.g. open source tools like Git and Jenkins. Especially for the support of DevOps processes, the list of helpers and automation promises is long and initially a liitle bit confusing, especially for users who, due to their previous work in the data warehouse environment, have not yet had much contact with DevOps and methods of agile software development.
And one last thing: All tools and methods work in combination with HANA Cloud, except XSA. On HANA Cloud the integrated application server is Cloud Foundry, which is absolutly compatible with XSA. So using these tools and methods we are Cloud-ready.
I will give soon more info blog posts on the topic of HANA SQL Data Warehousing. I hope, you enjoyed this blog and it was a little bit helpful.