Agile Data Warehousing with SAP BW/4HANA
Agile approaches and organisations have been prominent in the media for years and have long been established in software development.
We encounter the results of agile software development every day when using Amazon or Spotify, but also SAP cloud solutions such as SAP Analytics Cloud. The short time until new features become available shapes what users expect of other software solutions. Agile methods are less widespread in the development of SAP data warehouse solutions. It is often argued that small, usable product increments cannot be developed in the data management area, and that agile development methods are therefore fundamentally out of the question. In this article I would like to explain why methods of agile software development are both useful and possible in the data warehouse. I concentrate on SAP BW/4HANA and highlight which properties are important for an agile design of the data warehouse.
Disadvantages of a classic Data Warehouse development approach
Many data warehouse solutions are developed using methods that are essentially based on classic waterfall procedures – albeit partially loosened up by iterative approaches.
The days in which BI systems were specified exactly in advance in a business blueprint and then developed over several months are over. Chances are that the requirement has changed since the blueprint was created, so users receive a feature they do not need, and the only answer left is “works as designed”. This helps neither IT nor the business departments. In highly regulated industries such as banking, it may be necessary to demonstrate compliance with regulations through extensive concepts, but this should not be necessary for many companies. The danger is that solutions do not meet actual needs and are therefore not used. Qualified feedback and user experience are available only after testing or productive use at the earliest. Often this feedback is the only way to formulate the actual requirement. Classic methods end here.
In the past, waterfall approaches were also favored by technology in which subsequent changes were difficult to implement. Major data model changes were always a problem in the SAP BW environment. The primary goal was to ensure a high level of stability and quality. Structured data from the operational ERP systems was prepared in layers and stored redundantly as a basis for consumption by BI front-end tools. An architectural approach for this that is widespread in the SAP BW world is the LSA (Layered Scalable Architecture) model.
Processing data in layers as part of the LSA concept ensures a high level of stability and quality. On the other hand, persisting the data in several layers creates numerous redundancies. This, together with the rather academic approach of complete top-down modeling that was common in old SAP BW systems, leads to a monolithic architecture. Change requests quickly result in high development costs. The system is therefore neither flexible with regard to data model adjustments, nor does it support incremental development. Also, before HANA, BW was not exactly known for efficiently processing very fine-grained mass data. As a result, it was necessary to plan precisely what level of detail was required and which aggregations had to take place along the data flow. Consequently, with classic SAP BW approaches, development was not possible without extensive planning of the data model and preliminary work (e.g. mass creation of InfoObjects).
The result is systems of high quality and stability. At the same time, however, these systems struggle to react flexibly and quickly to changed requirements. In my experience, difficulties arise with SAP BW systems after many years of operation. The inflexible data modeling leads to long development times as well as increasing maintenance effort and costs. This quickly leads to a lack of acceptance among users and encourages local departmental solutions as a way out of the inflexibility.
Agile values and principles
The “Manifesto for Agile Software Development”, which contributed to the triumph of agile methods, is very well known in software development. The starting point for the further considerations in this article is a visualization of the values and principles of the agile manifesto, which we have slightly adapted in order to relate them to data warehousing and BI.
The main thing is that the focus is on the benefits for users. Iterative, incremental development processes are intended to increase quality and benefits for users. The “customers” of the data warehouse should be provided with quickly usable features / product increments. On the one hand, users should benefit as early as possible, on the other hand, the feedback should flow into the further development of the next iteration at an early stage.
The DevOps approach is widely used in agile software development. In this context, the ability to carry out 10 releases per day is often cited. Of course, nobody in the DWH context wants to carry out 10 releases per day. The point is that the effort of a release, from planning to deployment, should be minimized. A well-known analogy is the dishwasher that needs only 3 minutes for a wash cycle: you will still not run 100 cycles, but the planning and effort for each cycle are reduced. Earlier feedback from users becomes possible and can be taken into account and delivered faster in the next iteration. A cycle emerges. It is easier to avoid undesirable developments. The benefit for users increases because IT solutions are geared more closely to requirements.
By now at the latest, it is clear that we are talking about more than modernizing IT systems. Agility results from the successful combination of an agile mindset, processes and methods, and suitable technical solutions.
The focus of this article, however, is less on the cultural or organizational aspects. We would like to highlight aspects that are important for an “agile design” of SAP BW systems.
Success factors of an agile Data Warehouse
From the values and principles mentioned, it quickly follows that data warehouse systems should no longer be planned and changed in large releases. Instead, the data warehouse changes incrementally, i.e. in an evolutionary way. Feedback from users is obtained more quickly and can be taken into account in the next increment. This reflects the basic ideas of the agile manifesto. The data warehouse can be developed closer to the user.
For a data warehouse to enable this type of development, the design of the data warehouse must meet certain requirements. I see three success factors for the implementation of such an “agile data warehouse” with SAP BW.
Development based on requirements
In the past, SAP BW solutions were often developed more broadly than a given requirement called for. Even if a user needed only three fields from a source, the other 97 fields were integrated into SAP BW as well. This follows the idea “What you have, you have”: data warehouse solutions should, if possible, be developed in such a way that many as yet unknown questions can be answered. For a long time, subsequent data model changes in classic SAP BW systems could only be carried out with great effort, so it seemed better to invest the effort straight away. The disadvantage of this approach is that users wait unnecessarily long and feedback on a development also takes a long time. The whole development process is slowed down. In practice, the 97 extra fields were seldom really needed. Instead, the data warehouse should be developed in accordance with the requirements, i.e. planning and implementation are based on the actual requirement and are not unnecessarily inflated. Users should receive usable features as quickly as possible. Anything beyond the requirement is implemented only if it causes no delay. This avoids wasting time and money.
If you do not plan for weeks, you consciously accept that existing data models will change over time. So that this does not lead to major problems in the operation of the data warehouse, the technical design of the data warehouse must enable simple refactoring. Otherwise a requirement-based DWH development is hindered.
The architecture and governance specifications have to perform a balancing act. There must be central guidelines for development (e.g. naming conventions, layer architecture) so that quality standards are adhered to and not every developer builds the data warehouse according to personal preferences. Otherwise the system cannot be operated sustainably. At the same time, the architecture must allow enough freedom so that the data warehouse can be developed as required and not dogmatically. From today’s point of view, a negative example would be mandatory layer-by-layer preparation as in the LSA architecture approach, which is widespread in classic non-HANA SAP BW systems.
Agile design of an SAP BW/4HANA Data Warehouse
How can SAP BW/4HANA meet the success factors? What is important? We see four things that are important when designing an SAP BW/4HANA-based data warehouse and that form the basis for agile development.
The choice of architecture is of central importance. For SAP BW on HANA or BW/4HANA systems, SAP recommends an architecture based on LSA++. This newer layer approach is characterized in particular by its focus on virtual objects. Virtualization means that fewer copies of data are required between the layers of the architecture, making the architecture simpler, smaller and faster. The data warehouse architecture should be designed to be as lean and simple as possible. The business requirements drive the service level of the data warehouse and the layers that are necessary. The data warehouse is developed as required.
With SAP BW on HANA, it became possible for the first time to create field-based InfoProviders. This bottom-up modeling accelerates development because the semantics do not have to be described before data integration and analysis. The option of partially automated creation of data models and data flows (aDSO / Open ODS view) accelerates development further. Initial analyses are possible after just a few “clicks”. Users can take a first (simple) look at the data very quickly, often after just a few minutes.
Field-based data models can easily be enriched with master data via the CompositeProvider. Thanks to field-based modeling and the virtual association of master data, an SAP BW system can be developed much faster and tailored more closely to actual needs. The bottleneck of a BW project, the mass creation of InfoObjects, becomes optional. In this respect, field-based modeling can be a significant project accelerator.
If fields need to be swapped for InfoObjects at a later date, this is possible through remodeling jobs.
Dimension Satellites & Snowflaking
Navigation attributes of InfoObjects can significantly enhance the analysis of transaction data. In SAP BW systems without SAP HANA, a setting had to be made in DataStore objects or InfoCubes in order to use navigation attributes. This resulted in longer activation times for the objects. Flexibility and ease of adjustment were limited in the “before HANA” era.
With the introduction of the CompositeProvider, semantics and persistence can be decoupled completely (flexible dynamic star schema). Master data, regardless of whether it is represented virtually by Open ODS views or persistently by InfoObjects, can be “associated” purely virtually in the CompositeProvider. Settings on the CompositeProvider are sufficient for the virtual enrichment of transaction data. Flexibility increases further because more than one InfoObject or Open ODS view can be associated with a field in the transaction data. For example, an Open ODS view “Order attributes technically virtual” and an InfoObject “Order attributes technically persistent” can be associated with an “Order” field at the same time. The use of transitive attributes (snowflaking) increases flexibility even more.
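The idea of purely virtual association can be illustrated outside of SAP BW with a minimal Python sketch. It is not SAP code and uses no SAP API: the names `ORDERS`, `ORDER_ATTR_VIRTUAL`, `ORDER_ATTR_PERSISTENT`, `CUSTOMER_ATTR`, `associate` and `order_view` are hypothetical, and plain dictionaries merely stand in for an Open ODS view, an InfoObject and a CompositeProvider. The sketch shows two master data providers associated with the same “order” field and one transitive attribute, all resolved at read time without changing the stored transaction data.

```python
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, object]
Provider = Dict[str, Row]

# Hypothetical field-based transaction data (plain fields, no InfoObjects).
ORDERS: List[Row] = [
    {"order": "4711", "amount": 100.0},
    {"order": "4712", "amount": 250.0},
]

# Two independent master data providers for the same "order" field.
ORDER_ATTR_VIRTUAL: Provider = {      # stands in for an Open ODS view
    "4711": {"customer": "C1"},
    "4712": {"customer": "C2"},
}
ORDER_ATTR_PERSISTENT: Provider = {   # stands in for an InfoObject
    "4711": {"order_type": "standard"},
    "4712": {"order_type": "rush"},
}

# Attributes of an attribute (transitive, "snowflaking"): customer -> country.
CUSTOMER_ATTR: Provider = {
    "C1": {"country": "DE"},
    "C2": {"country": "FR"},
}


def associate(rows: Iterable[Row], field: str, provider: Provider) -> Iterator[Row]:
    """Lazily enrich each row with the attributes found for row[field]."""
    for row in rows:
        attrs = provider.get(str(row.get(field)), {})
        yield {**row, **attrs}  # new dict; the source row is not modified


def order_view() -> List[Row]:
    """Stands in for the CompositeProvider: associations only, no stored copy."""
    rows: Iterable[Row] = ORDERS
    rows = associate(rows, "order", ORDER_ATTR_VIRTUAL)
    rows = associate(rows, "order", ORDER_ATTR_PERSISTENT)
    rows = associate(rows, "customer", CUSTOMER_ATTR)  # transitive attribute
    return list(rows)


if __name__ == "__main__":
    for row in order_view():
        print(row)
```

Adding or removing an association in this sketch only changes `order_view`; the transaction rows and the other providers stay untouched, which is the property the flexible dynamic star schema is meant to provide.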
The flexible dynamic star schema is initially only a technical characteristic of SAP BW. Combined with suitable design decisions, it increases the flexibility of the BW system considerably. Two aspects should be emphasized: field-based modeling (already explained) and the modeling of dimension satellites.
These new possibilities change data modeling in SAP BW. In classic systems, master data had to be consolidated into one InfoObject, even if it originated from different tables and systems, unless you wanted to extend the transaction data. This led to large, inflexible InfoObjects. It is now possible to decouple master data and model it as modular dimension satellites. A satellite can, for example, represent one source table in a source system. This results in smaller, physically independent, more flexible units that can be changed independently of one another and independently of persisted transaction data. The model can also be expanded iteratively with additional satellites, each of which can be modeled persistently (InfoObject) or virtually (Open ODS view) on a case-by-case basis.
Sometimes it is useful to map central business entities in one object instead of many individual InfoObjects or Open ODS views. In these cases, the satellites can be joined virtually with the help of a calculation view and provided in SAP BW via an Open ODS view. Because the business entity is mapped purely virtually, you do not lose any flexibility, and you can also map master data logic if necessary.
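As a rough illustration of modular satellites that are joined into one business entity only at read time, consider the following Python sketch. It is an analogy, not SAP code: the satellites `general` and `sales`, the list `customer_satellites` and the function `customer_entity` are hypothetical names, and the join is a simple full outer join on the business key.

```python
from typing import Dict, List

Attributes = Dict[str, object]
Satellite = Dict[str, Attributes]  # business key -> attributes

# Each satellite mirrors one hypothetical source table, keyed by customer ID.
general: Satellite = {"C1": {"name": "Alpha GmbH"}, "C2": {"name": "Beta SA"}}
sales: Satellite = {"C1": {"sales_org": "1000"}}

# The satellites that currently make up the "customer" business entity.
customer_satellites: List[Satellite] = [general, sales]


def customer_entity(satellites: List[Satellite]) -> Dict[str, Attributes]:
    """Join all satellites on their key at read time (full outer join)."""
    entity: Dict[str, Attributes] = {}
    for satellite in satellites:
        for key, attrs in satellite.items():
            entity.setdefault(key, {}).update(attrs)
    return entity


if __name__ == "__main__":
    print(customer_entity(customer_satellites))

    # A later iteration adds a new satellite without touching existing ones.
    credit: Satellite = {"C2": {"credit_limit": 5000}}
    customer_satellites.append(credit)
    print(customer_entity(customer_satellites))
```

Because the entity is computed on each read, extending the model is a matter of registering one more satellite; nothing has to be reloaded or remodeled in the existing ones.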
Satellites for transaction data and virtualization of data flows
What applies to master data (dimension satellites) should also be considered for transaction data. If business logic and joins are persisted in new aDSOs, you always lose flexibility and speed of development. Data models are easier to refactor if business logic and joins are either …
- mapped virtually via HANA calculation views or CompositeProviders, or
- encapsulated in data marts where possible. Only the additional information resulting from calculations is stored, not the data from all participating InfoProviders again. These can be connected via a CompositeProvider, for example.
The duplication of data is to be avoided. The principle “Virtualization as much as possible, persistence only where really needed” applies. Changes can be made quickly and flexibly thanks to this design. In the case of full virtualization, users immediately see the effects of data model adjustments. Refactoring is also easily possible because there are no or hardly any restrictions due to persistent data tables.
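The difference between a persisted and a virtual data flow can be sketched in a few lines of Python. This is an illustrative analogy only, not SAP code: `SALES`, `add_gross`, `PersistedFlow` and `VirtualFlow` are hypothetical names, and the tax calculation merely stands in for any business logic along the data flow.

```python
from typing import Dict, List

Row = Dict[str, float]

# Hypothetical source data of the data flow.
SALES: List[Row] = [{"net": 100.0}, {"net": 200.0}]


def add_gross(rows: List[Row], tax_rate: float) -> List[Row]:
    """Business logic: derive a gross amount from the net amount."""
    return [{**row, "gross": round(row["net"] * (1 + tax_rate), 2)} for row in rows]


class PersistedFlow:
    """Stores a copy of the derived data; it stays as loaded until reloaded."""

    def __init__(self) -> None:
        self.rows: List[Row] = []

    def load(self, source: List[Row], tax_rate: float) -> None:
        self.rows = add_gross(source, tax_rate)


class VirtualFlow:
    """Stores nothing; the logic is applied every time the data is read."""

    def __init__(self, source: List[Row], tax_rate: float) -> None:
        self.source = source
        self.tax_rate = tax_rate

    def read(self) -> List[Row]:
        return add_gross(self.source, self.tax_rate)


if __name__ == "__main__":
    persisted = PersistedFlow()
    persisted.load(SALES, tax_rate=0.19)
    virtual = VirtualFlow(SALES, tax_rate=0.19)

    # The business logic changes: a different tax rate applies.
    virtual.tax_rate = 0.16
    print(virtual.read())    # reflects the change on the next read
    print(persisted.rows)    # still holds the old result until reloaded
```

The persisted variant trades this immediacy for read performance and a stable history, which is the trade-off to weigh in each individual case.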
In practice there will of course be cases in which this principle cannot be applied. Obvious examples are performance reasons or the (broad) historization of data. In the end, it is a trade-off that has to be made in each individual case. However, we advocate giving “right of way” to the more flexible modeling approach.
Long development cycles, inflexible data models, very high operating costs and a lack of user orientation are widespread criticisms of SAP BW systems. From a technical point of view, SAP BW/4HANA offers many more options for developing a user-oriented, needs-based data warehouse than previous (non-HANA) SAP BW systems. This requires a consistent “agile design” of SAP BW/4HANA systems. To show the potential of agile modeling with SAP BW/4HANA, one of the coming articles will use an example to show how an SAP BW/4HANA system can be modeled so that it adapts quickly and easily to changes. Beyond the design of SAP BW, one must be aware that the development process and the organizational culture must also allow agility. We see the opportunity to shorten development cycles significantly and to improve the acceptance of SAP BW systems.
This article is also accessible in German: Link