In my previous article I mentioned that HANA was a major enabler of Pervasive BI, as it allows for real-time data to be brought into the Data Warehouse.
In this article I would like to explore some technical options on how this can be done in practice, i.e. how real-time data can be brought into the Enterprise Data Warehouse to be analysed together with historical data.
BW powered by HANA – LSA++
Before jumping straight into the architecture of real-time data into the Enterprise Data Warehouse, I will first briefly introduce the architecture of BW powered by HANA: LSA++.
Most traditional Enterprise Data Warehouses are built using a Layered Scalable Architecture (LSA). This architecture makes the EDW a more robust, scalable and flexible system that can cope with changes in business, technology and requirements. LSA is built with several layers, and data is physically copied from one layer to the next. It is not uncommon for a traditional data warehouse to hold three copies of the same data, one in each layer: Acquisition, Transformation and Data Mart.
When we implement BW on HANA, we can create a different flavour of Layered Scalable Architecture, one that is simpler, smaller and faster, and that relies more on virtual objects and less on copying data between layers. It is called LSA++.
Of course, an Enterprise Data Warehouse built on BW on HANA could be done the traditional way, but then it would not take advantage of many of the benefits that HANA’s in-memory capabilities bring.
In LSA++ the first layer is called the “Open Operational Data Store Layer”, or Open ODS Layer. It is equivalent to the original LSA’s Acquisition Layer. Data is stored at field level (raw data), exactly as it is in the source system. Data can be stored in field-based DSOs (modelled in BW) or in HANA tables, accessible to BW through HANA views. Data can be loaded using scheduled extraction (the same way as in BW without HANA) or via real-time data replication, using SLT (SAP Landscape Transformation). Both DSOs and HANA tables can be queried directly.
The next layer is the “Core Data Warehouse Layer”, or Core DW Layer. Data is still at line item level (not aggregated). At this level data can be transformed, cleansed or consolidated. Data is stored in Core DW DSOs that can be used by data marts in the next layer. This layer can also be queried directly.
The final layer is the “Virtual Data Mart Layer”. Thanks to HANA, the structures in this layer are virtual and can combine data from all the other layers. These structures are used as query targets for reporting. InfoCubes, which store data physically, become obsolete. Virtual providers offer more flexibility and agility.
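To make the difference between a physically stored data mart and a virtual one concrete, here is a minimal Python sketch. It is an analogy only, not BW or HANA code: the lists `open_ods` and `core_dw`, the two functions and the sample rows are hypothetical names and values chosen for illustration.

```python
from typing import Any, Dict, Iterator, List

Row = Dict[str, Any]

# Hypothetical in-memory stand-ins for two persisted layers.
open_ods: List[Row] = [{"order": "A1", "amount": 100.0}]
core_dw: List[Row] = [{"order": "Z9", "amount": 250.0}]


def physical_data_mart() -> List[Row]:
    """Copy rows at load time, as a physically stored data mart would."""
    return list(core_dw) + list(open_ods)


def virtual_data_mart() -> Iterator[Row]:
    """Read the underlying layers at query time; nothing is copied."""
    yield from core_dw
    yield from open_ods


# The physical mart is loaded once (for example overnight).
snapshot = physical_data_mart()

# A new posting then arrives in the Open ODS layer.
open_ods.append({"order": "A2", "amount": 75.0})

# The snapshot misses the new row; the virtual structure sees it.
physical_total = sum(row["amount"] for row in snapshot)
virtual_total = sum(row["amount"] for row in virtual_data_mart())
```

The physical copy only reflects the new posting after the next load, while the virtual structure reflects it on the next query. That is the flexibility the Virtual Data Mart Layer is meant to provide.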
Real-time data in the Enterprise Data Warehouse
Now that we have some understanding of the architecture of BW powered by HANA, we will add real-time scenarios to it.
It all depends on how data is stored in the source system. If the source system uses a traditional database (not HANA), we can use SLT to replicate data in real time into HANA tables in BW powered by HANA.
This replication runs in real time or near real time. Effectively, data makes its way into the EDW seconds or minutes after the event, not hours later as in traditional data warehouses.
If the source system uses HANA as its database, data does not need to be replicated at all, so replication latency disappears. Data can be analysed directly in BW using Virtual InfoProviders (which read data directly from HANA) or through HANA views built directly on the source system. Data is made available in real time without having to go through extraction and transformation within the EDW. This removes the need for replication, along with the latency and additional storage in BW that come with it.
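The decision described above can be summarised in a few lines of Python. This is only a sketch of the decision rule; `AccessPath` and `choose_access_path` are hypothetical names and do not correspond to any BW, HANA or SLT API.

```python
from enum import Enum


class AccessPath(Enum):
    """The two real-time options discussed in this article."""
    SLT_REPLICATION = "replicate into HANA tables in BW via SLT"
    VIRTUAL_ACCESS = "read in place through virtual providers or HANA views"


def choose_access_path(source_runs_on_hana: bool) -> AccessPath:
    """Pick the real-time option based on the source system's database."""
    if source_runs_on_hana:
        # Data already sits in HANA, so it can be read where it is.
        return AccessPath.VIRTUAL_ACCESS
    # Traditional database: replicate changes into HANA tables.
    return AccessPath.SLT_REPLICATION


for on_hana in (False, True):
    print(on_hana, "->", choose_access_path(on_hana).value)
```

In a real landscape the choice would also depend on factors this article does not cover, such as system load and data volumes, so treat the rule as a starting point.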
Benefits of bringing real-time data into the EDW
The adoption of LSA++ changes the way we do things:
- We now have the ability to analyse real-time data without the overhead of building through the different semantic layers of the EDW
- Data can be replicated in real-time without having to go through the extraction and transformation of data within the EDW
- Big Data (large data volumes) analysis can be done in real-time using both EDW (historical) data and the most recent transactional postings
By combining real-time data and historical data into one warehouse, the classic Pervasive BI applications become possible. This can bring phenomenal business benefits. Some examples are:
- Retail: Analysis of Point of Sale information. With information from various sources, hundreds of thousands of SKUs, several Distribution Centres, and access to real-time data and historical trends, analytics applications can identify items at risk of stockout and propose corrective actions before the stockouts actually occur. This can be done at any point of the supply chain. Additionally, the impact of promotions can be measured on the spot, as sales occur
- Airlines: Operations staff at the hubs can monitor on-time performance throughout the day and make operational decisions about catering, personnel and gate traffic flow. Pricing specialists can track in real time the impact of price changes on reservations and make adjustments that optimise revenues
- As a client places an order, the level of discount and payment conditions are adjusted in real time, based on the analysis of payment history, previous orders (even those entered just seconds ago) and open invoices. This is a competitive advantage.
- Electricity retailers can calculate buy-back price based on history and up to the minute consumption information – even using very large volumes of data.
Of course, bringing real-time data into the Enterprise Data Warehouse creates its own challenges. What is real-time data today will become history tomorrow, and could get loaded again in the overnight load (assuming one exists). A mechanism to separate real-time data from historical data must be put in place. Otherwise, there is a risk of duplicated data.
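One common way to build such a mechanism is a cut-off timestamp (sometimes called a watermark): history is read up to the last successful overnight load, and real-time data is read only after it. The Python sketch below illustrates the idea under that assumption. It is not BW functionality, and `Posting`, `combine` and the sample dates and amounts are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class Posting:
    doc_id: str
    posted_at: datetime
    amount: float


def combine(historical: List[Posting],
            realtime: List[Posting],
            watermark: datetime) -> List[Posting]:
    """Take history up to the watermark and real-time rows after it."""
    from_history = [p for p in historical if p.posted_at <= watermark]
    from_realtime = [p for p in realtime if p.posted_at > watermark]
    return from_history + from_realtime


# Assumed cut-off: the point up to which the overnight load has run.
watermark = datetime(2024, 1, 2, 0, 0)

yesterday = Posting("D1", datetime(2024, 1, 1, 15, 30), 100.0)
today = Posting("D2", datetime(2024, 1, 2, 9, 0), 40.0)

historical = [yesterday]        # already loaded overnight
realtime = [yesterday, today]   # replicated table still holds yesterday's row

combined = combine(historical, realtime, watermark)
total = sum(p.amount for p in combined)
```

Without the watermark filter, yesterday's posting would be counted twice, once from each source. With it, each posting is read from exactly one side of the cut-off.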
The ability to bring real-time data into the Enterprise Data Warehouse has never been so real. The adoption of LSA++, SLT and Virtual Providers makes it possible to combine history and real-time data in the same analytics applications. This technology is available today.
This creates possibilities for applications that were not possible before, for solutions that truly support decision making based on the latest, most accurate information.
This is true disruption.