What is a data warehouse and why do I need one?
While there is a wealth of material available about what data warehousing is, and how it should be done, I will attempt to give a very brief summary here in order to set the stage for explaining SAP Business Warehouse (BW and BW/4HANA).
A data warehouse is a source for integrated, reliable data for organizational reporting and analysis. Technically it consists of one or more databases and the tools and technologies to design, build, operate, maintain, and enhance that data source. At SAP we say a data warehouse is the DBMS + X where X consists of all the Enterprise Data Warehouse (EDW) services required for this full lifecycle management of the data source.
There are detailed lists and definitions of each of these services available widely, but I will briefly summarize the categories of services most critical to your EDW:
- Data Modeling: Designing and harmonizing logical and physical data models that effectively represent the domain for analysis and reporting.
- Data Integration: Acquisition, transformation, and harmonization of data from all required sources
- Process Orchestration and Scheduling: Scheduling and running all data load, preparation, enrichment and data quality processes.
- Process monitoring and correction: Monitor, pause, stop, restart, and rerun DW processes as required
- Data and Model Lifecycle Management: Both the data itself, and the models used to make them accessible must be maintained in the appropriate way so that nothing is lost, and the system runs efficiently.
- Data Governance and Data Quality: Data access, completeness, accuracy, freshness, and other aspects of the overall quality must be ensured and maintained.
Lately some have questioned the need for the data warehouse since platforms like SAP HANA provide the ability to do both transactional and analytical workloads simultaneously, but there are still a number of valid and important reasons why they are a best practice. Here are a few of the reasons data warehouse systems are still relevant and required by most organizations:
- Many/Most Transactional systems run heavy workloads and are considered mission critical. While these workloads can vary significantly, they are largely predictable and the systems are typically sized for peak workloads to ensure consistent performance. Data Warehouses tend to have more lumpy workloads with peaks at times when data is being loaded and queried. While the loading processes are generally controllable and predictable, the amount, complexity, and duration of query workloads is much less manageable. The result is that even with these new capabilities in the platform, more conservative organizations will want to keep the majority of the analytical workload and the transactional workload on separate systems.
- Maintenance and support for transactional systems is complicated and sometimes invalidated by modifications to the underlying data model. Including data from other sources in the transactional system almost always requires this, and is not necessarily a risk most organizations can afford to take.
- An EDW provides the ability to look at what the data looked like at a given point in time in the past (sometimes referred to as time travel scenarios). This is not possible with most transactional systems, or at least not easy to create.
- In addition to the need to harmonize the data models from multiple sources analytical use cases also require in many cases that the data itself be harmonized. While one system may use a set of codes for the state and country, another may spell them all out. Making changes to the data in transactional systems other than through the application can have additional risks with respect to consistent operations and governance, and is generally not advised.
- Best practices in data lifecycle management for transactional systems still recommend archiving and deleting data as it ages. These best practices are based on both technical reasons like maintaining the required level of performance, and business and legal reasons like removing data that can be a contingent liability if legal actions are initiated. Analytical requirements often dictate that extensive history be maintained in order to facilitate identification of trends and development of effective predictive and forecasting processes. In many cases the historic data does not originate in the same system currently in use, and therefore the data and data models do not necessarily coincide. Converting and loading large volumes of historic data is typically not recommended by providers of transactional systems, and has governance, risk, and compliance issues in addition to the contingent liability issue already mentioned.
What is SAP Business Warehouse and why do I care?
SAP Business Warehouse (BW) is a model driven application that provides all of the EDW Services mentioned above as an integrated solution that leverages your existing DBMS (SAP Business Warehouse runs on multiple DBMS platform including SAP HANA, and BW/4HANA is optimized to run only on the SAP HANA Platform), and generates physical data models designed to maximize the performance of the underlying DBMS. Conceptual and/or logical models of your information are used to create the physical models specific to your implementation.
In addition to the model driven aspects described above, BW provides process design and orchestration and monitoring capabilities, and error handling processes to efficiently run your EDW. The application and data lifecycle management capabilities allow you to create and operate multi-tiered landscapes that include development, integration, test, QA, and production environments, and manage the movement of objects between these tiers.
BW is unique in the marketplace in this approach to Data Warehousing. When asked “who is BWs main competition”, the answer is that there is no competition for this approach. However, there s of course competition. That competition focuses on a different approach alternatively referred to as the tools approach, the best-of-breed approach, the SQL Centric approach, or my favorite, the “roll your own” approach. This brings us to the why you care part. While this competing approach to data warehousing provides some advantages in terms of flexibility, we believe the BW approach provides lower Total Cost of Ownership, lower development time and expense, greater governance and reliability, and ultimately, greater agility as defined by the ability to respond quickly and efficiently to changes in the analytical data requirements. The pre-integrated approach also tends to provide greater reliability and resilience.
What does a Data Warehouse have to do with Planning?
Organizational planning processes cover many related, and unrelated domains. Organizations have to plan for capital investments, for Human Capital, for Sales, and many other areas, many of which have little in common. One thing that most or all of these process do have in common though, is the need for accurate information about the current and past state of these domains. This requires access to complete and reliable historic information, and often information about forecast and actual metrics. While this can be compiled directly from the operational systems in some cases, doing so is costly and time consuming. If an organization has an effective EDW system, much of this work has already been done, providing planning processes with the proverbial “Single Source of Truth”.
While many planning applications and solutions require that data be extracted from other systems and loaded into the planning solution, SAP has long believed that it can be more efficient to move the planning application to the EDW instead. Future Blogs in this series will explore the evolution of SAP’s EDW based and stand alone planning systems, but in either case the EDW is a key contributor to the planning processes.