The In-Memory Journey with Hasso Plattner – From Prophecy to Reality
Have you ever wondered what the world of enterprise systems looked like in the late 1960s? What has changed over the last two decades? What was once the prophecy of a renegade of our times started a renaissance in the world of enterprise computing. Let me take you down memory lane and through the in-memory computing journey, from prophecy to reality, with SAP co-founder and chairman Hasso Plattner.
“I have always lived in the world of these enterprise systems from day one on…” says Hasso, addressing his PhD students in an auditorium of the Hasso Plattner Institute, nestled in the quaint town of Potsdam.
“200MB was a big drum like disk…” – Reminiscing the 1960s and 70s
“I joined IBM in 1969 and IBM was a master of enterprise systems in the world then. Because the transactional volume in a company those days was so large, the first objective was how can we condense data to be able to work with faster response times.” Recalling 1969 as the beginning of online data processing via screens and typewriters, he went on to explain, “The number 1 technique was to aggregate data by setting some criteria – in accounting we aggregate data by account, in sales by customer, in purchasing by supplier, by goods. The whole purpose was to aggregate data. Once we have the data aggregated, the management information systems (MIS) then could deliver relatively quickly, answers to what was total value of stock, what was our open order book on supply side, what orders have we committed to, what was our P&L statement and what does it look like. In reasonable response time, we got an answer. This technique became possible because in 1968-69 timeframe IBM brought to market large disks for 70MB and then 200MB. 200MB was a big drum like disk so we could store reasonable amount of data on disk with direct access. So that’s the world we lived in from 1968 till end of 1970s. Those systems pretty much looked like systems we have today but less complicated. The basic idea was to get the transactions in, run some transactional processing, but the main purpose was to aggregate data for consumption later on by the MIS.”
The very essence of the enterprise systems of that era was carried on with the birth of SAP, with R/2 in 1979 and R/3 in 1992. “R/3 took the world by storm” and led the transition from the mainframe concept to client-server. Its hallmark was the 3-tier client-server architecture, comprising an intelligent front end, an application server and a separate database server, with the system load distributed quite evenly, which provided a much higher-quality and much faster application. The governing principle, however, remained the same. “Transactional data in, run some transactional processing up to 20 stages managed by workflow and then aggregate data. Yes we continued to aggregate data and with the same criteria developed in late 1960s. If these aggregates were not good enough, we took the data to second storage called business data warehouse (BW) where customers could further aggregate data by different dimensions defined in reports which ran in batch mode. With the last move we removed the paper from enterprise systems.”
“Because I am a renegade and I wanted to do something different…” – The Beginning of SAP HANA
Towards the end of 2006, more than 30 years after SAP was founded, Hasso started speculating about how he would redo an enterprise system if he were to start from scratch, and spent an entire day with his students explaining how parts of an accounting system worked at the time and what needed a redo. Hasso chuckles, “because I am a renegade and I wanted to do something different, I made a proposal to completely get rid of all aggregates, 40 years of aggregates, which was too radical a proposal then. I then made a proposal for all systems to take the time away from the line items and leave the day and aggregate. For every single line item, take the time away.” Hasso was convinced that eliminating the time of day would not lose any information from the transaction, since in a normal enterprise system it should not matter whether something happened at 9:23 in the morning or 3:30 in the afternoon. The challenge he set for his students was to build a database so fast that it could build any aggregate the user community wanted on the fly, on transactional data, within a reasonable response time. The exercise for the students was to look at z processing: for a given customer, looking down from the customer, get all orders, shipments, invoices and payments in a very short response time, running sequentially through a large database using secondary indices (as had already been done for 40 years). What started as a research project marked the beginnings of SAP HANA. The dominating ethos was “Change the system fundamentally and yet use the traditional database technique.”
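The difference between keeping stored aggregates and building them on the fly can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not SAP HANA code: the LineItem record, the sample postings and both function names are invented for the example, and a real system would scan compressed columns rather than a Python list.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class LineItem:
    # Hypothetical line item: it carries the posting day only, no time of day.
    account: str
    customer: str
    day: date
    amount: float


def post_with_aggregate(totals, item):
    """Classic design: every posting also updates a stored total per account."""
    totals[item.account] += item.amount


def aggregate_on_the_fly(items, key):
    """Aggregate-free design: scan the line items and group by any criterion."""
    result = defaultdict(float)
    for item in items:
        result[key(item)] += item.amount
    return dict(result)


items = [
    LineItem("4000", "ACME", date(2006, 11, 3), 120.0),
    LineItem("4000", "Globex", date(2006, 11, 3), 80.0),
    LineItem("5000", "ACME", date(2006, 11, 4), 50.0),
]

# Stored totals answer only the one question they were built for.
totals_by_account = defaultdict(float)
for item in items:
    post_with_aggregate(totals_by_account, item)

# The scan answers the same question and any other grouping on request.
by_account = aggregate_on_the_fly(items, key=lambda i: i.account)
by_customer = aggregate_on_the_fly(items, key=lambda i: i.customer)
by_day = aggregate_on_the_fly(items, key=lambda i: i.day)
```

The stored total has to be updated on every posting, while the on-the-fly version only ever reads line items. The price is a scan per question, which is why the database has to be fast enough for the approach to work.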
“When you restart thinking, the interesting thing is you might come to a different conclusion than you are used to…”
Hasso went on to reflect on his realizations from that day in 2006 with his students: “If we build a system without pre-aggregates, do we really have updates in the system? Is the system really write oriented? Do we really need 2 different systems for enterprise computing – write oriented for transaction processing and read oriented for analytical processing?” Hasso was quick to reach a conclusion that same day: “It is not true that in an enterprise system, the behavior of transactional systems and behavior of analytical system are so fundamentally different. There is no reason for having 2 different systems. It is a myth that OLTP systems need write intensive and OLAP systems need read oriented.” Further revelations dawned. “If we remove aggregates, there should not be much update. If we remove aggregates, we can remove something else – can we avoid concurrency? And if we have no updates, there is no contention. If there is no contention, we don’t have to introduce any measures to avoid contention. We can run any transaction massively parallel – as parallel as a computer can run.”
Hasso had come a long way down memory lane: from the last running R/2 system, which was shut down last year in the US and ran single-threaded updates, to the systems of 2006, which lacked the computing power to take in all the data, especially between 9 and 11 a.m., and responded slowly because transactions were locking each other, to imagining a database that runs as parallel as a computer can and lives in memory, using a column store rather than a row store.
Fast forward to the current decade. “Memory is the new disk…”
With SAP HANA, SAP changed the enterprise computing landscape. “Memory is the new disk.” The database is in memory. Tables are organized in columns. If a table has 300 attributes, it has 300 columns. All attributes are stored in columns. Columns are scanned at high speed. Fully indexed. No more updates. Inserts are done massively parallel. Program complexity is reduced. No more aggregates. No redundancies. No separate indices. We use aggregate vectors as indices. And through compression alone, systems become smaller. 5x compression means a 5x more efficient system. Hasso recalls that Rudolf Munz, the father of MaxDB and once an antagonist of Hasso’s research project, eventually described Hasso’s object of pursuit as “relational column store fully in memory is a fully indexed relational database.”
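To make the idea of a compressed, scannable column concrete, here is a small Python sketch of dictionary encoding. It is an assumption-laden illustration rather than a description of SAP HANA internals: the function names and sample data are invented, and a real engine would pack the integer codes far more tightly than a Python list does.

```python
def dictionary_encode(values):
    """Return (dictionary, codes): sorted distinct values plus one int code per row."""
    dictionary = sorted(set(values))
    position = {value: code for code, value in enumerate(dictionary)}
    codes = [position[value] for value in values]
    return dictionary, codes


def scan_equals(dictionary, codes, wanted):
    """Return the row positions whose value equals `wanted`, comparing int codes only."""
    try:
        code = dictionary.index(wanted)
    except ValueError:
        return []  # the value never occurs in this column
    return [row for row, c in enumerate(codes) if c == code]


# A table stored column-wise: one list per attribute (illustrative data).
columns = {
    "customer": ["ACME", "Globex", "ACME", "Initech", "ACME"],
    "amount": [120, 80, 50, 30, 70],
}

dictionary, codes = dictionary_encode(columns["customer"])
rows = scan_equals(dictionary, codes, "ACME")
total = sum(columns["amount"][row] for row in rows)
```

Each distinct customer name is stored once and every row holds only a small integer, which is where the compression comes from. Because any column can be filtered this way, the scan itself plays the role that a separate index would play in a row store.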
“How do we multiply the tininess…”
Today, many customers are using the innovations in SAP HANA to reduce hardware costs and to gain significant improvements in response times. The systems are tiny now, but the pursuit for Hasso continues: “How do we multiply the tininess? How do we make it faster with zero response time?” He quickly postulates “horizontally, dynamically, automatically partition tables”, which is now the fundamental concept behind the data aging mechanism in SAP HANA that Hasso tirelessly advocates in both academia and the real world.
With SAP S/4HANA, through dictionary compression and data management techniques such as data aging, customers have repeatedly been able to reduce their data footprint by 4x to 20x. Data aging is a mechanism for partitioning data into current data (needed to run your business and kept in memory) and historical data (not frequently accessed and able to remain on disk). SAP offers data aging both for technical objects (IDocs, change documents, workflow, application logs) and for application objects (sales orders, deliveries, billing, purchasing, etc.). Splitting data into current and historical not only reduces the footprint of data in memory, it also makes access to current data extremely fast. The smaller the footprint of current data becomes, the faster the scan and filter operations in SAP HANA will run.
We encourage our customers to take advantage of the innovations around in-memory data management techniques in SAP HANA. If you are already on your SAP S/4HANA journey, we want to be part of it. We want to help you age your data and realize the many benefits of an in-memory system. If you are planning to embark on your SAP S/4HANA journey, the recently launched SAP HANA Hardware Sizing Service lets you right-size your SAP HANA environment immediately.
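The split into current and historical data can be sketched as a simple horizontal partition. The Python below is a minimal illustration only: the date cutoff, the field names and the sample documents are assumptions made for the example, and the article does not say which rule SAP uses to decide when a record becomes historical.

```python
from datetime import date


def partition_by_age(rows, cutoff):
    """Split rows into (current, historical) by comparing the posting date to a cutoff."""
    current = [r for r in rows if r["posted"] >= cutoff]
    historical = [r for r in rows if r["posted"] < cutoff]
    return current, historical


def scan_total(partition):
    """Scan one partition and sum the amounts; the work grows with the partition size."""
    return sum(r["amount"] for r in partition)


# Hypothetical documents; only the posting date decides the partition here.
rows = [
    {"doc": 1, "posted": date(2012, 5, 1), "amount": 100},
    {"doc": 2, "posted": date(2019, 3, 9), "amount": 40},
    {"doc": 3, "posted": date(2020, 1, 15), "amount": 60},
    {"doc": 4, "posted": date(2011, 8, 20), "amount": 10},
]

current, historical = partition_by_age(rows, cutoff=date(2018, 1, 1))

# Day-to-day queries scan only the current partition.
current_total = scan_total(current)
```

In this sketch a routine query touches two rows instead of four. The same effect at scale is what makes scan and filter operations faster as the current partition shrinks.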
The Pursuit is the Journey…
With Hasso and with SAP HANA, the pursuit is the journey. And we are just getting started. Hasso is quick to remind us of the golden rules:
“Never use select all.”
“Always specify exactly which fields you want to access.”
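Why these rules matter in a column store can be shown with a short Python sketch. It rests on assumptions: the select helper, the table and its field names are invented for illustration, and the set of touched columns is a simplified stand-in for the real cost of reading and reassembling data.

```python
def select(columns, fields, predicate_field, predicate):
    """Return matching rows with only the requested fields, plus the columns that were read."""
    matching = [i for i, v in enumerate(columns[predicate_field]) if predicate(v)]
    touched = {predicate_field, *fields}
    result = [{f: columns[f][i] for f in fields} for i in matching]
    return result, touched


# Hypothetical table stored column-wise: one list per attribute.
columns = {
    "doc_id": [1, 2, 3],
    "customer": ["ACME", "Globex", "ACME"],
    "amount": [120, 80, 50],
    "currency": ["EUR", "USD", "EUR"],
    "status": ["open", "paid", "open"],
    "note": ["", "rush", ""],
}

# Naming the fields reads three columns: the filter column plus the two requested.
result, touched = select(columns, ["customer", "amount"], "status", lambda s: s == "open")

# Asking for every field reads all six columns to rebuild full rows.
all_rows, touched_all = select(columns, list(columns), "status", lambda s: s == "open")
```

With six columns the difference is small. On a table with 300 attributes, as in the example earlier in the article, selecting everything would mean reading 300 columns to answer a question that needs two or three.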