HANA is a solution platform that has enabled organisations to run complex business transactional and analytical operations up to 10,000 times faster than they did before. It is worth looking at a few of the fundamental approaches HANA uses to achieve this.
The concept of the 5 Dimensions of Performance, presented by Dr Vishal Sikka (member of the Executive Board of SAP AG and the Global Managing Board, who led the development and delivery of SAP’s breakthrough, SAP HANA), is a useful benchmark for measuring the real performance of HANA.
According to Dr Sikka, to perform optimally, a data processing system should be capable of maximising each of the following 5 dimensions:
- Going Deep – allowing unrestricted query complexity
- Going Broad – allowing unrestricted data volume and variety
- In real-time – using most recent data for the analysis
- Within a given window of opportunity – quick response time
- Without pre-processing of data – involving no cost of data preparation
Each dimension corresponds to a factor that traditionally limits performance:
- Data size – the larger the data, the slower the system gets
- Query complexity – the more complex the query, the longer it takes to return an answer
- Rate of change of data – how quickly the system absorbs information when data is changed
- Data preparedness – whether data must be pre-processed before it can be used
- Response time – how quickly the system responds to answer a query
Research on human attention suggests that the brain can carry out tasks (depending on complexity):
– interactively, with a continuous flow of thought, when a response arrives in under 1 second (about 800 milliseconds)
– efficiently when a response arrives in under 3 seconds
– with waning attention when kept waiting for more than 8 seconds
On the SAP HANA platform, an enterprise system is capable of answering any query in under 3 seconds. Remarkably, the more adverse the five conditions above become – large amounts of data, complex queries, non-aggregated data, real data that is constantly changing – the better HANA performs relative to traditional systems. This happens by virtue of the following innovative technological aspects of HANA:
- Multicore parallelism – HANA derives its power from running massively parallel. A modern server has up to 80 CPU cores, 2 terabytes of DRAM and 5+ terabytes of SSD, a combination with formidable computing power, and HANA optimises the utilisation of all of it. The modern principle is: the busier you keep the CPUs, the faster the work gets done. Each core runs at roughly 3 gigahertz of clock speed, so the total availability is 80 × 3 = 240 gigahertz of clock speed, alongside 2 terabytes of data held in DRAM. According to Dr Vishal Sikka, HANA is designed with “intra-operator parallelism”, which means all the major operators that scan, calculate and join information can operate in parallel, increasing the speed of the operation altogether.
- It can complete 3.5 billion scans/sec/core
- 12.5 to 15 million aggregations/sec/core
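The idea of intra-operator parallelism can be sketched as follows: a single scan operator splits its column into chunks and scans them concurrently. This is a minimal illustration only (the chunking scheme and worker count are my own, not HANA internals):

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_scan(column, predicate, workers=4):
    """Split one scan across workers, mimicking intra-operator
    parallelism (illustrative sketch, not HANA's implementation)."""
    n = len(column)
    chunk = (n + workers - 1) // workers

    def scan(lo):
        hi = min(lo + chunk, n)
        # each worker scans only its own slice of the column
        return [i for i in range(lo, hi) if predicate(column[i])]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(scan, range(0, n, chunk))

    matches = []
    for part in parts:        # combine partial results in order
        matches.extend(part)
    return matches

amounts = [120, 15, 990, 40, 700, 5, 310, 860]
print(parallel_scan(amounts, lambda v: v > 300))  # → [2, 4, 6, 7]
```

In a real engine the workers would be bound to physical cores and operate over compressed column vectors; the point here is only that one operator's work is divided, executed in parallel, and merged.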
- Columnar structure of the database store – The central structure of HANA is its columnar store; this is HANA’s main innovation. Row structures are the traditional design, and in-memory row stores can complete transactions faster than their disk-based predecessors. With a columnar data store, however, data can be analysed extremely fast because a query selectively picks up only the data it requires. HANA’s database structure was designed from scratch, so all the latest techniques could be built in. When a transaction runs, it is first absorbed by a delta store called the L1 Delta (a variation of the row store), which sits as a fast buffer, and is later folded into the main database store through a process called Delta Merge. This high-performance architecture runs analytical queries very fast through columnar operation while preserving the benefit of row stores, namely absorbing transactions at high speed. Analytics and transaction processing can be run against both row and column stores.
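The combination of a columnar main store with a row-shaped delta buffer can be sketched with a toy table. The class and method names here are my own illustration of the L1-Delta / delta-merge idea, not HANA's actual interfaces:

```python
class Table:
    """Toy columnar store with a row-wise delta buffer
    (a sketch of the delta-merge idea; names are illustrative)."""

    def __init__(self, cols):
        self.main = {c: [] for c in cols}  # columnar main store
        self.delta = []                    # row-wise buffer for fast inserts

    def insert(self, row):
        self.delta.append(row)             # transactions land here first

    def merge(self):
        # delta merge: fold buffered rows into the column arrays
        for row in self.delta:
            for col, value in row.items():
                self.main[col].append(value)
        self.delta.clear()

    def scan(self, col):
        # a query must see the main store plus any unmerged delta rows
        return self.main[col] + [r[col] for r in self.delta]

t = Table(["region", "amount"])
t.insert({"region": "EU", "amount": 120})
t.insert({"region": "US", "amount": 990})
t.merge()
t.insert({"region": "EU", "amount": 700})   # not yet merged
print(sum(t.scan("amount")))                # → 1810
```

Inserts stay cheap because they only append a row to the buffer, while scans remain fast because merged data lives in contiguous column arrays; queries simply read both.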
- Projection, Dynamic Aggregation and Compression – HANA’s tremendous scan speed enables it to follow the Principle of Minimal Projection: only the data a query actually needs is grabbed from the database store and processed. In the past, because systems were slow, data used to be collected first, stored in the application and then processed, which resulted in batch jobs. With minimal projection, HANA performs transactions and analytics dynamically, on the fly. The same applies to aggregation: whenever an aggregate is needed, it can be calculated dynamically at 12.5 – 15 million aggregations/sec/core, and if the underlying data has not changed, the result can be cached to answer the next query even more quickly. Because HANA aggregates dynamically and extremely fast, calculated aggregations need not be stored in the database at all; only the raw data is stored. Moreover, the columnar structure stores that raw data in a more organised way: it compresses the data, saving memory space, and makes it easy to select only the columns required to process a query.
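Minimal projection, on-the-fly aggregation and one common columnar compression scheme (dictionary encoding) can be shown in a few lines. The data and the encoding are illustrative; HANA's actual compression is more sophisticated:

```python
# Dictionary compression: a column of repeated strings becomes a small
# dictionary plus an array of integer codes.
region = ["EU", "US", "EU", "EU", "APJ", "US"]
dictionary = sorted(set(region))               # ['APJ', 'EU', 'US']
codes = [dictionary.index(v) for v in region]  # [1, 2, 1, 1, 0, 2]

# Minimal projection + dynamic aggregation: the query touches only the
# two columns it needs and aggregates on the fly; nothing is precomputed
# or stored as an aggregate.
amount = [120, 990, 700, 40, 310, 860]
eu = dictionary.index("EU")
total_eu = sum(a for c, a in zip(codes, amount) if c == eu)
print(total_eu)  # → 860  (120 + 700 + 40)
```

The compressed codes are both smaller in memory and faster to compare than the original strings, which is one reason a columnar store can scan so quickly.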
Besides the above, another capability of HANA – the ability to partition data and scale it out across different parts of memory on one server or across multiple servers – helps to scale up performance, distribute workloads across machines and enhance speed.
In this post, I intended to put together only those technological attributes that enable HANA’s gigantic speed. The HANA platform has other remarkable attributes: it eradicates data latency and duplication, supports building custom applications, enhances data storage, and offers quick innovation capability and adaptability, all of which reduce TCO and increase business value.
I hope this document makes a good reference point.