SAP HANA is a revolutionary database technology. I describe it as revolutionary because it was created to remove the pain of maintaining separate OLAP (Online Analytical Processing) and OLTP (Online Transactional Processing) databases. It changes the way we think about analytics. With huge amounts of data, analytics used to be done on a pre-aggregated dataset. This limited the questions we could ask of the database, since a model or cube built on a previously designed data structure does not support a new dimension of querying. Making any change to the cube meant waiting for the servers to update and rebuild the analytical database before it could be queried again. Querying transactional tables with millions or billions of records and getting a response in seconds is revolutionary.
With HANA, end users no longer need separate databases for analytics and transactions. All analytic operations can be performed on the same transactional database. This means one can query the database on any field or column and seek answers at runtime. This is the new way of thinking with an In-Memory database: one can build fast applications for mobile or iPad devices and have gigabytes or terabytes of information analyzed on the HANA Box. The best part is that you lower your IT cost by reducing hardware and software to just a single database.
This new way of thinking removes much of the work of creating cubes or pre-aggregations and provides far greater flexibility during querying. It is simply amazing to see an analytical query incorporate even the last transaction created or updated a few milliseconds ago.
HANA software & hardware components:
- HANA Box – The new In-Memory relational database, which supports both column-based and row-based tables.
- HANA Studio Software – Used by administrators and programmers to interact with the HANA Box. Studio is the GUI for the HANA Box and is used to create the models, which are described in more detail below.
- HANA Client Software – Provides the appropriate drivers for various languages and application tools to interact with the HANA Box.
Modeling in HANA:
HANA Studio lets users create the database via an Eclipse-based graphical user interface. It also supports creating views/models such as attribute, analytical and calculation views. These models/views enable programmers to build and execute data-intensive logic inside the HANA Box and gain higher performance at the application level.
These models are stored in tables of the HANA In-Memory database and can be executed by external programs using the appropriate HANA drivers provided by HANA Client.
A simple HANA Box would be a multi-core machine (64 to 1000 CPUs) with a huge amount of RAM (main memory), from hundreds of GB to 2 TB.
HANA is a high-performance database for the following reasons:
- Memory hierarchy: HANA uses main memory and, with the column store, improves the chances of cache hits. Both performance and cost grow as you climb the memory hierarchy pyramid, as shown in the 'Memory Hierarchy' figure below. HANA processes and stores all data in RAM, so disk latency, the traditional disk I/O bottleneck, is no longer a concern when processing queries.
- RAM (Random Access Memory) availability at reduced cost: With RAM now available in GBs to TBs at an affordable price, it is possible to use RAM alone for storing and processing information, with a solid-state disk or disk drive as backup.
- Multi-core (CPU) architecture: This enables HANA to process query requests in parallel.
- Columnar tables: Storing table data in columns speeds up analytics, since aggregation mainly computes a sum, average or count over a column. This is faster in column-based storage because the values are adjacent in memory, which reduces the latency of loading them into cache.
- Partitioning: The process of splitting queries or datasets into multiple segments. This simplifies the problem and allows the segments to be processed in parallel, making efficient use of the server's multiple cores.
- Parallelism: Processing a query in parallel, or running parallel processes over partitions. Multi-core CPUs make this effective.
- Compression: Traditionally, enterprise databases are huge and run into terabytes. HANA compresses data at the column level using various techniques. One is dictionary compression, in which each unique string in a column is assigned an integer value and stored as that integer in the transactional records, thus reducing storage. The dictionary is stored separately in the HANA In-Memory database.
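To make the columnar, partitioning, parallelism and compression points above more concrete, here is a minimal Python sketch. It is only an illustration of the ideas and not how HANA implements them: the sample data and the names `dictionary_encode`, `partition` and `parallel_sum` are made up for this example, and Python threads show the structure of a partitioned aggregate rather than a real multi-core speed-up.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample data: two columns of a small sales table,
# each held as its own list (column-style layout).
cities = ["Pune", "Berlin", "Pune", "Austin", "Berlin", "Pune"]
amounts = [120, 80, 45, 200, 60, 15]


def dictionary_encode(values):
    """Replace each distinct value with a small integer code."""
    dictionary = []   # code -> original value
    code_of = {}      # original value -> code
    codes = []
    for value in values:
        if value not in code_of:
            code_of[value] = len(dictionary)
            dictionary.append(value)
        codes.append(code_of[value])
    return dictionary, codes


def partition(seq, parts):
    """Split a sequence into at most `parts` contiguous chunks."""
    size = max(1, -(-len(seq) // parts))  # ceiling division
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def parallel_sum(column, parts=3):
    """Sum each partition on its own worker, then combine the partial sums."""
    with ThreadPoolExecutor(max_workers=parts) as pool:
        return sum(pool.map(sum, partition(column, parts)))


# Compression: the city column is stored as integers plus a separate dictionary.
dictionary, city_codes = dictionary_encode(cities)

# Aggregation over one column, computed partition by partition.
total = parallel_sum(amounts)

# Filtering compares integer codes instead of strings.
pune = dictionary.index("Pune")
pune_total = sum(a for c, a in zip(city_codes, amounts) if c == pune)

print(dictionary, city_codes, total, pune_total)
```

The point of the sketch is that the aggregate touches only the `amounts` column, and the filter works on compact integer codes while the strings live once in the dictionary.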
Diagram of Memory Hierarchy
HANA utilizes a combination of hardware and software innovations to deliver results in milliseconds. It capitalizes on hardware innovations such as the memory hierarchy and multiple cores running in parallel to process requests in GBs of RAM. HANA fetches all data from RAM and does not depend on disk data for processing. A common question is: what happens if the machine turns off, do we lose the data? The answer is no, because HANA also stores or logs each transaction at the disk level. After a power failure, this is used to re-create the snapshot for the HANA Box once it reboots.
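The idea of serving data from memory while logging each transaction to disk can be sketched in a few lines of Python. This is a toy illustration under my own assumptions, not HANA's actual persistence mechanism: the class `TinyStore`, the JSON log format and the file name are all hypothetical.

```python
import json
import os
import tempfile


class TinyStore:
    """In-memory key/value store that appends every write to a log file."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._replay()

    def _replay(self):
        # Rebuild the in-memory state from the log after a restart.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, "r", encoding="utf-8") as log:
            for line in log:
                entry = json.loads(line)
                self.data[entry["key"]] = entry["value"]

    def put(self, key, value):
        # Write the change to disk first, then update memory.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self.data[key] = value

    def get(self, key):
        # Reads never touch the disk.
        return self.data.get(key)


log_file = os.path.join(tempfile.mkdtemp(), "store.log")
store = TinyStore(log_file)
store.put("order-1", 250)
store.put("order-1", 300)

# Simulate a power failure: drop the in-memory object, then start again.
del store
recovered = TinyStore(log_file)
print(recovered.get("order-1"))
```

Reads are answered from the dictionary in memory, and the disk is only used for appending changes and for rebuilding the state at startup.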
With such a high-performance database in hand, we need to get back to designing the application and look at how to utilize the power of HANA. This is done via modeling, using the Attribute View, Analytical View and Calculation View in HANA. With these modeling views, a programmer can use standard SQL to build logic that deals with huge amounts of data and pass the results to the client application or store them in HANA tables. The programmer can also design an application in which all data-intensive logic is written in SQL, stored in the HANA Box, and accessed by the business application via HANA drivers.
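As a rough sketch of keeping data-intensive logic inside the database, the Python example below uses the standard library's `sqlite3` module with an in-memory database as a stand-in. It does not use HANA, its drivers or its modeling views; the table `refunds`, the view `refund_by_location` and the sample rows are hypothetical. A plain SQL view plays the role that a HANA model would play, and the client only reads the small result.

```python
import sqlite3

# Stand-in for the database: an in-memory SQLite instance, not HANA.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE refunds (location TEXT, product TEXT, amount REAL);
    INSERT INTO refunds VALUES
        ('Pune', 'Shoes', 40.0),
        ('Pune', 'Bag', 10.0),
        ('Berlin', 'Shoes', 25.0);

    -- The aggregation logic lives in the database, not in the client.
    CREATE VIEW refund_by_location AS
        SELECT location,
               SUM(amount) AS total_refund,
               COUNT(*)    AS refund_count
        FROM refunds
        GROUP BY location;
""")


def refunds_by_location(connection):
    """Client side: fetch the already aggregated rows from the view."""
    return connection.execute(
        "SELECT location, total_refund, refund_count "
        "FROM refund_by_location ORDER BY location"
    ).fetchall()


before = refunds_by_location(conn)

# A new transaction is visible to the next analytical query right away,
# because the view runs on the transactional table itself.
conn.execute("INSERT INTO refunds VALUES ('Berlin', 'Bag', 5.0)")
after = refunds_by_location(conn)

print(before)
print(after)
```

Because the view is evaluated against the transactional table at query time, there is no separate pre-aggregated copy to refresh.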
In the previous blog (An experiment of Android with HANA In-Memory Database), which covered a retail scenario analyzing 'refunds' on products and their performance across locations, I outlined the huge potential for improving the performance of business applications via the following approaches:
- In an existing application, replace the database with HANA and observe the performance improvements. Furthermore, move the data-intensive logic into HANA using HANA models or views.
- Replace the database of an analytical application with the HANA In-Memory database. This simplifies the application and gives the flexibility to perform new queries.
- Build new applications that were not possible before because of the huge amount of data and performance bottlenecks.
In a few cases, HANA can change the way business is done today. With the speed at which users can get answers and mine their data, we can slowly see business processes shifting to adapt to this change.
I hope that with this and the previous blog (An experiment of Android with HANA In-Memory Database), the reader has gained good insight into the HANA In-Memory Database and can utilize it in business applications. I will be writing more on HANA's high-performance innovations in the next blog!