A blog from Sridhar Garige (Cognizant US)
Obvious Questions and Answers which almost all beginners in SAP HANA have
We have been hearing about HANA since the beginning of this decade or even earlier. Initially I thought it was just a new database. I thought maybe SAP did not want to share market revenue with any other database provider (its competitors), so it came up with its own database.
Later I had the notion that HANA was only for BI/BW folks, so what about ABAPers? Everyone used to talk about analysis and modelling, so I assumed it was a piece of cake for BI/BW modelers. Then what about ABAP?
Then the rumor started in the market: ABAP is going to be extinct in the near future. I used to wonder, if ABAP is going to die, who in this whole universe would support the tons and tons of ABAP code written over the history of SAP implementations? What would happen to all the time, effort and money spent on those large- and small-scale SAP implementations? I have spent more time researching what HANA is than actually learning HANA. The Internet is full of information about HANA, but finding the right answers to your own questions and doubts is an uphill task.
I spent some time gathering information from other sources to understand what HANA is, who needs it, and why.
I hope having all this information in one place helps you understand it better.
- Q. Is SQL a pre-requisite to learn HANA?
SAP HANA is like any other relational database. Knowing database concepts and basic SQL before starting SAP HANA is an advantage, but it is not a pre-requisite. You can always catch up on these concepts while learning SAP HANA.
- Q. Without SAP BI/BW/BO knowledge, can an ABAPer learn HANA?
BI is the Data Warehousing package implementation tool from SAP. Data warehousing concepts from SAP BI will help you understand the implementation aspects from a BW on HANA perspective.
But unless you plan to be a BW on HANA consultant, you do not necessarily have to learn BI. Similarly, BW and BO are Business Warehouse and Business Objects respectively. If you have prior BW experience, understanding modeling concepts and transferring data from an SAP Business Suite system to HANA would be child’s play for you. Even without exposure to BW, we can easily learn HANA modeling concepts. BW knowledge is a must only for consultants who are eyeing the role of BW on HANA expert.
By now, I have understood that BO is a front-end reporting tool. Prior knowledge of reporting tools would be an advantage, but we can always learn BO concepts while learning HANA.
But if you already have BI/BW/BO knowledge, then BW on HANA work would be the role to target (if you are planning to shift to HANA).
Q. Is SAP ABAP skill required to learn HANA?
If you are an SAP ABAP programmer, then implementing the business logic and model would be fun for you. You must have already heard about SAP ABAP on HANA. Let’s put a full stop to the rumor that ABAPers are vanishing. With HANA, ABAPers would be smarter and more in demand. Only an ABAP on HANA consultant needs ABAP knowledge as a pre-requisite.
Q. Is HANA for functional folks or technical folks or modelers?
Q. SAP HANA claims to be so fast. Which programming language is it written in?
Ans: World famous C++.
Q. HANA is about RAM, so can we increase the memory size of a traditional database and get performance similar to HANA?
We would definitely get better performance if we increased the memory size of a traditional database, but it would not be comparable to what we get in HANA, because HANA is not just about the database. It is a hybrid in-memory database that combines niche hardware and software innovations, as stated below:
In-Memory storage (RAM): On paper, processing data from RAM is 1 million times faster than accessing data from hard disk. In practical scenarios, it is around 10x to 3600x faster. Also, in today’s world RAM is cheap and affordable.
Read time in RAM: 2 MB/ms/core (2 megabyte per millisecond per core).
So scanning 1 GB of data would take approximately 0.5 s/core, and 100 GB would take 50 s/core. If you have 50 cores in the hardware, scanning 100 GB of data would take just 1 second. Huh!! Concrete numbers always clarify better than paragraphs of sentences, don’t they?
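The arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the function name `scan_seconds` is made up for illustration, 1 GB is taken as 1000 MB, and the work is assumed to split perfectly evenly across cores, which real hardware will not quite achieve.

```python
# Back-of-the-envelope scan-time estimate using the figure quoted above.
READ_RATE_MB_PER_MS_PER_CORE = 2  # from the text: 2 MB/ms/core


def scan_seconds(data_gb, cores=1, mb_per_gb=1000):
    """Estimated seconds to scan `data_gb` gigabytes spread evenly over `cores`.

    Assumption: perfect linear scaling across cores and 1 GB = 1000 MB.
    """
    data_mb = data_gb * mb_per_gb
    milliseconds = data_mb / (READ_RATE_MB_PER_MS_PER_CORE * cores)
    return milliseconds / 1000


print(scan_seconds(1))        # 0.5
print(scan_seconds(100))      # 50.0
print(scan_seconds(100, 50))  # 1.0
```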
Multi-core Architecture, Partitioning & Massive Parallel Processing: Servers are available with up to 64 cores (and even more) per node. Partitioning the data footprint across different nodes and running the query in parallel is the innovation HANA uses so effectively. This is a perfect example of both hardware and software innovation.
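To make the partition-then-parallelize idea concrete, here is a toy sketch in Python. It is not how HANA is implemented; the helper names `partition` and `parallel_sum` are hypothetical, and Python threads are used only to show the shape of the approach (split the data, aggregate each part independently, combine the partial results), not to demonstrate a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(values, parts):
    """Split `values` into `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(len(values), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(values[start:end])
        start = end
    return chunks


def parallel_sum(values, workers=4):
    """Sum each partition on its own worker, then combine the partial sums."""
    chunks = partition(values, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum, chunks))
    return sum(partial_sums)


sales = list(range(1, 1001))  # made-up data: 1, 2, ..., 1000
print(parallel_sum(sales))    # 500500
```

The same pattern works for other aggregations whose partial results can be combined, such as counts, minimums, and maximums.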
Columnar Storage: Contiguous memory allocation gives faster reading through sequential memory access. Remember, the column store does not only make reading faster. HANA has built the column store in such a way that it is efficient for both READ and WRITE.
It also offers quick aggregation (normally aggregations are expensive) and supports parallel processing.
Searching in column store is much faster than in row storage (provided you are selecting only some of the columns, not all).
Data Compression: Minimize the data footprint through compression, i.e. less data movement means faster performance.
The idea is to remove repetitive data, build a vector of the distinct values, and point to each value with an integer (reading an integer is less expensive than reading a string).
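The compression idea described above is commonly called dictionary encoding. A minimal sketch in Python, with made-up city names as sample data and hypothetical function names, might look like this. It illustrates the principle only and says nothing about HANA's actual internal format.

```python
def dictionary_encode(column):
    """Return (dictionary, value_ids): each distinct value is stored only once."""
    dictionary = []  # distinct values, in order of first appearance
    positions = {}   # value -> its integer ID in the dictionary
    value_ids = []   # the column rewritten as small integers
    for value in column:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        value_ids.append(positions[value])
    return dictionary, value_ids


def dictionary_decode(dictionary, value_ids):
    """Rebuild the original column from the dictionary and the integer IDs."""
    return [dictionary[i] for i in value_ids]


cities = ["Chicago", "Dallas", "Chicago", "Chicago", "Dallas", "Boston"]
dictionary, value_ids = dictionary_encode(cities)
print(dictionary)  # ['Chicago', 'Dallas', 'Boston']
print(value_ids)   # [0, 1, 0, 0, 1, 2]
```

The more often values repeat, the bigger the saving, because each long string is stored once and every further occurrence costs only a small integer.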
Q. Is Computer Architecture changing?
Q. How does Column Storage actually make it faster?
Ans: Column store is divided into three parts:
- Main
- L2 Delta
- L1 Delta/cache
Persisted data is saved in Main memory, all buffer and transaction changes are kept in L2 Delta, and high-frequency inserts / deletes / updates etc. go to L1 Delta.
L1 Delta:
- Accepts all incoming data requests
- Stores records in row format (write-optimized)
- Fast insert and delete
- Fast field update
- Fast record projection
- No data compression
- Holds 10,000 to 100,000 rows per single node
L2 Delta:
- The second stage of the record life cycle
- Stores records in column format
- Dictionary encoding for better memory usage
- Unsorted dictionary
- Requires secondary index structures to optimally support point query access patterns
- Well suited to store up to 10 million rows
Main:
- Final data format
- Stores records in column format
- Highest compression rate
- Sorted dictionary
- Positions in dictionary stored in a bit-packed manner
- The dictionary is also compressed
So the smart combination of L1 Delta, L2 Delta and Main memory makes data reads and writes really fast and effective.
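To give a feel for what "sorted dictionary" and "positions stored in a bit-packed manner" mean, here is a toy sketch in Python. The function names `build_main` and `read_row` and the sample colors are made up for illustration, and a single Python integer stands in for the packed bit vector. HANA's real storage format is far more sophisticated.

```python
def build_main(column):
    """Encode a column with a sorted dictionary and bit-packed positions."""
    dictionary = sorted(set(column))
    position_of = {value: i for i, value in enumerate(dictionary)}
    # Smallest number of bits that can represent every dictionary position.
    bits = max(1, (len(dictionary) - 1).bit_length())
    packed = 0
    for row, value in enumerate(column):
        packed |= position_of[value] << (row * bits)
    return dictionary, bits, packed


def read_row(dictionary, bits, packed, row):
    """Extract one row's dictionary position and translate it back to a value."""
    mask = (1 << bits) - 1
    return dictionary[(packed >> (row * bits)) & mask]


colors = ["red", "green", "red", "blue", "green", "red"]
dictionary, bits, packed = build_main(colors)
print(dictionary)  # ['blue', 'green', 'red']
print(bits)        # 2
print(read_row(dictionary, bits, packed, 3))  # blue
```

With three distinct values, each row needs only 2 bits instead of a whole string, and because the dictionary is sorted, a value can be located in it with a binary search.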
Q. How can we decide whether to go for Row or Column storage?
If you want to report on all the columns of a table, then row store is more suitable, because reconstructing the complete row is one of the most expensive operations for a column-based table.
If you want to store huge amounts of data in a table that should be aggregated and analyzed, then column-based storage is more suitable.
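The trade-off can be seen in a small Python sketch that holds the same three made-up orders in both layouts. The table, column names and function names are hypothetical and exist only to illustrate the access patterns.

```python
# Row layout: one tuple per record (order_id, city, amount).
rows = [
    (1, "Chicago", 120.0),
    (2, "Dallas", 80.0),
    (3, "Chicago", 200.0),
]

# Column layout: one list per column, holding the same data.
columns = {
    "order_id": [1, 2, 3],
    "city": ["Chicago", "Dallas", "Chicago"],
    "amount": [120.0, 80.0, 200.0],
}


def total_amount_row_store(rows):
    # Every record is visited even though only one field is needed.
    return sum(record[2] for record in rows)


def total_amount_column_store(columns):
    # Only the "amount" column is scanned, sequentially.
    return sum(columns["amount"])


def fetch_record_column_store(columns, position):
    # Rebuilding a full row has to touch every column.
    return tuple(columns[name][position] for name in columns)


print(total_amount_row_store(rows))           # 400.0
print(total_amount_column_store(columns))     # 400.0
print(fetch_record_column_store(columns, 1))  # (2, 'Dallas', 80.0)
```

Aggregating one column is a single sequential scan in the column layout, while returning a complete record is a single lookup in the row layout. That is the choice the two paragraphs above describe.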
– – –
You can also check the attached .XML document.