Enterprise Resource Planning Blogs by Members

Introduction:

I hope it's no surprise to anyone reading this article that SAP S/4HANA uses HANA DB, and that HANA DB uses Columnar Data Storage for most of its critical tables. There is a boatload of articles on why Columnar Data Storage is good for the world and on its blazing speed. My initial thought when I learnt how Columnar Data Storage works was that it is "less intuitive" than Row Data Storage. Row Data Storage (the old way we had in ECC) was always nice and simple, like "filter MKPF for the Posting Date, collect all the Material Docs, pass the same to MSEG and voila, you have all the item-level details for a given posting date". You can play around with these table joins intuitively to create any report. Why would someone mess with this familiarity?

There is a real good reason. Columnar Data Storage is REALLY fast; it's not 1x or 2x faster. In my observation it is 100x or even 1000x faster for some of the queries we regularly run for the business. For instance, a query we used to run on weekends for a procurement report took around 6 hours in ECC; when we moved to S/4HANA (with a little tweaking of the query), it took only a few seconds. It seemed too good to be true. Upon validation, we were surprised that it was "too good and true".

That led to my curiosity about how Columnar Data Storage works. Being a tech dummy, I had to research for some time to get the details, and I have tried to explain them here in a "Functional Consultant friendly" manner. Please note I may have oversimplified the data and examples to make them more understandable.

The Classic Row Data Storage: how it looks in the DB:

Let's consider the following Material Document data (with the no. of columns simplified). The first column here is the row no.

Fig 1:


When this data gets stored in our classic Row Data Storage, it looks something like the below:

Fig 2:


In a Row store, it's plain and simple: the data is stored one row after another. For simplicity I have put one row of Material Document details in one "Block" in the database, but in reality one DB Block would be much bigger (a few KBs or even MBs) and would store several rows of Material Documents contiguously.
Let's not bother too much about what a Block is. It's a database concept; you can consider it a chunk of data that gets pulled for every IO (Input/Output) operation. So when my program requires data on, say, Material Doc no. 5000000001, it pulls all the data in Block 1. If my program requires data on Material Doc no. 5000000003, it still starts from Block 1; unable to find it there, it discards Block 1 and moves on to Block 2, and continues this search till it finds the Block n where the data is.
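The search described above can be sketched in a few lines of Python. This is a toy model, not HANA internals: the documents, block size, and column layout are made up for illustration.

```python
BLOCK_SIZE = 2  # rows per block; real DB blocks hold far more data

# toy Material Document rows: (doc no., posting date, special stock, amount)
rows = [
    ("5000000001", "2024-01-10", "K", 100.0),
    ("5000000002", "2024-01-10", " ", 200.0),
    ("5000000003", "2024-01-11", "K", 150.0),
    ("5000000004", "2024-01-11", " ", 300.0),
]

# pack whole rows into blocks, one block after another
blocks = [rows[i:i + BLOCK_SIZE] for i in range(0, len(rows), BLOCK_SIZE)]

def find_document(doc_no):
    """Read blocks in order until the document turns up."""
    blocks_read = 0
    for block in blocks:
        blocks_read += 1  # one I/O per block pulled from the DB
        for row in block:
            if row[0] == doc_no:
                return row, blocks_read
    return None, blocks_read

row, io_count = find_document("5000000003")
# doc 5000000003 sits in the second block, so two block reads were needed
```

The `blocks_read` counter is the thing to watch: it is the stand-in for expensive I/O in all the examples that follow.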

Let's take a functional requirement to sum up all the "Amount" values of consignment items; this means wherever there is Special Stock K, accumulate the Amount value.

Let's see how the Row store handles this and how many "Blocks" it has to read before coming up with a value. As shown in the fig below, the Row store reads all the Blocks from the DB before it calculates the final "Amount" value.

Fig 3: 


 

To explain it a bit: the system goes to the first row, goes to the Special Stock field and checks for "K"; if it is not "K", it goes to the next row and repeats the cycle. Whenever it finds a "K", it adds the "Amount" to the running sum.
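The row-store scan just described can be sketched in Python. Again a toy model under made-up data: every block, and every row in every block, has to be read before the total is known.

```python
BLOCK_SIZE = 2  # rows per block, kept tiny for illustration

# toy rows: (doc no., plant, special stock, amount)
rows = [
    ("5000000001", "1000", "K", 100.0),
    ("5000000002", "1000", " ", 200.0),
    ("5000000003", "2000", "K", 150.0),
    ("5000000004", "1000", " ", 300.0),
]
blocks = [rows[i:i + BLOCK_SIZE] for i in range(0, len(rows), BLOCK_SIZE)]

def sum_consignment(blocks):
    """Full table scan: visit every block, check every row for 'K'."""
    total, blocks_read = 0.0, 0
    for block in blocks:
        blocks_read += 1  # one I/O per block
        for doc_no, plant, special_stock, amount in block:
            if special_stock == "K":
                total += amount
    return total, blocks_read

total, io_count = sum_consignment(blocks)
# every block is read, even though only two rows carry 'K'
```

With four rows this is harmless; with billions of rows, `blocks_read` is what turns into a 6-hour weekend job.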

Although this method is simple (at least to explain), it scans the entire table, which could run into billions of rows. For one of the retail clients we were working with, the projected Material Document growth was 9.46 billion/year, so you can imagine how such a query would perform in real life.

This is how Column Data Storage stores the data:

Fig 1 again:


With the same raw data example that we started with, the column store would look like the below:

Fig 4:


The above fig is mainly to help you visualize that the Blocks in the DB are filled column-wise; a more realistic fig is the one below, which is just a transpose of Fig 4.

Fig 5: 


Let's take it a bit slow; as I mentioned, the column store is not very intuitive. A few noteworthy points:

  • Columnar Data Storage fills the DB Blocks column-wise. When a new Material Document, say 5000000007, comes in next, the Doc no. goes to Block 1, the Posting Date goes to Block 2 ... the Plant goes to Block 6, and so on.
  • The row nos. are still tagged to each field value (the row nos. are shown in red in the above table and in the other tables as well). Note this is a very important concept: this tagging of row nos. to field values connects them together although they sit in different Blocks.
  • Reiterating that the above is a simplified example for easier understanding; in the real world we could have millions of Material Doc nos., and the contiguous Blocks containing Material Doc nos. could run into the hundreds.
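The bullet points above can be sketched as a toy column layout in Python: each column gets its own "block", and every value keeps its row number so values in different blocks can be stitched back together. Data and column names are made up for illustration.

```python
# toy rows: (doc no., posting date, special stock, amount)
rows = [
    ("5000000001", "2024-01-10", "K", 100.0),
    ("5000000002", "2024-01-10", " ", 200.0),
    ("5000000003", "2024-01-11", "K", 150.0),
]
column_names = ("doc_no", "posting_date", "special_stock", "amount")

# one block (list) per column; each entry is tagged with its row no.,
# like the red row numbers in the figures
column_blocks = {name: [] for name in column_names}
for row_no, row in enumerate(rows, start=1):
    for name, value in zip(column_names, row):
        column_blocks[name].append((row_no, value))

# a new Material Document appends one value to each column block,
# exactly as the first bullet above describes
```

Notice that row 3's amount lives in a different block from row 3's doc no., and only the shared row tag `3` ties them together.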

Compression in Columnar Data Storage:

This is where it gets interesting. As you can see in Fig 5, each Block contains many duplicate field values. For instance, Material Document 5000000001 is repeated twice and 5000000002 is repeated thrice in Block 1, and Plants are repeated even more in Block 6. This is where compression kicks in and de-duplicates these field values.

Fig 6:


 

It is very important to note here that the row tagging is still in place. For instance, 5000000001 is still tagged to row nos. 1 and 2.

It's worth reminding ourselves that in SAP most transaction data field values are actually either Master Data (like the Material no. in the above example) or Configuration Data (like the Plant, S.Loc, UoM, Stock Type, etc. in the above example). So the scope for compression is very high, since these Master Data and Config Data values are very few in number when compared to transaction data. This is the reason the "data footprint" is smaller in S/4HANA compared to ECC: when you migrate from ECC to S/4HANA you require much less disk space.
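The de-duplication above can be sketched as dictionary encoding, one of the compression techniques a column store can apply: each distinct value is stored once, and the column keeps only small integer IDs. The row tagging survives because position i in the ID list is row i. The Plant values here are invented for illustration.

```python
def dictionary_encode(values):
    """Return (dictionary of distinct values, per-row integer IDs)."""
    dictionary, seen, encoded = [], {}, []
    for v in values:
        if v not in seen:
            seen[v] = len(dictionary)  # assign the next ID
            dictionary.append(v)       # store the value exactly once
        encoded.append(seen[v])
    return dictionary, encoded

# a toy Plant column: few distinct values, many rows
plant_column = ["1000", "1000", "2000", "1000", "2000", "1000"]
dictionary, encoded = dictionary_encode(plant_column)

# the dictionary holds 2 values instead of 6, and row 3 (index 2)
# still decodes back to plant "2000" via its ID
```

The fewer distinct values a column has, the better this works, which is exactly why Master Data and Config Data columns compress so well.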

Final piece - How the queries are blazing fast: 

It took us some time to understand how Columnar Data Storage works, but I hope it was worth it. Next comes querying. Faster querying is the other big advantage, rather the USP, of Columnar Data Storage. Let's see with an example. Let's get back to our functional requirement of "sum up all the Amount values of consignment items" and see how Columnar Data Storage fares here.

Fig 7: 


Here the system goes directly to Block 8, where the Special Stock values are, and checks for "K" and the row nos. tagged to K. It then goes to Block 12 and sums up all the "Amount" values tagged with those row nos. Compare this Fig 7 with Fig 3 (row based), where there were multiple reads of Blocks from the DB, which is very expensive from a performance standpoint. It finally made sense to me how a 6-hour query in SAP ECC could execute in less than 10 seconds in S/4HANA.
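The query path just described can be sketched in Python: only the two column blocks the query needs are touched, however many columns the table has. Values are toy data, positions aligned by row.

```python
# the Special Stock column block and the Amount column block
special_stock_block = ["K", " ", "K", " ", " ", "K"]
amount_block = [100.0, 200.0, 150.0, 300.0, 250.0, 50.0]

# step 1: scan the Special Stock block once, collect matching row nos.
k_rows = [row_no for row_no, s in enumerate(special_stock_block) if s == "K"]

# step 2: go straight to the Amount block and sum only those rows
total = sum(amount_block[row_no] for row_no in k_rows)

blocks_read = 2  # versus every block of every column in the row-store scan
```

Compare this with the row-store sketch earlier: there, the block count grows with the whole table; here it grows only with the two columns the query actually touches.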

The not-so-good about Columnar Data Storage:

1. As you can see, writes/new record inserts are not as good. While in Row Data Storage a new record insert (like a new Material Document creation) is much simpler and only has to get written to the next available Block in the DB, in Columnar Data Storage the various field values have to be distributed across multiple Blocks. SAP HANA manages this through techniques like the Delta Merge (refer: https://help.sap.com/docs/SAP_HANA_PLATFORM/6b94445c94ae495c83a19646e7c3fd56/bd9ac728bb57101482b2ebf...)

2. Thoughtful querying is required. Let's say we write a statement like SELECT * (meaning all field values) WHERE Stock Type = K. This will try to pull all the field values, and remember these field values are spread across multiple Blocks, so there would be too many DB reads, which will slow the system down again. We have seen real-life cases where some bad queries make S/4HANA much slower than ECC.
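Point 2 can be put as a rough cost sketch: on a column store, each selected column means at least one extra Block read, so SELECT * pays for every column while a narrow select pays only for what it needs. The column list and block count below are invented for illustration, not HANA measurements.

```python
def blocks_touched(selected_columns, blocks_per_column=1):
    """Minimum column blocks a query must read for its select list."""
    return len(selected_columns) * blocks_per_column

# a toy 12-column Material Document table
all_columns = ["doc_no", "posting_date", "plant", "sloc", "material",
               "uom", "qty", "special_stock", "stock_type", "vendor",
               "movement_type", "amount"]

star_cost = blocks_touched(all_columns)                     # SELECT *
narrow_cost = blocks_touched(["special_stock", "amount"])   # narrow select
# the narrow query reads 2 blocks; SELECT * reads all 12
```

The moral is the same one ABAP developers are told for S/4HANA: select only the fields you need.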

Conclusion:

I hope this blog gave you a glimpse of what's under the hood of Columnar Data Storage. As most organizations are moving from ECC to S/4HANA, I feel it is all the more crucial now to understand these intricacies to implement and run the business.

Please like, share, and comment if you have anything to add.

 
