Over the past couple years I have been involved in many sizing activities for BW on HANA. I have collected some valuable lessons learned. This should help you understand better what common mistakes to avoid and how the sizing works in more detail.
Note: This blog is specifically for sizing existing BW systems. For sizing brand new installations, we recommend using SAP QuickSizer (http://service.sap.com/quicksizer). Also if you are planning to run multiple applications on a single SAP HANA database (in addition to BW), you have to size these applications independently and added them together.
The following is based on an actual customer case. The customer says their BW system is 6.1 TB. Quick, takes a guess of how big the SAP HANA appliance needs to be…. Well, I hope you gave the proper “consulting answer” that you need more details. But let’s see if you would have taken all of the following into consideration.
1) Do not count log and free space
In this case we were dealing with BW running on a Microsoft SQL server database. The 6.1 TB was actually the total size of the data and log files. Ha! Beginner mistake. Clearly only data files excluding the free space are relevant for HANA sizing (=allocated space). I’m sure everyone got that right.
2) Do not count indexes
Next we look at the allocated space. Much of it is taken up by database indexes and some database-specific administration data. We are really interested in the pure database tables only. In our case study, over half of the allocated space was indexes! HANA persists data differently, so we disregard all indexes.
3) Do not count aggregates
BW on HANA does not need aggregates. All related menu items have been removed and aggregate functionality is disabled since SAP HANA can aggregate billions of records on the fly.
4) Do not calculate with size of compressed data
We are down significantly from the original 6.1 TB but here’s a surprise… The MS-SQL database was compressed already. We need the uncompressed data into consideration. However, we need to do it right. Don’t rely on any compression factors provided by the database. It’s hard to understand how these were calculated. They are irrelevant anyway since SAP HANA compresses differently.
Here’s an example, you have a table with char(100) field and 1 million records. Some might say that’s 100 million bytes of uncompressed data. But this again is not enough information to determine the size of the table in HANA. It matters what data is actually stored in the table! If a record contains “ABC”, we need 3 bytes and not 100. Ergo, what you really need to do is look at all the data in the table.
Reading the complete database would be rather time consuming so we take some shortcuts: We sample only a small subset of the records and extrapolate the results. And we can exclude reading fields that have a fixed size like INT4. In the example, we select a sample of 100,000 records and determine an average field length of 10. This gives us an uncompressed, raw data size of 10 million bytes.
Your system is not on Unicode yet? Typically, there’s not much textual information stored in BW (this will certainly change over the coming years) so you need to apply an uplift of about 5% (based on experience).
If we follow this process for all tables, we get about 3.2 TB of uncompressed data that will need to fit into our SAP HANA database. Grouped into the usual BW categories, it looks like this:
5) Do not apply the same compression factor to all tables
Sure, HANA does some incredible compression of data but it’s not the same for each type of table. For example, InfoCube fact tables contain integer values only (pointers to dimensions or master data) and we usually find only few unique values per column. These will therefore compress very well. We assume factor 7 based on experience with other customer systems.
For master data tables, the opposite is true. Mostly all values in a column are different (not so much for attributes of master data of course). Overall, we will get little compression (1-2x).
6) Do not assume the complete table must be in memory
We also have to take into consideration that many tables are partitioned by default in BW on HANA and only the used partitions will be loaded into memory. Best example are PSA tables which are partitioned by request id. BW usually works only with the most recent requests and therefore much of the PSA table will not be loaded. This is the default behavior for PSA tables and write-optimized DataStore Objects.
Optionally, we can apply the “non-active” data logic to other InfoProviders as well (if you configure BW accordingly).
The result in our case study: We end up with approximately 925 GB of memory once the data is compressed inside HANA.
7) Do not forget room for server processes, caches, and processing of data
The database software needs to run in memory of course. For SAP HANA we assume about 50 GB of memory are required for these server processes and related caches (per node).
What’s more essential is to leave amble room for dynamically memory allocation for processing your business data. There are two main cases: merging new data into a column store table (so called “delta merge”) and processing end-user queries. We don’t want to run short on memory for these.
In general we double the size of the business data to include enough room for processing. (Note: We don’t double for tables marked as “non-active”).
8) Do not assume the total amount of memory is the same for different size of server nodes
Finally, we need to distribute the data to server nodes. Today we have nodes with 512 GB or 1 TB available. First, we need to take into consideration that only certain tables will be located on the master node. This includes all tables of the row store plus some other administration tables located in the column store (another mistake is thinking that the master node is only for the row store).
All business data however is distributed across the available slave nodes. Since the master node is not used for business data and we need some amount of memory for server processes and cache per slave node, you can end up with different total amount of required hardware.
In our case study, the result was either 5x 512 GB nodes or 3x 1 TB nodes.
9) Do not forget to size themaster node properly
As mentioned above, the master node will contain certain tables as a default configuration. Therefore you have to make sure that these tables will fit into the given size of the master node… including future growth of these tables. If you follow best practices for housekeeping in BW, this is not a problem at all. However, if you are a bit lax on housekeeping, you might run into the restriction of the master node.
For a one time clean up and frequent housekeeping for row store tables, please see SAP Note 706478.
The most common areas for BW customers are:
Additional details on sizing the master node, see SAP Note 1855041.
10) Do not ignore means for system size reduction
So far we assumed that the complete system is migrated as is. However, there are things to consider that will reduce the over size of your system significantly:
- Removal of complete data layers, for example reporting off DSO instead of an InfoCube with the same or just aggregated data
- More aggressive housekeeping of PSA, change log, and DTP error stacks
- Usage of near-line storage solution (NLS)
- Revisit data retention strategy and archiving
For more details see “Optimization of Data Retention“.
Now you know how to avoid these pitfalls. Did you get them all? Of course it would be extremely cumbersome and time consuming to calculate all of this by hand. But here’s the best news: We have a program available that fully automates the complete sizing calculation!
For the one and only way to size a BW system properly for SAP HANA, please see SAP Note 2296290 and its attachments.