Demystifying aggregates in SAP BI 7.0
Demystifying aggregates in SAP BI 7.0
Aggregates are subsets of InfoCube data, where the data is pre-aggregated and stored in a InfoCube structure. Aggregates on an Info Cube can be considered similar to that of an Index on a database table. Creating an Aggregate on an InfoCube is one of the few ways to improve performance of SAP BW Query. Subset of an InfoCube data is stored in an Aggregate. As a result, the response time that we get out of reading the data from aggregate will be much faster than reading from an InfoCube.
When a query gets executed it is split into several sub queries. Split of the query is based on the following rules:
- Condition 1: Parts of the query on different aggregation levels are split.
- Condition 2: Different Selections on characteristics are combined
- Condition 3: Parts on different hierarchy levels or parts using different hierarchies are split.
Aggregates that we build should meet the needs of query navigation and several sub queries within each query. If we have more than one aggregate built on top of a cube, after the query split, OLAP Processor searches for an optimal aggregate for each part. This means, a single query using a multiple sub query can access more than one aggregate to get the desired output. Refer to the flow chart shown below. At the end, OLAP processor consolidates all the results and gives the desired output.
If a query performance is poor, we can use the following 2 thumb rules to decide if Aggregates will actually help improve performance.
- Condition 1: Summarization Ratio greater than 10
Summarization Ratio is the ratio of number of records in the cube to the number of records in the aggregate.
Note: If the summarization Ration is very high, say 300 or more then the aggregate might be very specific to a query and cannot be used by more than one query.
- Condition 2: Percentage of DB time > 30% of the total query run time.
Please note that building an aggregate will only improve the database access time.
- There is a tool called Work Load Monitor (ST03N) that is used to track the System Performance. It basically helps to analyze the queries with highest run time.
- DB time can also be determined by executing the Query through RSRT. The EVENT ID 9000 (Data Manager): highlighted in the screen shot below gives the DB time.
Summarization ratio can be seen by executing the query from RSRT transaction. Select the query you want to monitor the performance and click on “execute and debug” button as shown in the screen print below
Usage: Every time a query hits the aggregate, Usage counter gets incremented by 1. If your Usage counter is zero, in 99% of the cases, you can safely assume that your aggregates are not being used.
Last Used Date: Along with the Usage Indicator, another important parameter to be kept in mind is the Last Used Date. For an Aggregate, Usage Indicator might show some big number. But when you actually take a closer look at the last used date, It might show a date 6 months prior to the current date. This means that this aggregate has not been used by any query in the last 6 months. The reason might be that there were many changes made at the query level but the aggregates were not updated or modified accordingly.
Valuation is the system’s best guess of judging how good an aggregate is. Usually, if the summarization ratio is high, the number of + signs in the valuation will be more. There could be some aggregates where the number of plus signs in an aggregate is more, but the aggregate as such is never used. Though the valuation will give a fair idea about the aggregates created, it need not be 100% right.
Once the data is loaded into the cube, following steps are to be carried out- for the data to be available for reporting at the aggregate level.
- Activate and Fill à Aggregates should be activated and filled.
- Aggregate Rollup: Is a step by which, the newly added data at the cube level gets aggregated and available for reporting.
- Change Run (Master Data Activation): To activate the changes of the master data. All the data containing the navigational attributes gets realigned.
- Adjustment of time dependent aggregates: This is done to recalculate the aggregates with time dependent navigational attributes.
- Any aggregate that is created on a cube has to be periodically monitored for its usage and the last used date. If there is any aggregate that is not being used, it has to be either modified or removed
- Un-used aggregates add to the load performance. It is always a good practice to drop the aggregates that are not being used.
- Aggregates should be relatively smaller compared to the Info cube.
- Too many similar aggregates can be consolidated to a single aggregate.
TOOL – Pre-Analysis of the aggregate filling.
With changes beyond a certain magnitude, modifying the aggregate becomes more time consuming than reconstructing it. You can change this threshold value.
Steps-In the implementation guide, choose SAP NetWeaver–> Business Intelligence –> Performance Settings –>Parameters for Aggregates in the section Percentage Change in the Delta Process. In the Limit with Delta field, enter the required percentage (a number between 0 and 99). 0 means that the aggregate is always reconstructed. Change these parameters as many times as necessary until the system response is as quick as possible. We can accord this with the help of TOOL called “Pre-analysis of aggregate filling”.
By default blocksize is set to 100.000.000. SAP recommends changing this setting to a value between 5.000.000 and 10.000.000. In this way, we reduce joining or sorting on disk and also reduce log space consumption. You should not set BLOCKSIZE to lower values, because this can result in a WHERE condition that forces the optimizer to use an index other than the clustered index. Suggest to use 5.000.000 for systems with less than 10 GB of real memory. If you have more real memory SAP recommends setting BLOCKSIZE to a value up to 10.000.000.
Ideally the index on the Time dimension should be used when reading data from the fact table, because the fact table is clustered according to the time dimension. In many cases index on data request dimension is chosen.
The tool helps you to identify how much block size you can actually generate with your existing aggregates. Use button “Pre-Analysis of the aggregate filling” in the aggregate maintenance window to see the SQL statement that is used depending on the value of parameter BLOCKSIZE.
We can explain this statement in ST05 and check, which INDEX is used to access the FACT table and then act accordingly.