This Document explains all possible tricks and conventional approaches towards gaining improved performance in query execution time. This document is intended to SAP BI professionals to familiarize with the performance optimization techniques.
1) Aggregates: An aggregate is a materialized, summarized view of the data in an Info Cube.
A) Mechanism behind Aggregation: In aggregates data will be saved in a redundant format that all the relational data will come together to constitute a logical grouping of data. Relational data huddle act as fractional cubes within the cube so as to consolidate and bring the uniformity in the data base.
B) Of Purpose and how this will improve performance: Aggregates enable us to access the cube data in a flash when reporting is done on this cube in a similar way to database indexes.
Recommended Scenarios: – 1. The execution and navigation of query is bleak.
2. To propel the execution and navigation of query
3. Frequent use of attributes in queries.
4. To hasten the reporting.
Cons: Huge maintenance efforts required when we have large number of aggregates in info cube.
2) BIA Server: – It’s a redundant storage of cube data at BIA server. It is a case of indexing activity on the database available at info cube in compressed form not in aggregate form.
A) Mechanism behind BI Accelerator: The BI Accelerator indexed data will be column levelly hived away on BIA server main memory with multiple logical split indexes on tables and remains available until it is substituted or removed to accommodate other cube data. The column level storage facilitates vertical segmentation of tables in more efficient way rather than saving row based data in conventional data base system. This prevents from reading all table data as it was the case with the conventional database which is due to the inaccessibility of predefined aggregates. BIA will read those columns that are relevant and put those entries in the forefront. So this will reduce the input and output load and main memory consumption.
B) Of Purpose and how this will improve performance: BIA is practicable when we can’t do relational aggregation or any other BI specific methods of improving performance such as data base indexing are not sufficient or too complex to carry out .As BIA enables to give the quick access to any data in the cube with low administration effort and is useful for unpredictable query types, huge volume of data and high frequency of queries.
Prerequisites: Must establish RFC connection to communicate between BI and BIA.
3) Cache mode: Cache is reserved memory for a specialized buffer storage that is continually updated and optimized to transfer data between system elements with different characteristics. Enabling cache mode defines whether and how the query results and navigational states calculated by OLAP processor should be saved as highly compressed data in cache.
A) Mechanism behind Cache mode:- Caching mechanism is similar to that of maintaining browsing history there by it store the content for specific time period. So each time query executes instead of going to data base level query picks the content from cache memory to return the results in a flash.
B) Of Purpose and how this will improve performance: – Caching stores in memory for a preconfigured amount of time content that has been retrieved by a query request. Successive identical requests first access the cache, rather than resubmitting the query to the data source, and if the cache has not expired, the request displays the information stored there. Caching improves response time and overall system performance by reducing the load on the information source.
4) Do complex Calculations at backend: Avoid doing calculations at report level. This will hinder the query response time. Create base key figures for calculations at info provider level so that all complex calculations take place before query execution
A) Mechanism behind backend process: – All the calculations and possible processes will be accomplished before query execution. Backend process maintain its own validation of key figures and does not give additional execution process when the query executed.
B) Of Purpose and how this will improve performance: Warding off complex calculation at report level doesn’t give additional processes to take up when query executed. This will minimize the query execution processes there by query response time increases.
5) BW Stats: BW statistics enable us to evaluate the fundamental functional areas of the Business Information Warehouse and OLAP processor.
A) Mechanism behind Cache mode: – Maintains the statistics of the all modeling objects.
B) Of Purpose and how this will improve performance: – Gives over view of the info cubes, info objects, info sources, source systems, queries and aggregates. Can be determined the system performance.
6) Cube partition: Partition is splitting the total dataset into several smaller physically independent and redundant free units.
A) Mechanism behind Partitioning: – Data will be divided into parts based on time characteristics available in the cube thus we can handle mass data more efficiently. Partitions do multiple, smaller, independent and related segments. This separation improves system performance when performing data analysis Info Provider, this is because the system knows only to read a particular part of the disk to retrieve the data you are after. In addition it can perform parallel data reads of multiple partitions speeding up the queries execution time. The data in the partitions will form identical uniformity acting as its own set of dataset. We can partition the data using one of the two portioning criteria 0CALMONTH and 0FISCPER which of any one of the info object should present in the cube. When you activate the Info Provider, the system creates the table on the database with one of the number of partitions corresponding to the value range. You can set the value range yourself. Choose the partitioning criterion 0CALMONTH and determine the value range from 01.2006To 12.2012; 6 years x 12 months + 2 = 74 partitions are created (2 partitions for values that lay outside of the range, meaning < 01.2006 or >12.2012).You can also determine the maximum number of partitions created on the database for this table. Choose the partitioning criterion 0CALMONTH and determine the value range From 01.2006To 12.2012Choose 30 as the maximum number of partitions. Resulting from the value range: 6 years x 12 calendar months + 2 marginal partitions (up to 01.2006, from 12.2012) = 74 single values. The system groups three months together at a time in a partition (meaning that a partition corresponds to exactly one quarter); in this way, 6 years x 4 Partitions/year + 2 marginal partitions = 26 partitions created on the database. The performance gain is only achieved for the partitioned Info Provider if the time characteristics of the Info Provider are consistent. This means that with a partition using 0CALMONTH, all values of the 0CAL x characteristics of a data record have to match.
B) Of Purpose and how this will improve performance: Having sub data sets defined by partition in the data base system we can recommend the query to search the relevant data set thus query will access that relevant dataset in a flash so as to return the desired results in good time. This will significantly improve query process in retrieving the query command. In conventional database systems data will be stored in single large dataset without predefined aggregates thus making the query to access all the database tables leading to delays in response time as it was not the case when we do partition on database tables.
7) Cube compression: Deleting the persistent request ids in info cube. When you load data into the Info Cube, entire requests can be inserted at the same time. Each of these requests has its own request ID, which is included in the fact table in the packet dimension. This makes it possible to pay particular attention to individual requests. One advantage of the request ID concept is that you can subsequently delete complete requests from the Info Cube. However, the request ID concept can also cause the same data record (all characteristics agree, with the exception of the request ID) to appear more than once in the fact table. This unnecessarily increases the volume of data, and reduces performance in Reporting, as the system has to aggregate using the request ID every time you execute a query.
A) Mechanism behind Partitioning: – when we do cube compression multiple requests ID’s gets deleted thus brings the different request id data into one single request. It’s nothing but avoiding large number of requests in info cube. Even the storage memory will be less.
B) Of Purpose and how this will improve performance:- save space on the memory, summarize a request as soon as you have established that it is correct, and is no longer to be removed from the Info Cube. Containing single request query search time minimizes and improves the reply time for query.
8) Read mode of the BEx query: Read mode determines how the OLAP processer gets the data during the navigation across the query.
A) Mechanism behind Read mode: – To determine how the query has to read the data for each navigation status on the query we have to enable this read mode to minimize access time. When this read mode is set query reads the data in a predefined way that is fetched by OLAP functions. OLAP provides the overall functionality for retrieving, processing and formatting of the data. Upon read mode set we can advice the query how to read the data when navigating across the query and how OLAP gets the data.
We have three different read modes available for this functionality.
1) Query to be read when you navigate or expand hierarchies (H): The amount of data transferred from the database to the OLAP processor is the smallest in this mode. However, it has the highest number of read processes. In the following mode Query to read data during navigation, the data for the fully expanded hierarchy is requested for a hierarchy drilldown. In the Query to be read when you navigate or expand hierarchies’ mode, the data across the hierarchy is aggregated and transferred to the OLAP processor on the hierarchy level that is the lowest in the start list. When expanding a hierarchy node, the children of this node are then read. You can improve the performance of queries with large presentation hierarchies by creating aggregates on a middle hierarchy level that is greater or the same as the hierarchy start level.
2) Query to read data during navigation (X): The OLAP processor only requests data that is needed for each navigational status of the query in the Business Explorer. The data that is needed is read for each step in the navigation. In contrast to the Query to be read when you navigate or expand hierarchies’ mode, presentation hierarchies are always imported completely on a leaf level here. The OLAP processor can read data from the main memory when the nodes are expanded. When accessing the database, the best aggregate table is used and, if possible, data is aggregated in the database
3) Query to read all data at once (A): There is only one read process in this mode. When you execute the query in the Business Explorer, all data in the main memory area of the OLAP processor that is needed for all possible navigational steps of this query is read. During navigation, all new navigational states are aggregated and calculated from the data from the main memory. The read mode Query to be read when you navigate or expand hierarchies significantly improves performance in almost all cases compared to the other two modes. The reason for this is that only the data the user wants to see is requested in this mode.
B) Of Purpose and how this will improve performance:- When the query read mode differentiates the way the query has read the data it will significantly speeds up the query reply to the user needs.
9) BEx Tuning:
a) Avoid exclude selection in the BEx query For better performance, try to avoid EXCLUSION option if possible, as characteristics in the INCLUSION can use database Indexes, Characteristics in the EXCLUSION cannot use Indexes. Therefore, it takes more time to read data for EXCLUSION as compared to INCLUSION.
b) Leverage calculated key figures rather than formulas. As compared to formulas that are evaluated during query execution, calculated key figures are pre-calculated and their definitions are stored in the metadata repository for reuse in queries. The incorporation of business metrics and key performance indicators as calculated key figures, such as gross profit and return on investment (which are frequently used, widely understood, and rarely changed), improve query performance and ensure that calculated key figures are reported consistently by different users. Note that this approach improves query runtime performance but slows Info Cube or ODS object update time. As a rule of thumb, if multiple and frequently used queries use the same formula to compute calculated fields, use calculated key figures instead of formulas.
c) Minimize the number of restricted key figures (RKFs) used in the query. RKFs result in additional database processing and complexity in retrieving the query result and therefore should be avoided when possible.
d) Use characteristics as free characteristics rather than in rows and columns. Minimize the use of characteristics in rows and columns to avoid increasing the amount of details in the initial query result. The more details in the initial query result, the larger the result set will be and the more unlikely that an aggregate will be found. Rather, use the characteristics required for navigation as free characteristics that can be used for drill-down. This strategy will accomplish the same goals, since different users typically have different preferences in terms of the order and contents of the drill-downs.
e) If hierarchies are used, minimize the number of nodes to include in the query results. Including all nodes in the query results (even the ones that are not needed or blank) slows down the query processing. The “not assigned” nodes in the hierarchy should be filtered out, and you should use a variable to reduce the number of hierarchy nodes selected.
f) Leverage filters as much as possible. Using filters contributes to reducing the number of database reads and the size of the result set, thereby significantly improving query run times. Filters are especially valuable when associated with “big dimensions” where there is a large number of characteristics such as customers and document numbers.
g) Minimize conditions-and-exceptions reporting. Conditions and exceptions are usually processed by the SAP application server that generates additional data transfer between database and application servers. If conditions and exceptions have to be used, the amount of data to be processed should be minimized with filters.
h) Avoid using cell definitions as far as possible as cell calculation by means of cell editor generates separate queries at query runtime. Hence, it should be used cautiously per the business requirement.
i) Graphics like – Charts, Graphs etc. can impact query performance.
j) Provide education and training to your users and share some of these tips with them—for example, avoid large reports. You should also strive to manage your users’ expectations. Explain to them what level of performance they can reasonably expect and what can and cannot be done to improve performance without major redesign and investment. They will then be better positioned to avoid creating unnecessary query loads and help you identify when a real performance problem occurs.
k) When a query is run on the Multi provider, all the underlying Info providers are read. This can be restricted by using Characteristic – 0INFOPROVIDER to only read the required Info providers.
10) RSRT Specific Tuning and Configuration: Set Specific RSRT Property .Check the “Use Selection of Structure Elements” Query Property in RSRT as seen here:
‘Use Selection of Structure Elements’ option influences system performance. This function should therefore normally be activated. It does not have any effect on the data displayed. Technically speaking, the system only passes to the database the selections and key figures of the columns (or structure elements, to be more precise) currently used to filter the query. If you deactivate this function, the data is read for the entire structure or for both structures from the database.
11) Variables/Parameters/Prompts: These BEx objects will give the ability to improve the performance through customized setting.
1) To maintain uniformity for users, use existing variables across all InfoCubes. Name Changes made to variables will erode uniformity.
2) Group mandatory/required variables at the top of the variable screen.
3) Group similar variables together.
4) A time characteristic should be included in every report, if available, and should generally be a mandatory/required variable. Examples include: Academic Year, Academic Session, Fiscal Year, and Validity Date.
12) Avoid Formulae at report level: It is always advised not to create formulas at report level because the evaluation process of these formulas only takes place when the query executed. This will give additional calculation process to be carried out when the report executed and hinders the reply time for report. Instead create RKF’s and CKF’s for these formulas at global level if possible as these RKF’s and CKF’s are pre calculated key figures and reusable.
13) Characteristics: A time period characteristic should be included in every report if available. Examples Include: Academic Year, Academic Session, and Fiscal Year/Period.
Minimize the use of characteristics in rows and columns for the initial query view. Include them as free characteristics that can be included when required. This will bring back less Data because it is summarized, and can be aggregated in the future for additional Performance gains.
Properties of each characteristic used in a query should be as follows:
Display as “Key and text” – where available.
Display of Results / Suppress Results Rows “With Only One Value”
General Considerations and checklist
Ø If you have a choice of building a query from an InfoCube or MultiProvider always choose the InfoCube. Building a query from a MultiProvider will require additional table joins in order to satisfy the query.
Ø Minimize the amount of information returned by the query. The less information returned from the query, generally the faster it can be transferred to the end user.
Ø Avoid complex queries that offer everything for every user .Instead create multiple queries or use the report to report interface to link the reports together at different detail level.
Ø Limit the number of free characteristics available in the report in order to ensure large amount of data cannot be requested
Ø Limit the number of key figures returned from the database for queries.
Ø Only display necessary key figures in your query output.
Ø Use characteristics or navigational attributes rather than hierarchies especially when the hierarchies are with larger nodes.
Ø Minimize conditions and exceptions as much as possible. If they are not required define them as inactive and allow the report user to enable them when required.
Ø Use filters as much as possible, especially for large characteristics, to reduce the size of the query result
Ø Instead of putting a restriction on a RKF, try to use a variable, this will cut down the query processing time especially when the user utilizes the personalization feature.
Ø If exclusions exist, make sure they exist in the global filter area. Try to remove exclusions by subtracting out inclusions.
Ø Within structures, make sure the filter order exists with the highest level filter first
Ø Move Time restrictions to a global filter whenever possible
Ø If hierarchies are used, minimize the number of nodes to include in the query results. Including all nodes in the query results (even the ones that are not needed or blank) slows down the query processing. The “not assigned” nodes in the hierarchy should be filtered out, and you should use a variable to reduce the number of hierarchy nodes selected.
Ø If “Display as hierarchy” is being used, look for other options to remove it to increase performance
Ø Turn off warning messages on queries
Ø Avoid Cell Editor use if at all possible.
Ø Within the free characteristics, filter on the least granular objects first and make sure those come first in the order
Ø Move Time restrictions to a global filter whenever possible.
Ø Do review of order of restrictions in formulas. Do as many restrictions as you can before calculations. Try to avoid calculations before restrictions.
Ø Check code for all exit variables used in a report
Ø Use Constant Selection to ignore filters in order to move more filters to the global filter area
Ø Turn off warning messages on queries
Ø Avoid Cell Editor Use if at all possible.