With Data Warehouses around the world growing rapidly every day, the ability of a Data Warehousing solution to handle mass-data, thus allowing for the ever-shrinking time-windows for data loads is fundamental to most systems.
BW 7.3 recognizes the “need of the hour” with several performance related features and in this blog, I will discuss the performance features related to data loads in SAP BW 7.3, focusing mainly on Master Data Loads and DTP Processing.
Here is the list of features discussed addressed in this blog –
- Mass Lookups during Master Data Loads
- The “Insert-Only” flag for Master Data Loads.
- The new Master Data Deletion
- SID Handling
- Use of Navigational Attributes as source fields in Transformations.
- Repackaging small packages into optimal sizes.
1. Mass Lookups during Master Data Load
Data loads into a Master Data bearing Characteristic require database look-ups to find out if records exist on the database with the same key as the ones being loaded. In releases prior to SAP BW 7.3, this operation was performed record-wise, i.e. for every record in the data-package, a SELECT was executed on the database table(s). Obviously, this resulted in a lot of communication overhead between the SAP Application Server and the Database Server, thereby slowing the Master Data loads down. The effect is pronounced on data loads involving large data volumes.
The issue of overhead between the SAP Application Server and the Database Server has now been addressed by performing a mass-lookup on the database so that all records in the data-package are looked-up in one attempt. Depending on the DB platform it can bring up-to 50% gain in load runtimes.
2. The ‘Insert-Only Flag’ for Master Data Loads
- Starting NW 7.30 SP03, this flag will be renamed to – “New Records Only”. The renaming has been done to align with a similar feature supported by activation of DSO data. (See blog http://www.sdn.sap.com/irj/scn/weblogsblog=/pub/wlg/23048 )
As mentioned above, the Master Data Load performs a look-up on the database for every data-package to ascertain which key values already exist on the database. Based on this information, the Master Data load executes UPDATEs (for records with the same key already existing in the table) or INSERTs (for records that don’t exist) on the database.
With the ‘Insert-Only’ feature for Master Data loads using DTPs, users have the opportunity to completely skip the look-up step, if it is already known that the data is being loaded for the first time. Obviously, this feature is most relevant when performing initial Master Data loads. Nevertheless, this flag can also be useful for some delta loads where it is known that the data being loaded is completely new.
Lab tests for initial Master Data loads indicate around 20% reduction in runtime with this feature.
The ‘Insert-Only’ setting for DTPs loading Master Data can be found in the DTP Maintenance screen under the ‘UPDATE’ tab as shown below.
If the ‘Insert-Only’ flag is set, and data is found to exist on the database, the DTP request will abort. To recover from this error, the user simply needs to uncheck the flag and re- execute the DTP.
3. The New Master Data Deletion
Deleting MasterData in BW has always been a performance intensive operation. The reason being that before any MasterData can be physically deleted, the entire system (Transaction Data, Master Data, and Hierarchies etc) is scanned for usages. Therefore, if a lot of MasterData is to be deleted, it takes some time to establish the data that is delete-able (i.e., has no usages) and data that is not (has usages). In addition, with the classical MasterData Deletion involving large data volumes, users sometimes ran into memory overflow dumps.
To address these issues, the Master Data Deletion was completely re-engineered. The result is the New Master Data Deletion. In addition to being much faster than the classical version, the new Master Data deletion offers interesting new features like Search-modes for the usage check, Simulation-mode etc. The screen shot below shows the user interface for the new Masterdata Deletion when accessed via the context menu of InfoObjects in the DataWarehousing Workbench.
Although the new Master Data Deletion has be available for some time now (since BW 7.00 SP 23), it was never the default version in the system. This implied that the BW System Administrators needed to switch it ON explicitly. With BW 7.30 however, the New Master Data Deletion is the default version and no further customizing is necessary to use it.
All further information about this functionality is documented in the SAP note:1370848 under https://websmp130.sap-ag.de/sap(bD1lbiZjPTAwMQ==)/bc/bsp/spn/sapnotes/index2.htm?numm=1370848
It can also be found in the standard SAP BW documentation under http://help.sap.com/saphelp_nw73/helpdata/en/4a/373cc45e291c67e10000000a42189c/frameset.htm
4. SID Handling
This feature relates to the handling of SIDs in the SAP BW system and while it is certainly relevant for Master Data loads, it is not restricted to it. The performance improvements in SID handling are relevant for all areas of SAP BW where SIDs are determined, for example – Activation of DSO Requests, InfoCube Loads, Hierarchy Loads and in some cases, even Query processing.
In BW 7.30, SIDs are determined en-masse’ meaning that database SELECTs and INSERTs that were done record-wise previously have been changed to the mass SELECTs (using the ABAP SELECT FOR ALL ENTRIES construct) and mass INSERTs. The system switches to this mass-data processing mode automatically when the number of SIDs to be determined is greater than a threshold value. The default value of this threshold is 500.
The threshold value is customizable of course and that can be done in the SAP IMG for customizing under the transaction SPRO by following the path: SAP Netweaver -> Business Warehouse -> Performance Settings -> Optimize SID-Determination for MPP-Databases.
Note: As the threshold value corresponds to the minimum number of SIDs
to be determined in one step, setting the threshold to a very high value
(For example: 100000) causes the system the system to switch back to the
5. Use of Navigational Attributes as source fields in Transformations
Quite often there are scenarios in SAP BW where data being loaded from a source to a target needs to be augmented with information that is looked up from Masterdata of Infoobjects. For instance – loading sales data from a source that contains data on Material level to a DataTarget where queries require the sales data to be aggregated by Material Group. In such cases, the Master Data Lookup rule-type in Transformations is used to determine the Material Group for any given Material (given that MaterialGroup is an attribute of Material).
Although the performance of the Masterdata Lookup rule-type has been optimized in earlier versions of BW (starting BW 7.0), there is an alternative to this rule-type in BW 7.30. Now, navigational attributes of Infoobjects are available as source fields in Transformations. The benefits of this feature are two-pronged.
- The fact that the data from the navigational attributes is available as part of the source structure allows the data to be used in custom logic in Transformations (example : Start Routines).
- Secondly, the data from the navigational attributes is read by performing database joins with the corresponding Masterdata tables during extraction. This helps in improving the performance of scenarios where a lot of look-ups are needed and/or a lot of data is to be looked-up.
To use this feature in Transformations, the navigational attributes need to be switched ON in the source InfoProvider in the InfoProvider maintenance screen as below –
Once this is done, the selected navigational attributes are available as part of the source structure of Transformations as shown below –
Data Transfer Process (DTP)
1. Repackaging small packages into optimal sizes
This feature of the DTP is used to combine several data packages in a source object into one data package for the DataTarget. This feature helps speed up request processing when the source object contains a large number of very small data packages.
This is usually the case when memory limitations in the source systems (for example: an SAP ERP system) results in very small data-packages in the PSA tables in BW. This DTP setting can be used to propagate the data to subsequent layers in BW in larger chunks.
Also, InfoProviders in BW used for operational reporting using Real-time Data Acquisition contain very small data packages. Typically, this data is propagated within the DataWarehouse into other InfoProviders for strategic reporting. Such scenarios are also a use-case for this feature where data can be propagated in larger packets.
As a prerequisite, the processing mode for the DTP needs to be set to ‘Parallel Extraction and Parallel Processing’. Also note that only source packages belonging to the same request are grouped into one target package.
Below is a screenshot of the feature in the DTP Maintenance.