Performance Optimization in BODS
SAP Business Objects Data Services delivers a powerful, enterprise-class data integration and data quality solution. It provides one development, run-time, and management environment that allows you to integrate, transform, improve, and deliver data anywhere, at any frequency.
With SAP BusinessObjects Data Services, you can:
- Respond to information needs and accelerate time to market
- Deliver the most trusted and accurate information
- Maximize operational efficiency with a single platform
- Leverage a collaborative, complete, and open solution scalable for any project size
Features and Functions:
The comprehensive features and functions of SAP BusinessObjects Data Services can enable the following:
Agility – Leverage features that can accelerate time to market, encourage reuse and collaboration, increase developer productivity, and create unified skill sets.
Trusted information – Improve the reliability of your data with functions for data governance – including transparency, change management, security, and controls – global data support, and end-to-end metadata management.
Operational excellence – Gain a single data repository, platform, and environment so that costs associated with supporting multiple applications can be reduced; Leverage multi-platform support for 64bit, UNIX, Linux, and Windows.
Enterprise class – Benefit from highly scalable performance with support for parallelization and grid computing, enterprise connectivity, and support for service-oriented architecture (SOA).
It is recommended that the BODS jobs should be tuned accordingly to be able to complete in less time and utilize less resource.
This document deals with the different Performance Optimization techniques needed to fine tune the BODS jobs there by enabling the BODS jobs to complete in less time with less rates of failure.
2 BODS Performance Optimization
BODS is an ETL (Extraction, Transformation and Load) tool that selects data from one data base, processes it in a job server by applying the transformation rules specified in the mapping and loads the data into the specified target database.
There are 7 touch points that need to be considered for performance optimization. They are given below:
1. Source Operating System.
2. Source Data Base.
3. Target Operating System.
4. Target Database.
6. Job Server OS.
7. BODS Repository database.
The BODS Job Server and the BODS Repository Database can reside on the same server. If installed on different servers, Network throughput configured between Job Server and BODS Repository Database will play a key role. The performance tuning methods for each identified touch point are given below.
2.1.1 Source Operating System
The Source Operating System should be tuned to quickly read data from disks. This can be done by the following methods:
· Set the read-ahead protocol to 64 KB to make the I/O operations fast.
· The size of read-ahead protocol is by default set to 4-8 KB.
2.1.2 Source Database
The source database should be tuned to make SELECTs as quickly as possible. This can be done by the following methods:
· Increase the size of Database I/O to match with the OS read-ahead protocol; otherwise it may cause bottle neck there by affecting the performance.
· Increase size of shared buffer to cache more data in the database server.
· Cache tables that are small enough to fit in the shared buffer by enabling cache at table level. Caching reduces the number of I/Os thereby improving the speed of access to the data.
· Turn off PARALLEL for small tables.
· Create indexes on appropriate columns by looking into the dataflow to identify the columns used in the join criteria.
· Create Bitmap indexes on the columns with low cardinality as the index does not take much space in the memory, there by resulting in faster SELECTs.
2.1.3 Target Operating System
The Target Operating System should be tuned to quickly write data to disks. This can be done by the following methods:
· Turn on the asynchronous I/O to make the Input/output operations as fast as possible.
2.1.4 Target Database
The Target Database should be tuned to perform INSERTs and UPDATEs as quickly as possible. This can be done by the following methods:
· Turn off archive logging.
· Turn off redo logging for all tables.
· Tune rollback segments for better performance.
· Place redo log files and data files on a raw file if possible.
· Increase the size of the shared buffer.
Even if the Source Database and Target Database are tuned if the network band width is small, there is possibility for the occurrence of bottle neck which hampers the performance.
· Adjust the size of the Network buffer in the database client so that each client request completely fills a small number of network packets, there by reducing the number of round trips across databases which in turn improves the performance.
2.1.6 Job Server OS
Data Services jobs are multi-threaded applications. Typically a single data flow in a job initiates one ‘al_engine’ process that in turn initiates four threads. For maximum performance benefits:
· Consider a design that runs one ‘al_engine’ process per CPU at a time.
· Tune the Job Server OS so that theData Services threads spread to all available CPUs.
The above performance optimization methods need to be implemented during the environment and infrastructure preparation of BODS components.
2.1.7 Data Services Jobs
The following execution options can be used to improve the performance of BODS jobs:
· Monitor Sample Rate: If the job processes large amount of data set the ‘Monitor Sample Rate’ to a higher value (maximum being 50,000, default is 1000) to reduce the number of I/O calls to the log file there by improving the performance.
· If virus scanner is configured on the BODS JobServer, exclude the Data Services log from the virus scan. Otherwise the virus scan scans the Data Services log repeatedly during the execution, which causes performance degradation.
· Collect Statistics for self-tuning: BODS has a self-tuning capability to determine the cache type by looking into the statistics of previous job executions. The Collect Statistics option needs to be selected during the first execution of the job. BODS collects the statistics for that job and stores in the metadata for that job. In the next execution select the ‘Use Collected statistics’ option to allow BODS to decide the type of cache to be used to execute the job, there by improving the performance of the job.
· Set the data flow properties like Degree of Parallelism depending upon the number of CPUs available for processing and set the cache type to in-memory if the data volume is less.
· If source tables are from same schema and using same Data Store, identify the joins in the early phase so that the join can be pushed to the database.
· Create synonyms for the tables in other schemas to pushdown join conditions of tables belonging to different schemas in a database.
· Use data base links to connect different schemas in different databases to pushdown the joins of tables available in these schemas to database.
· Use data transfer (type = ‘TABLE’) to pushdown the complex logic in a dataflow to the database.
· Do not use advanced features like Run Distinct as Separate process etc available in ‘Advanced’ tab in Query transform, as it starts multiple sub-processes which causes heavy traffic between the processes and there by lead to the termination of job.
· Do not use Data Transfer transform unless required. (Use table type if required as it is more reliable.). SAP suggests that Data Transform is not a reliable transform and hence recommends to not using it unless required.
· Turn off the Cache option for the tables with larger amounts of data. Cache is turned on by default for every source table. Make sure that there are indexes created on key columns for these tables on which cache is turned off.
· Do not use BODS functions like job_name(), instead initialize a variable in a script and use that variable for mapping in query transforms.
· Use Join where ever applicable in place of look up transform as the look up transform has to access the table for each and every record which increases the number of I/O operations to database.
· Use Query transforms to split the data in place of Case transforms as it is a costly transform in BODS.