Often we skip or ignore some of the minimal things which may make your jobs to be executed in a faster way. For the very reason, I had consolidated some key points by which we can make the BODS jobs in more efficient with optimal consumtion of resources. This discussion might me more helpful and efficient to the beginers in this area.
1. Increase monitor sample rate. ex..to 50K in prod environment.
2. Exclude virus scan on data integrator job logs.
3. While executing the job for first time or when changes occur with re-run. Select the option COLLECT STATISTICS FOR OPTIMIZATION (this is not selected by default).
4. While executing the job second time onwards. Use collected stats.(this is selected by default)
5. Degree of parallelism (DOP) option for your data flow to a value greater than one, the thread count per transform will increase. For example, a DOP of 5 allows five concurrent threads for a Query transform. To run objects within data flows in parallel, use the following Data Integrator features:
• Table partitioning
• File multithreading
• Degree of parallelism for data flows
6. Use the Run as a separate process option to split a data flow or use the Data Transfer transform to create two sub data flows to execute sequentially. Since each sub data flow is executed by a different Data Integrator al_engine process, the number of threads needed for each will be 50% less
7. If you are using the Degree of parallelism option in your data flow, reduce the number for this option in the data flow Properties window.
8. Design your data flow to run memory-consuming operations in separate sub data flows that each use a smaller amount of memory, and distribute the sub data flows over different Job Servers to access memory on multiple machines.
9. Design your data flow to push down memory-consuming operations to the database.
10. Push-down memory-intensive operations to the database server so that less memory is used on the Job Server computer.
11. Use the power of the database server to execute SELECT operations (such as joins, Group By, and common functions such as decode and string functions). Often the database is optimized for these operations
12. You can also do a full push down from the source to the target, which means Data Integrator sends SQL INSERT INTO… SELECT statements to the target database.
13. Minimize the amount of data sent over the network. Fewer rows can be retrieved when the SQL statements include filters or aggregations.
14. Using the following Data Integrator features to improve throughput:
a) Using caches for faster access to data
b) Bulk loading to the target.
15. Always views the SQL that Data Integrator generates and adjust your design to maximize the SQL that is pushed down to improve performance.
16. Data Integrator does a full push-down operation to the source and target databases when the following conditions are met:
• All of the operations between the source table and target table can be pushed down.
• The source and target tables are from the same data store or they are in data stores that have a database link defined between them.
A full push-down operation is when all Data Integrator transform operations can be pushed own to the databases and the data streams directly from the source database to the target database. Data Integrator sends SQL INSERT INTO… SELECT statements to the target database
Where the SELECT retrieves data from the source.
17. Auto correct loading ensures that the same row is not duplicated in a target table, which is useful for data recovery operations. However, an auto correct load prevents a full push-down operation from the source to the target when the source and target are in different data stores.
18. For large loads where auto-correct is required, you can put a Data Transfer transform before the target to enable a full push down from the source to the target. Data Integrator generates an SQL MERGE INTO target statement that implements the Ignore columns with value and Ignore columns with null options if they are selected on the target.
19. The lookup and lookup_ext functions have cache options. Caching lookup sources improves performance because Data Integrator avoids the expensive task of creating a database query or full file scan on each row.
20. You can control the maximum number of parallel Data Integrator engine processes using the Job Server options (Tools > Options> Job Server > Environment). Note that if you have more than eight CPUs on your Job Server computer, you can increase Maximum number of engine processes to improve performance.