Row Per Commit & Array Fetch Size in BODS
This Article will provide the information on Row Per Commit and Array Fetch Size. These two BODS Performance techniques improve the job execution time in practical scenario.
Row per commit :
For the best performance, BODS recommends to set the Rows per commit value between 500 and 2000. The default value for regular loading for Rows per commit is 1000. The value of Row per commit depends on the numbers of column in the target table. While adjusting the Row per commit value in the target table following should be keep in mind, the default value is 1000 and maximum value is 5000 only and do not enter any negative and non-numeric value. Row per commit is target side performance tuning technique.
The formula for calculate the Row per commit value is: max_IO_size /row size (in bytes)
Following scenarios will show the impact of Row per commit in the BODS Job. For further details, find below the screen shots :
In first scenario: Row per commit is 200 and source data row count is 18116224 records. It take 229 seconds (3min,49secs) to load the data into target table.
In second scenario: Row per commit is 2000 and source data row count is same as above (18116224 records). It takes 136 seconds (2min,16secs) to load the data into target table.
Hence, in the second scenario job executive time gets reduce because RPC value set properly based on data load.
Note: The performance becomes worse, if we set RPC too high for our environment or if we used an overflow file or our initial transactions failed than in such conditions the Rows per commit value behave like it sets to 1.
Array fetch size :
It indicates the number of rows returned in single fetch call to a source table, default value is 1000. AFS value reduces the number of round trips the database and hence performance for the table reads improves. Array Fetch size is source side performance tuning technique.
For Example: If you enter 50 and your query retrieves 500 rows, the connection executes 10 fetches to retrieve your 500 rows data.
- Data retrieve row by row if you enter 1 for array fetch size, in such scenario array fetch size is deactivated and slow the server performance time.
- If you keep array fetch size is too high then it will cause Fetch Out of sequence error, then in such scenario receiving end packet is corrupted and has to be re-transmitted.
Following scenarios will show the impact of Array Fetch size in the BODS Job. For further details, find below the screen shots :
In first scenario: Array Fetch Size is 100 and source data row count is 18116224 records. It takes 292 seconds (4min, 52secs) to load the data into target table.
In second scenario: Row per commit is 1000 and source data row count is same as above (18116224 records). It takes 259 seconds (4min,19secs) to load the data into target.
If Array Fetch size value is set properly for large scale of source data load in complex ETL job, then performance will be improve more in terms of job execution time.
Hence, in the second scenario it shows when the Array fetch size value is 1000 (greater) than the previous one so the job executive time gets reduce.
Hope this document helps those who are new to BODS. In future will discuss more about BODS performance tuning techniques. Thanks for your time…
Again it depends on OS where your BODS is installed!!
And regarding setting up rows par commit to 1 ,It's not supported by SAP for many databases,If you set it then again BODS will fetch\commit the data in 1000 chunks which is default value.
I think it's not always necessary that you wil always gain performance while setting it up t certain value , It all depends on database,scenarios and volume of data.
The above scenarios are tested over windows7 32-bit operating system and Database Oracle 11g.
In the article it is clearly mentioned If you are dealing with the Huge data load (millions or billions set of records) then only RFC, AFS will helps you to improvise the performance in some of the cases
I agreed no doubt DB, volumes of data & Complex ETL mapping always impacts.
Very detailed review good sir.
My typical statistics are 1hour 15min for 76million records with 23 columns and array fetch size at default 1000.
After modification to array fetch size at 10,000. My stats are 1hour 25min for 76 million records... Whoops.
We have BODS 4.2 running on RHEL 8.1. I wanted to know how to find max_IO_size in RHEL.
I have one doubt, if we select AFS = 2000 and RPC = 500,
Then how it will behave, will be there any performance optimization happen or what kind of scenario will be there?
The one is for reading, the other for writing. So you will fetch data from the source in 2000 rows packages, stream the rows via an internal pipeline to the writer thread and this will load data in batches of 500 rows. No problem.