Part I: Optimize export/import using Distribution Monitor
- Introduction to Distribution Monitor
- Basis of optimization
1. Introduction to Distribution Monitor
Distribution Monitor is one of the preferred tools for performing the export/import of a Unicode conversion or OS/DB migration of a large database within a very limited downtime window. You should consider using Distribution Monitor when you do not have options such as the "zero downtime migration" service offered by SAP or proprietary storage migration tools from storage vendors like EMC. The biggest advantage of Distribution Monitor is parallel export/import, which reduces the total export/import time drastically and shifts the load from the source/target systems to external nodes, so that the export/import run can use the processing power of external systems (called nodes or application servers). However, configuring Distribution Monitor takes considerable time and effort. Distribution Monitor itself is not a single executable or binary; it consists of a set of scripts and parameter files, internally based on the SAP tools R3load, R3szchk, R3ldctl and R3ta. You can download Distribution Monitor from the SAP Service Marketplace (SMP). In the following section I will explain how Distribution Monitor works:
While configuring Distribution Monitor, you will configure more than one "node", also called an "application server". In this example I am showing four nodes. Each node has local storage or SAN storage attached, and both the export and the import processes have read/write access to this storage location. Typically, the export process on each node dumps (writes) data from the source system into the node's local storage, and the import process reads that data and writes it into the target system's database. Parallel export/import processes run simultaneously across all four nodes, and each node handles a different set of "packages" or tables.
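The flow above can be sketched as a minimal model: each node works through its own list of packages while all nodes run concurrently. This is only an illustration of the parallelism, not the actual DistMon scripts, and the node-to-package assignment shown is a hypothetical example.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical assignment of packages to nodes; in a real migration the
# package list comes from the R3ldctl/R3ta output and the DistMon config.
NODE_PACKAGES = {
    "node1": ["SAPAPPL0", "SAPAPPL1"],
    "node2": ["SAPAPPL2"],
    "node3": ["SAPSSEXC"],
    "node4": ["SAPCLUST"],
}

def process_node(node, packages):
    done = []
    for pkg in packages:
        # On a real node: R3load exports the package to the node's local
        # storage, then R3load imports that dump into the target database.
        done.append((node, pkg))
    return done

# All four nodes run in parallel; each handles only its own packages.
with ThreadPoolExecutor(max_workers=len(NODE_PACKAGES)) as pool:
    futures = [pool.submit(process_node, n, p) for n, p in NODE_PACKAGES.items()]
    finished = [item for f in futures for item in f.result()]
```

The point of the sketch is that total runtime is driven by the slowest node, which is why the load balancing described in the next section matters.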
2. Basis of optimization
In the next section, I will explain the basic technique for optimizing export/import using Distribution Monitor. The first question is: what do we distribute? In simple terms, we distribute the load among the available nodes.
The starting point of load distribution is the list of the biggest database tables and SAP packages. So first find the list of the biggest tables (say, the top hundred) in the database. For a very large (SAP) database, it is a general trend that the 50-100 biggest tables represent 60-80% of the total database size. In the following example, I found that the 70 biggest tables (with a total size of 2.5 TB) represent 84% of the total database size (3.0 TB).
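Once you have the table sizes (for example from the database dictionary or transaction DB02), the share claimed by the top N tables is simple arithmetic. A small hedged sketch, using made-up sample sizes:

```python
def top_n_share(sizes_gb, n):
    """Fraction of the total database size covered by the n largest tables."""
    ordered = sorted(sizes_gb, reverse=True)
    return sum(ordered[:n]) / sum(ordered)

# Hypothetical sample: four tables, the top two hold 75% of the data.
share = top_n_share([100, 50, 25, 25], 2)  # -> 0.75
```

In the example from the text, the same calculation over the real size list would yield 2.5 TB / 3.0 TB = roughly 0.84 for the top 70 tables.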
The next question is how to distribute these tables and the SAP standard packages. You need to gather all the facts and figures. I have outlined my findings as follows:
a) There are four nodes available, so my target is to distribute approximately 750 GB of load per node (as the total database size is 3.0 TB = 4 x 750 GB).
b) The biggest 18 tables, representing half of the database size, will dominate the entire export/import runtime.
c) Split all tables larger than 20 GB. In this example there are 18 transaction tables and 3 cluster tables bigger than 20 GB, so each split chunk will be at most 20 GB in size.
d) There are 52 tables bigger than 2 GB and smaller than 20 GB, with a total size of 750 GB.
e) List all SAP standard packages and their sizes (excluding the 70 biggest tables from steps b and d).
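The arithmetic behind point (c) is worth making explicit: a table split into chunks of at most 20 GB needs ceil(size / 20) chunks. The actual splitting is done by R3ta; this is just a sketch of the chunk-count calculation.

```python
import math

def chunk_count(size_gb, max_chunk_gb=20):
    """Number of split chunks needed so no chunk exceeds max_chunk_gb."""
    return math.ceil(size_gb / max_chunk_gb)

# A hypothetical 130 GB table would be split into 7 chunks of <= 20 GB.
chunks = chunk_count(130)  # -> 7
```

Summing the chunk counts over the 21 split tables tells you how many export/import work units these tables contribute, which feeds directly into the per-node balancing below.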
Summary of the distribution:
Based on all these facts and figures, I arrived at the following distribution:
i. Distribute the biggest 18 tables between the first two nodes (Node 1 and Node 2): split all of them with a maximum chunk size of 20 GB and use the incremental index creation option for each chunk.
ii. Put all tables smaller than 20 GB and bigger than 2 GB on Node 3. These are non-split tables; use the "loadprocedure fast" option during import, which is a faster way of importing data than the usual INSERT statements.
iii. Do not mix split and non-split tables, as most databases do not support incremental index creation for non-split tables; moreover, the direct load option using "loadprocedure fast" behaves differently for split and non-split tables on most databases.
iv. Direct load using "loadprocedure fast LOAD" has some overhead, since it needs to perform pre- and post-processing work. If you used the direct load option for all standard SAP packages, which contain all remaining tables (everything except the 18 split tables and the 52 direct-load tables), it would result in a large amount of overhead. To avoid this, do not use the direct load option "loadprocedure fast LOAD" for standard packages; instead, put all SAP standard packages on a single node (Node 4), where the import will be performed through INSERT operations.
v. Along with the SAP standard packages on Node 4, put all split cluster tables, as most databases do not support direct-load import for cluster tables.
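The five rules above can be condensed into a simple assignment function. This is an illustration of the decision logic with hypothetical table data, not part of the DistMon tooling; in practice the assignments go into the DistMon configuration files.

```python
def distribute(tables):
    """tables: list of (name, size_gb, is_cluster) tuples.

    Returns a node -> [table names] plan following rules i-v above.
    """
    plan = {"node1": [], "node2": [], "node3": [], "node4": []}
    split_load = {"node1": 0, "node2": 0}
    for name, size_gb, is_cluster in sorted(tables, key=lambda t: -t[1]):
        if size_gb > 20 and is_cluster:
            plan["node4"].append(name)   # rule v: split cluster tables with the packages
        elif size_gb > 20:
            # rule i: big split tables balanced across Node 1 and Node 2
            target = min(split_load, key=split_load.get)
            plan[target].append(name)
            split_load[target] += size_gb
        elif size_gb > 2:
            plan["node3"].append(name)   # rule ii: non-split, "loadprocedure fast"
        else:
            plan["node4"].append(name)   # rule iv: standard packages, plain INSERT
    return plan

# Hypothetical example set of tables:
plan = distribute([
    ("A", 100, False), ("B", 90, False),
    ("C", 30, True), ("D", 10, False), ("E", 1, False),
])
```

With this sample input, the two big transaction tables land on Node 1 and Node 2, the 30 GB cluster table joins the standard packages on Node 4, and the mid-size table goes to Node 3, mirroring the distribution described above.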
I have given you a particular example with some basic guidelines. Every migration is different, so gather all the facts and figures and set your distribution accordingly. I will publish "Part II: export/import configuration using Distribution Monitor" next.