SUM Upgrades Configuration : Tuning/Process Counts
The configuration screen of SUM: parameters for the procedure.
Ok, so the problem was the process counts to use for our updates and upgrades. I turned to many fellow consultants I am in touch with; all of them had calculations of a somewhat random and arbitrary nature. I was not convinced. We had to have something concrete for each and every process type that could be used for direct calculations. I tried multiple runs on a sandbox, referred to many notes, initiated an OSS incident, and also turned to in-house database and SAP experts.
The final results explained the seemingly arbitrary view taken by my colleagues. You can't have a conclusive formula like (number of CPUs x 1.3 = R3trans processes to use), although a lot of industry veterans use one. What one can do is follow a thought process of researching, tuning, observing, and testing.
One of the things I found myself in great need of, but missing, was a good SCN blog on the topic. There were tidbits here and there, but hardly any solid guidance.
The reason I am starting this blog and discussion is just that: to gather thoughts from any and all, so that this page becomes an ever-evolving starting point for everyone facing the above SUM screen for their respective SP/EHP/release upgrade.
Let's discuss, process by process, the reasoning I used:
1. ABAP Processes :
Pretty straightforward. Configure according to the BGD (background) work processes available in the main system. Make sure enough are left for the normal jobs users have to run. For downtime, you can use the maximum available. As per the SUM Guide, the returns stagnate above a value of 8. Below is what I used for a system with 10 BGD processes available:
UPTIME : 6
DOWNTIME : 10
I could have increased the BGD count in the system, but since values above 8 should not have had much impact, the counts above seemed optimal to me.
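As a quick illustration of the arithmetic above (a hedged sketch; the reserve of 4 processes for regular background jobs is an assumption taken from my system, not an SAP guideline):

```python
# Sketch: choose ABAP (BGD) process counts for SUM based on the heuristic
# above. The reserve for regular background jobs is an assumption; during
# downtime all BGD processes can be handed to SUM.

def abap_process_counts(bgd_available, reserve_for_jobs=4):
    uptime = max(1, bgd_available - reserve_for_jobs)
    downtime = bgd_available  # downtime: no user jobs, use everything
    return uptime, downtime

print(abap_process_counts(10))  # (6, 10) - the values used above
```

Remember that per the SUM Guide, values above 8 bring stagnating returns, so raising BGD just for SUM is rarely worthwhile.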
2. SQL Processes :
This part looks simple, but it was the trickiest for me. Sizing it appropriately benefits the DBCLONE and PARCONV_UPG phases. But size it too large and you may experience frequent deadlocks in various phases, logging-space-full errors, a stuck archiver, or severely degraded performance.
The problem in my case, when using nZDM with a very high SQL count, was "Transaction log is full": the DB2 database running out of logging space. If you are working with a database like DB2, where the active logging space is constrained by DB parameters, make sure to keep this process count small. With too many parallel SQL statements, the logging space will fill up quickly, resulting in the aforementioned error, which can only be bypassed by decreasing the count. To unthrottle, increase the logging space or the primary/secondary logs. Also, log archiving has to be fast, with plenty of buffer space in the archive directory.
As for the count, once logging space and log archives are taken care of, the next step is CPU. Different databases may differ slightly in how they execute SQL in parallel, but the core concept remains the same: more CPUs help. Once you have a number, like the 8 cores in my example, you next need to finalize the degree of parallelism (DOP, an Oracle term): the number of parallel threads each CPU will be executing. For example, if 16 SQL processes had been used in my case, 2 threads would have been executing per CPU, a choice I didn't take, as I wanted minimal impact on productive operation of the system during the uptime phases.
Referring to the standard documentation of the Oracle and DB2 databases, what I noticed was that the default and recommended DOP is 1-2 times the number of online CPUs. Also, the returns stagnate beyond a certain number, after which the negative effects (performance deterioration) keep increasing while the gains stay minimal.
After increasing the logging space and provisioning enough archive directory space, the following is what I used for 8 CPUs:
UPTIME : 8 (DOP=1)
DOWNTIME : 12 (DOP = 1.5) Will make this 16 in the next system.
DBCLONE was done in a couple of hours with the above - good enough for me.
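For the DB2 logging-space sizing mentioned above, the available active log space follows directly from the database configuration parameters (LOGFILSIZ is measured in 4 KB pages; the example values below are hypothetical, not from any real system):

```python
# Sketch: DB2 active log space from the DB configuration parameters.
# In DB2, LOGFILSIZ is in 4 KB pages, so:
#   active log space = (LOGPRIMARY + LOGSECOND) * LOGFILSIZ * 4 KB

def db2_active_log_space_gb(logfilsiz_pages, logprimary, logsecond):
    bytes_total = (logprimary + logsecond) * logfilsiz_pages * 4096
    return bytes_total / (1024 ** 3)

# Hypothetical values; read the real ones with "db2 get db cfg for <SID>".
print(round(db2_active_log_space_gb(65536, 20, 40), 1))  # 15.0 (GB)
```

If many parallel SQL processes fill this space faster than the archiver drains it, you get exactly the "Transaction log is full" error described above.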
4. R3trans Processes :
So, the big one now. This process count has the biggest impact. TABIM_UPG, SHADOW_IMPORTS, DDIC_UPG - the phases contributing most to runtime/downtime - go faster or slower based on how well this is tuned. The KBAs below are the first step to understanding how tp forks these processes during imports. There is a parameter, "Mainimp_Proc", used in the backend to control the number of packages imported in parallel, and the first KBA explains exactly that - the entire concept.
1616401 – Understanding parallelism during the Upgrades, EhPs and Support Packages implementations
1945399 – performance analysis for SHADOW_IMPORT_INC and TABIM_UPG phase
Now, how to tune it. This was one of the most confusing ones. There are notes that say to keep it equal to the number of CPUs (refer to the notes above; they say this). The SUM Guide seems to love the value of 8 ("a value larger than 8 does not usually decrease the runtime" <sic>). You also have to keep memory in mind: 512 MB of RAM per R3trans process seems a good guideline. The end result for me was the same process count as for the SQL processes:
UPTIME : 8
DOWNTIME : 12
One other thing still left unexplored, but next on my radar, is playing with "Mainimp_Proc". The link below talks about changing it using the parameter file TABIMUPG.TPP. Since this controls the number of tp processes, tuning it should be done after seeing results from one system; the readings in the logs can help here.
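The two constraints above (roughly one R3trans per CPU, around 512 MB of RAM each) can be combined into a simple bound; this is a sketch of my own reasoning, and the 1.5x downtime factor is my assumption, not an official formula:

```python
# Sketch: cap the R3trans count by both CPU and memory.
# Guidelines from the notes above: roughly one process per CPU
# (I allowed up to 1.5x for downtime) and ~512 MB RAM per R3trans.

def r3trans_count(cpus, free_ram_mb, downtime=False, mb_per_proc=512):
    by_cpu = int(cpus * 1.5) if downtime else cpus
    by_ram = free_ram_mb // mb_per_proc
    return min(by_cpu, by_ram)

print(r3trans_count(8, 16384))                 # uptime: 8
print(r3trans_count(8, 16384, downtime=True))  # downtime: 12
```

Whichever resource is scarcer wins; on a memory-starved host the RAM bound kicks in well before the CPU bound.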
5. R3Load processes :
For an EHP update/SPS update, I don't think this plays any part. From what I understood, it is mainly relevant to a release upgrade. Anyway, this one was a bummer: I couldn't find any helpful documentation on R3load specifically relevant to upgrades. However, communicating with SAP over an OSS incident, the guideline below was received and used:
“There is no direct way to determine the optimal number of processes. A rule of thumb though is to use 3 times the number of available CPUs.” The count I used:
UPTIME : 12
DOWNTIME : 24
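The rule of thumb from the OSS reply reduces to simple arithmetic (halving it for uptime is my own choice to spare the productive system, not part of SAP's guidance):

```python
# Sketch: R3load count from the OSS rule of thumb quoted above
# (3x the available CPUs). The uptime halving is my own assumption.

def r3load_count(cpus, factor=3):
    return cpus * factor

print(r3load_count(8))        # downtime: 24
print(r3load_count(8) // 2)   # uptime (conservative): 12
```

A commenter below reports CPU x 3-5 from experience, so the factor itself is worth testing per system.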
But anyone from the community can answer and increase my understanding: which phases use this in upgrades, if any?
6. Parallel Phases :
Another one of a random nature, with scarce details. This one is about the number of SUM sub-phases that SAPup is allowed to execute in parallel. Again, I had to refer to SAP via an OSS incident for this.
“The phases that can run in parallel will be dependent on the upgrade/update that you will be performing and there is no set way to calculate what the optimum number would be.” The recommendation was to use the default, and that is what I did.
UPTIME : Leave default (Default for “Standard” mode – 3, Default for “Advanced” mode – 6)
DOWNTIME : Leave default (Default for “Standard” mode – 3, Default for “Advanced” mode – 6)
The blog looks good, but it talks more about configuring the update/upgrade than about tuning, i.e. making the downtime shorter and reducing the total runtime of the upgrade.
For example, the tuning could cover the import phases in downtime, like PARCONV_UPG: how do you ensure it doesn't take more than 10-15 minutes (in some cases it can run for more than 8-9 hours), and the other downtime phases that take longer? And how do you plan in such a way that your upgrade doesn't impact performance or the user experience?
Thanks a lot, Nagarajan.
I had limited the scope of the blog to discussing the counts to give the processes with the resources we have (space, CPU, RAM etc.). For overall tuning, we have a lot: use ICNV to reduce PARCONV_UPG, use nZDM to shift AIMs from downtime to uptime, and frankly I haven't explored it all in enough detail to provide expert comments. 🙂
I will write more on this as I gain a better foothold in that broader kind of tuning.
Good Blog !!! Thanks for sharing it.
Could you please be specific on the term CPU you used here: do you mean the number of CPU cores per processor, or the processor itself?
We are on AIX. The count is the exact number of processors in lparstat/prtconf. 1 processor = 1 core in AIX, so these are core counts.
For 5. R3load processes, in my experience it is CPU x 3-5.
Best is to make some tests with 3 and then go up to 5.
CPU idle should stay above 20% if you want to be safe.
I would suggest monitoring the system with the Unix team, or doing it yourself if you know the commands. If you feel the system has plenty of resources, put a breakpoint; the upgrade will stop at the next phase, and then you can modify your initial selection:
cd <update directory>/abap/bin
SAPup set procpar gt=scroll
Great observations! Thanks for sharing the results of your research and experimentation.
For the question on number of R3load processes, you might be interested in my own experience with tuning the number of parallel imports for system copy -- this is basically the same as the number of R3load processes. I was working on SQL Server, but I think this concept is much the same. I wrote about it at System Copy and Migration Observations. It's about halfway down the blog.
Meanwhile, though, I'm going to put some of your suggestions to practice on my next upgrade, as I'm always struggling to figure out the best tuning of these parameters.
Always a delight hearing from SCN veterans like yourself. My SCN ID is currently in a mess after a job change, so I wasn't able to check and reply sooner.
I would study your blog for sure on R3load. 🙂
We are utilizing the DMO functionality of the SUM tool to migrate and upgrade to HANA DB, running on a sandbox system.
Source System : HP-UX/Oracle/ERP EHP6
Target System: Linux/HANA/ERP EHP7
Currently we are in the downtime phase, running the TABIM_UPG step. This step has been running for more than 48 hours now and only 50% is done.
We are assigning R3trans processes in the range of 30-100; however, there is not much progress. We have followed the SAP Notes below, adapted the R3trans process count, and verified the I/O, CPU, and RAM of the source and target systems, finding them within limits.
1616401 - Understanding parallelism during the Upgrades, EHPs and Support Packages implementations
1127194 - R3trans import with parallel processes
1223360 - Composite SAP Note: Performance optimization during import
1945399 - performance analysis for SHADOW_IMPORT_INC and TABIM_UPG phase
One point worth mentioning: at any given time there are only a maximum of 2 tp processes running, and there is no control given to increase them. We have explored the TABIMUPG.TPP file; it is being updated automatically (parameter mainimp_proc) by the tp tool with every new group of packages.
Unfortunately, the link http://wiki.scn.sap.com/wiki/display/ERP6/Performance+during+upgrade+phase+TABIMUPG is not working either.
Any advice will be helpful.
Did you ever get the runtime down for TABIM_UPG in this scenario? We are seeing similar behavior and exploring options.
Did you resolve your issue or wait out the phase?
That was quite a while ago... 🙂
When the above inquiry was made, we were performing a test migration in a sandbox environment. Over the course of the project, we took several steps to improve the performance of various phases of the migration.
Thanks for trying to remember! Yes, this issue was a long time ago, but still applies to upgrading 7.02 systems.
For anyone who comes across this issue, you can resolve it with note 1279597. The note also says to use a breakpoint, but the breakpoint didn't work for me, so I simply stopped the upgrade, applied the note, and restarted the upgrade. The good thing is that at this point the upgrade had not altered the database enough to prevent starting the SAP instance, so I was still able to apply the note.
The instructions in the note apply to older tools.
For SUM, the breakpoint can be set either via the SL Common UI or the via SAPup (e.g. ./SAPup stop <phase>).
I am wondering if you were able to resolve this back then, or did you wait out the phase, and if so, how long did it take?
I understand this was a few years back, but it is still relevant for upgrades.
Is there a limit on the maximum number of R3trans and R3load processes?
e.g. for 500 CPUs, can we use 500 x 3 = 1500 R3load processes?
and for R3trans, 500 x 1.5 = 750?
Is there any upper limit for R3trans and R3load processes?
No, the SUM does not set a limit for the number of processes. Regards, Boris
I understand that SUM does not restrict the number of processes.
My question is whether there is a logical/practical limitation on the number of R3trans and R3load processes.
For 500 CPUs, can we use 500 x 3 = 1500 R3load processes?
And for R3trans, 500 x 1.5 = 750?
Can we use 1500 R3load and 750 R3trans processes? Would that be helpful?
thanks, now I see your point.
For non-DMO scenarios, both parameters can be around 20. For R3trans, a higher number can have a negative effect on the runtime.
For DMO, the number of R3load processes is very important for the downtime, so it has to be adapted to the performance of the application server, see this blog. We have seen DMO runs with more than 700 R3loads.
Best regards, Boris
As each R3load process works on its own table (or part of one), strong parallelization makes sense for R3load as long as I/O can keep up.
R3trans, however, constantly reads and writes internal tables (e.g. E071, TRDIR, ...), so there is a potential risk of serialization due to locks on table entries. With 750 parallel R3trans processes you will certainly be in that range (we have already seen it with 100 parallel processes). The respective SAP note 1127194 suggests starting with 2 and slowly increasing 😉 So you should have a look at that.
I have some more doubts:
1. While configuring (calculating) R3load processes, should we consider the CPUs of the PAS (primary application server) instead of those of the HANA server and the source DB server, since R3load executes on the PAS?
2. Also, the HANA DB server will generally be more powerful than the source DB server; should we expect the import to be faster than the export?
1) The configured R3load processes will run on the host on which SUM is running (typically the Primary Application Server, PAS, host), so first of all the performance/CPUs of this server have to be considered. For DMO, you will have to monitor the overall network throughput as well, as it is influenced by all factors (like source DB performance).
2) My expectation is as well that the HANA server is more powerful - but what would be the consequence of this assumption?
Best regards, Boris
Thanks for this blog; it really helped me understand this area of the upgrade configuration phase a bit more.