Even for huge SAP systems: Downtimes can be avoided when updating AIX
The validation of the AIX “Live Update” feature with SAP has now been extended from small- and medium-sized to huge SAP systems.
For SAP installations on AIX that are based on the ABAP stack, AIX Technology Level (TL) and Service Pack (SP) updates can be applied without downtime by using the “AIX Live Update” feature. This holds from AIX release 7.2 TL 3 SP 1 on and from SAP NetWeaver 7.5 on. It has been validated by the IBM AIX platform team and documented, together with a set of best-practice recommendations, in the IBM whitepaper “SAP Applications with AIX Live Update”, available at https://www.ibm.com/support/pages/node/6355823.
Some preconditions must be met: temporary CPU and memory resources must be available on the physical server, and two or three additional disks must be added to the logical partition (LPAR) that is to be updated. The additional disks are required to keep clones of the basic operating system image, called “rootvg”, which holds the operating system’s base file systems. At least two additional disks are required, or three if a specific configuration is chosen that enables further Live Update operations without the need to adapt the configuration.
After the Live Update procedure is started, some preparations take place first. They take on the order of an hour or less, and the AIX LPAR remains fully available during that time. The Live Update then culminates in a short period of unavailability, called the blackout period, during which memory pages are transferred. For small- to medium-sized systems the blackout typically lasts on the order of a minute; for huge systems it can be longer, for example ten minutes. Thorough scaling tests have shown that for huge systems, too, all SAP processes are fully available after the blackout period, on an updated AIX environment.
Technically, the original LPAR is cloned to a second LPAR, which gets updated; finally, all running entities are moved from the original to the second LPAR by memory transfer. This happens under the hood at machine level and is transparent to all running applications. From the application's point of view, nothing happens except a short freeze. All operating system entities, such as process IDs, file descriptors, sockets and internet addresses, remain the same.
Example
The following example is taken from the whitepaper chapter “Example AIX Live Update run”.
- At the beginning, the AIX system level was shown as “7200-03-01-1838”. This denotes AIX release 7.2, technology level 3, service pack 1. The last element, “1838”, is of less relevance; it gives the year and week in which that version was released.
(0)root @ xxxx01: / # oslevel -s
7200-03-01-1838
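The structure of the level string described above can be made explicit with a small shell helper. This is only a sketch: the function name parse_oslevel is hypothetical, and it assumes the four-part layout shown in the example (four digits for version and release, then TL, SP, and a year/week stamp).

```shell
#!/bin/sh
# parse_oslevel is a hypothetical helper, not an AIX command.
# It assumes a level string of the form VRMF-TL-SP-YYWW, e.g. 7200-03-01-1838.
parse_oslevel() {
    level=$1
    base=${level%%-*}      # 7200
    rest=${level#*-}       # 03-01-1838
    tl=${rest%%-*}         # 03
    rest=${rest#*-}        # 01-1838
    sp=${rest%%-*}         # 01
    yyww=${rest#*-}        # 1838

    ver=$(printf '%s\n' "$base" | cut -c1)     # 7
    rel=$(printf '%s\n' "$base" | cut -c2)     # 2
    yy=$(printf '%s\n' "$yyww" | cut -c1-2)    # 18
    ww=$(printf '%s\n' "$yyww" | cut -c3-4)    # 38

    # Strip one leading zero so that "03" is shown as "3".
    printf 'AIX %s.%s TL %s SP %s (year %s, week %s)\n' \
        "$ver" "$rel" "${tl#0}" "${sp#0}" "$yy" "$ww"
}

parse_oslevel 7200-03-01-1838
# prints: AIX 7.2 TL 3 SP 1 (year 18, week 38)
```

On an AIX system the helper could be fed directly with the command output, for example parse_oslevel "$(oslevel -s)".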
- As preparation, the LPAR's administrative “root” user authenticated to the Hardware Management Console (HMC) with the “hmcauth” command and suitable credentials. This is a precondition for running AIX Live Update.
(0) root @ xxxx01: / # hmcauth -a x.x.x.x -u hscroot -p xxxxxxxx
- The configuration file “/var/adm/ras/liveupdate/lvupdate.data” must always be populated with some parameters, most importantly the disks for cloning the operating system image (rootvg), as in the following example.
general:
        kext_check = no
        cpu_reduction = no

disks:
        nhdisk = hdisk1
        mhdisk = hdisk2
        tohdisk =
        tshdisk =
        alt_nhdisk = hdisk3
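Before a run it can be handy to see at a glance which disk entries of the file are filled in. The following sketch lists the keys of the “disks” stanza and marks empty ones. The function name list_lu_disks is hypothetical, and the parsing assumes the simple "name:" / "key = value" layout shown in the example. Which entries are actually mandatory is not decided here; that is described in the whitepaper.

```shell
#!/bin/sh
# list_lu_disks is a hypothetical helper; it only reads the file.
# Usage on AIX: list_lu_disks /var/adm/ras/liveupdate/lvupdate.data
list_lu_disks() {
    awk '
        # A stanza header is a single word followed by a colon, e.g. "disks:".
        /^[ \t]*[A-Za-z_]+:[ \t]*$/ {
            stanza = $1
            sub(/:$/, "", stanza)
            next
        }
        # Inside the "disks" stanza, split each line at the first "=".
        stanza == "disks" && index($0, "=") {
            key = substr($0, 1, index($0, "=") - 1)
            val = substr($0, index($0, "=") + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", key)
            gsub(/^[ \t]+|[ \t]+$/, "", val)
            printf "%s=%s\n", key, (val == "" ? "(unset)" : val)
        }
    ' "$@"
}
```

For the example file above, the helper would report hdisk1, hdisk2 and hdisk3 for nhdisk, mhdisk and alt_nhdisk, and "(unset)" for tohdisk and tshdisk.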
- The actual AIX Live Update run was started with the “geninstall” command. The argument “-d /mnt” gives the location of the new AIX release files, here the ones for AIX release 7.2, technology level 3, service pack 2. The date commands were used as an easy way to get a timestamp at the start and at the end of the Live Update run.
(0)root @ xxxx01: / # date; geninstall -k -d /mnt update_all; date
Sun Aug 16 04:21:34 CDT 2020
Validating live update input data.
Computing the estimated time for the live update operation:
-------------------------------------------------------
LPAR: xxxx01.x.x.x.x
Blackout time(in seconds): 122
Total operation time(in seconds): 1074
Checking mirror vg device size:
------------------------------------------
Required device size: 24320 MB
Given device size: 40959 MB
PASSED: device size is sufficient.
…
PASSED: Managed System state is operating.
INFO: Any system dumps present in the current dump logical volumes will not be available after live update is complete.
Non-interruptable live update operation begins in 10 seconds.
Initializing live update on original LPAR.
Validating original LPAR environment.
Beginning live update operation on original LPAR.
…
Blackout Time started.
Blackout Time end.
Workload is running on surrogate LPAR.
....................................................................................................................
Shutting down the Original LPAR.
............................
The live update operation succeeded.
...
Sun Aug 16 04:53:45 CDT 2020
- After the AIX Live Update run had finished, the AIX operating system level was shown as 7200-03-02-1846, i.e. the upgrade by one service pack was successful.
(0) root @ xxxx01: / # oslevel -s
7200-03-02-1846
- The actual blackout time reported by the “alog” command was 41 seconds. The same log entry also showed the duration of the overall Live Update procedure, which was 1215 seconds, or 20 minutes and 15 seconds.
(0) root @ xxxx01: /# alog -t mobte -o | tail -1
time=081620:04:53:32 pid=23363239110770689 ... blackout=41.079693s global=1215.240479s
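The conversion of the logged durations into minutes and seconds can be scripted. The sketch below pulls a named "field=<seconds>s" value out of a log line like the one above and prints it in whole seconds and in minutes and seconds. The helper name lu_duration is hypothetical, and it assumes the field format shown in the example; fractions of a second are dropped.

```shell
#!/bin/sh
# lu_duration is a hypothetical helper, not part of AIX.
# $1: field name (e.g. blackout or global), $2: one line of alog output.
lu_duration() {
    field=$1
    line=$2
    # Keep only the integer part of "<field>=<seconds>.<fraction>s".
    secs=$(printf '%s\n' "$line" | sed -n "s/.*$field=\([0-9][0-9]*\).*/\1/p")
    [ -n "$secs" ] || return 1
    printf '%s: %s s = %s min %s s\n' \
        "$field" "$secs" "$((secs / 60))" "$((secs % 60))"
}

line='time=081620:04:53:32 pid=23363239110770689 ... blackout=41.079693s global=1215.240479s'
lu_duration blackout "$line"   # prints: blackout: 41 s = 0 min 41 s
lu_duration global "$line"     # prints: global: 1215 s = 20 min 15 s
```

On an AIX system the line could be taken from the log directly, for example lu_duration global "$(alog -t mobte -o | tail -1)".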
Conclusion
SAP installations based on the ABAP stack and running on the AIX platform can have their AIX Technology Level or Service Pack upgraded without downtime. The only impact is a short freeze, usually on the order of a minute, or several minutes for huge SAP systems.
Cool!
Does it also work for NetWeaver Java?
Thanks for your feedback Andreia!
We focused first on validating the ABAP stack based systems, since they make up the majority of SAP installations. A Java stack validation is currently under discussion.
May I ask whether your question about the Java stack comes from general interest, or whether you see a concrete case where AIX Live Update could be useful? It would also be interesting to hear whether you know of some kind of representative Java load to test with.
Thanks & best regards,
Ralf
Thanks for the answer
In my landscape I have two SAP PO instances (one Java + one ABAP) running on AIX 7.2 TL4. It would be very useful for me if I could move the OS of the ABAP stack to AIX 7.2 TL5 with no downtime.
Hello Andreia,
thanks for that information!
I would like to encourage you to plan the AIX Live Update for the TL upgrade of your ABAP system. Proper preparation, as outlined in our whitepaper, is essential, including building up some general understanding of Live Update, which is mainly a one-time effort. The whole Live Update procedure should also be played through first in a test or UAT environment, which should be the quality approach for any complex change anyway. A good technical starting point would be to run the "prerequisite check" (geninstall -k -p command), which should reveal any obstacles and gives an estimate of the blackout time.
Please note that AIX Live Update is fully supported from the AIX OS side, i.e. IBM guarantees that all OS entities such as processes, file handles, memory content etc. will be available and running after the Live Update blackout period. The one crucial aspect on the application side is whether the application can accept the short freeze; for example, Live Update would not be suitable where real-time responses are essential.
For your PO system I would like to ask:
- What is its size? The size determines the length of the freeze time.
- Do you know the responsiveness demands when PO communicates with other systems?
If you plan a Live Update, we fully support it from the SAP on AIX perspective, so in case of issues you can raise an SAP case for customer systems (test and production) or an internal SAP incident for internal systems.
Best regards,
Ralf