Even for huge SAP systems: Downtimes can be avoided when updating AIX
The validation of the AIX “Live Update” feature with SAP has now been extended from small- and medium-sized to huge SAP systems.
For SAP installations on AIX that are based on the ABAP stack, updates of AIX Technology Levels (TL) and Service Packs (SP) can be applied without downtime by using the “AIX Live Update” feature. This applies from AIX release 7.2 TL 3 SP 1 and from SAP NetWeaver 7.5 onward. The IBM AIX platform team has validated this and documented it, together with a set of best-practice recommendations, in the IBM whitepaper “SAP Applications with AIX Live Update”, available at https://www.ibm.com/support/pages/node/6355823.
Some preconditions must be met: temporary CPU and memory resources must be available on the physical server, and two or three additional disks must be added to the logical partition (LPAR) that is to be updated. The additional disks hold clones of the base operating system image, called “rootvg”, which contains the operating system’s base file systems. At least two additional disks are required, or three if a specific configuration is chosen that enables further Live Update operations without the need to adapt the configuration.
After the Live Update procedure starts, some preparations take place first. They take on the order of an hour or less, and the AIX LPAR remains fully available during that time. The Live Update then culminates in a short period of unavailability, called the blackout period, during which memory pages are transferred. For small- to medium-sized systems this typically takes on the order of a minute; for huge systems it can take longer, for example ten minutes. Thorough scaling tests have shown that for huge systems, too, all SAP processes are fully available after the blackout period, running in an updated AIX environment.
Technically, the original LPAR is cloned to a second LPAR, which gets updated, and finally all running entities are moved from the original LPAR to the second one by memory transfer. This happens under the hood at the machine level, transparently to all running applications. From the application’s point of view, nothing happens except a short freeze. All operating system entities, such as process IDs, file descriptors, sockets, and internet addresses, remain the same.
An example follows, taken from the whitepaper chapter “Example AIX Live Update run”.
- At the beginning, the AIX system level was shown as “7200-03-01-1838”. This denotes AIX release 7.2, technology level 3, service pack 1. The last element, “1838”, is less relevant; it gives the year and week in which that version was released.
(0)root @ xxxx01: / # oslevel -s
7200-03-01-1838
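The level string can be taken apart mechanically. The following is a minimal sketch in POSIX shell; `decode_oslevel` is a hypothetical helper, not an AIX command, and it assumes that the last field is a two-digit year (taken as 20yy) followed by a two-digit week.

```shell
#!/bin/sh
# decode_oslevel: hypothetical helper that splits an "oslevel -s" string
# such as 7200-03-01-1838 into release, TL, SP and release date.
decode_oslevel() {
    level=$1

    # Cut the string at its hyphens, field by field.
    base=${level%%-*};  rest=${level#*-}
    tl=${rest%%-*};     rest=${rest#*-}
    sp=${rest%%-*};     build=${rest#*-}

    # "7200" -> version 7, release 2 (first and second digit).
    version=${base%???}
    tmp=${base#?}
    release=${tmp%??}

    # Assumption: the build field is YYWW, with the year read as 20YY.
    year=20${build%??}
    week=${build#??}

    # Drop a single leading zero from TL and SP for display.
    printf 'AIX %s.%s TL %s SP %s (year %s, week %s)\n' \
        "$version" "$release" "${tl#0}" "${sp#0}" "$year" "$week"
}

decode_oslevel "7200-03-01-1838"
# prints: AIX 7.2 TL 3 SP 1 (year 2018, week 38)
```

On a live system the argument could be supplied as `decode_oslevel "$(oslevel -s)"`.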
- As preparation, the LPAR’s administrative “root” user authenticated to the Hardware Management Console (HMC) using the “hmcauth” command with the appropriate credentials. This is a precondition for running the AIX Live Update.
(0) root @ xxxx01: / # hmcauth -a x.x.x.x -u hscroot -p xxxxxxxx
- The configuration file “/var/adm/ras/liveupdate/lvupdate.data” must always be populated with some parameters, most importantly the disks for cloning the operating system image (rootvg), as in the following example.
general:
        kext_check = no
        cpu_reduction = no

disks:
        nhdisk = hdisk1
        mhdisk = hdisk2
        tohdisk =
        tshdisk =
        alt_nhdisk = hdisk3
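Before starting a run, it can be useful to confirm that the file names enough disks. The sketch below is one possible pre-check, not an IBM-provided tool. Both function names are hypothetical, and it assumes the stanza layout of the example above (“name = value” entries under a “disks:” header). It only tests the “at least two disks” rule stated earlier, not whether the disks exist or are large enough.

```shell
#!/bin/sh
# Hypothetical helpers (not part of AIX) that inspect an lvupdate.data file.

# Print one "key=disk" line per disk named in the "disks:" stanza.
configured_disks() {
    awk '
        # A stanza header such as "general:" or "disks:" switches context.
        /^[ \t]*[A-Za-z_]+:[ \t]*$/ { in_disks = ($1 == "disks:"); next }
        # Inside "disks:", report every value after the "=" sign.
        in_disks && $2 == "=" {
            for (i = 3; i <= NF; i++) print $1 "=" $i
        }
    ' "$1"
}

# Succeed only if at least two disks are configured.
check_disk_count() {
    count=$(configured_disks "$1" | awk 'END { print NR }')
    if [ "$count" -ge 2 ]; then
        echo "OK: $count disk(s) configured"
    else
        echo "FAIL: only $count disk(s) configured, at least 2 needed"
        return 1
    fi
}

# Usage on the LPAR:
#   check_disk_count /var/adm/ras/liveupdate/lvupdate.data
```

Empty entries such as “tohdisk =” are skipped, so for the example file only the three named disks are counted.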
- The actual AIX Live Update run was started using the “geninstall” command. The argument “-d /mnt” gives the location of the new AIX release files, here those for AIX release 7.2, technology level 3, service pack 2. The “date” commands are a simple way to get timestamps at the start and end of the Live Update run.
(0)root @ xxxx01: / # date; geninstall -k -d /mnt update_all; date
Sun Aug 16 04:21:34 CDT 2020
Validating live update input data.
Computing the estimated time for the live update operation:
-------------------------------------------------------
LPAR: xxxx01.x.x.x.x
Blackout time(in seconds): 122
Total operation time(in seconds): 1074
Checking mirror vg device size:
------------------------------------------
Required device size: 24320 MB
Given device size: 40959 MB
PASSED: device size is sufficient.
…
PASSED: Managed System state is operating.
INFO: Any system dumps present in the current dump logical volumes will not be available after live update is complete.
Non-interruptable live update operation begins in 10 seconds.
Initializing live update on original LPAR.
Validating original LPAR environment.
Beginning live update operation on original LPAR.
…
Blackout Time started.
Blackout Time end.
Workload is running on surrogate LPAR.
....................................................................................................................
Shutting down the Original LPAR.
............................
The live update operation succeeded.
...
Sun Aug 16 04:53:45 CDT 2020
- After the AIX Live Update run had finished, the AIX operating system level was shown as 7200-03-02-1846, i.e. the update by one service pack was successful.
(0) root @ xxxx01: / # oslevel -s
7200-03-02-1846
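The before-and-after comparison can also be scripted. This is a small sketch with hypothetical helper names (`level_key`, `is_newer`). It assumes that the first three fields of the level string are zero-padded to a fixed width, as in the two values shown in this example, so that they can be concatenated and compared as one number.

```shell
#!/bin/sh
# Hypothetical helpers for comparing two "oslevel -s" strings.

# Turn 7200-03-02-1846 into the comparable number 72000302.
# The fourth field (release date) is deliberately ignored.
level_key() {
    echo "$1" | awk -F- '{ print $1 $2 $3 }'
}

# is_newer OLD NEW: succeeds if NEW is a higher release/TL/SP than OLD.
is_newer() {
    [ "$(level_key "$2")" -gt "$(level_key "$1")" ]
}

if is_newer "7200-03-01-1838" "7200-03-02-1846"; then
    echo "level advanced"
fi
# prints: level advanced
```

The same comparison could be used to check a system against the minimum level named in this article, 7200-03-01.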
- The actual blackout time reported by the “alog” command was 41 seconds. It also showed the time taken by the overall Live Update procedure: 1215 seconds, or 20 minutes and 15 seconds.
(0) root @ xxxx01: /# alog -t mobte -o | tail -1
time=081620:04:53:32 pid=23363239110770689 ... blackout=41.079693s global=1215.240479s
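The two timings can be pulled out of that log line and converted to minutes and seconds with a few lines of shell. This is a sketch only: `field_seconds` and `to_min_sec` are hypothetical helpers, and the parsing assumes the space-separated “key=<number>s” layout seen in the line above.

```shell
#!/bin/sh
# Hypothetical helpers for reading timings from one alog line.

# field_seconds KEY LINE: print the number in a "KEY=<number>s" field.
field_seconds() {
    echo "$2" | tr ' ' '\n' | sed -n "s/^$1=\(.*\)s\$/\1/p"
}

# to_min_sec SECONDS: print whole minutes and seconds, fraction dropped.
to_min_sec() {
    awk -v s="$1" 'BEGIN {
        t = int(s)
        printf "%d min %d s\n", int(t / 60), t % 60
    }'
}

line='time=081620:04:53:32 pid=23363239110770689 ... blackout=41.079693s global=1215.240479s'

to_min_sec "$(field_seconds blackout "$line")"
# prints: 0 min 41 s
```

On the LPAR the line could come straight from the log, for example `line=$(alog -t mobte -o | tail -1)`, and the overall duration is obtained the same way with the key `global`.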
SAP installations based on the ABAP stack and running on the AIX platform can have their AIX Technology Level or Service Pack updated without downtime. The only impact is a short freeze, usually on the order of a minute, or several minutes for huge SAP systems.