Trial Cutover 2 (TC2) completed and the lessons we learnt from it (CUUC Part 6)
Christmas was long over, the January blues were drawing to a close, and I had just spent another weekend at the Hilton at the Birmingham NEC performing TC2, which went very well. So I have decided to spare you the details of the process; instead, I want to document how the analysis of TC1 influenced the upgrade process and whether those tweaks worked.
So what did we ‘tweak’ between TC1 and TC2? The table below details the items discussed in the last post and the updates from TC2.
As the table above shows, most of our changes were to how we executed the processes rather than to the processes themselves. This was deliberate: consistency is essential in the later stages of an upgrade project.
In terms of timings, the tables below show how TC2 ran compared to TC1 and the PoC CUUC process, and how well our mitigation techniques worked, especially on the Unicode conversion.
As you can see, the overall time came down quite dramatically in both the Upgrade Uptime and the Unicode phases. This was primarily due to the index defragmentation done on the database over Christmas and to our own improvements in running tasks.
As well as technical measures to improve performance we also implemented a number of soft measures around people and upgrade management.
- The shift patterns were improved. We had found that the other upgrades were running at a pace the plans did not accurately reflect, which caused problems for the team because I wanted each person to have a ‘buddy’ to check upgrade inputs and reduce errors.
- The previous post discussed cutover communications, which were difficult during TC1 because the shifts were not optimal. After reworking the shifts, we found that the communication methods worked quite well, although communicating with a geographically distributed team across time zones is never easy with so much at stake.
- SAP StreamWork was a very useful repository for the upgrade documentation: the team was able to keep their documentation up to date and available to the whole project team. I also uploaded the project plan to StreamWork daily so that my team could keep up with plan revisions, as they did not have file system access at the client.
- The technical team did a walkthrough of the project plan, counting out each hour and detailing the tasks they would perform. Its usefulness is a difficult one to call: my team found it very useful, but it takes about 4-5 hours to get through, and on simpler projects I am not convinced of its value.
A recurring issue coming out of TC1 and TC2 concerned both backups and the fallback scenario, both of which are vital if a customer is to Return to Operation (RTO) as quickly as possible. The main issue going into TC2 was the scheduling of the post-Unicode backup. This is a vital checkpoint: the system has to be reintroduced into the backup schedule as soon as possible, and the backup must capture all the changes from the upgrade, the Unicode conversion and the transports. To achieve this, we decided to run an online backup of the system as soon as the transports were imported and before SGEN was executed. As a result, the backup would be running while users were testing the system, and because SGEN had not been run they would experience degraded performance. This was something they would have to live with; creating a recovery point for the system was more important than a temporary performance issue.
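One way to keep that backup checkpoint from drifting as the plan is revised is to express it as a hard dependency rather than a note in the schedule. The sketch below is purely illustrative: the step names and the dependency table are hypothetical simplifications, not our actual runbook, and it uses Python's standard `graphlib.TopologicalSorter` to derive a valid order and then check that the online backup falls after the transport import and before SGEN.

```python
from graphlib import TopologicalSorter

# Hypothetical, heavily simplified cutover steps (not the real runbook).
# Each key maps to the set of steps that must finish before it can start.
DEPENDENCIES = {
    "upgrade": set(),
    "unicode_conversion": {"upgrade"},
    "transport_import": {"unicode_conversion"},
    # The online backup is the recovery point, so it must come after the
    # upgrade, the Unicode conversion and the transports...
    "online_backup": {"transport_import"},
    # ...and SGEN waits for it, accepting slower performance in the meantime.
    "sgen": {"online_backup"},
    # Users start testing while the online backup is still running.
    "user_testing": {"online_backup"},
}


def plan_order(deps):
    """Return one valid execution order for the cutover steps."""
    return list(TopologicalSorter(deps).static_order())


def backup_checkpoint_ok(order):
    """True if the online backup follows the transports and precedes SGEN."""
    pos = {step: i for i, step in enumerate(order)}
    return pos["transport_import"] < pos["online_backup"] < pos["sgen"]


if __name__ == "__main__":
    order = plan_order(DEPENDENCIES)
    print(order)
    print(backup_checkpoint_ok(order))
```

Encoding the ordering this way makes the trade-off explicit: any plan revision that tried to run SGEN before the recovery point would fail the check instead of slipping through unnoticed.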
The fallback scenario was another challenge. During the project, the client had required an additional Pre-Prod environment to allow testing of a separate project stream that was falling behind and would not go live with the other work streams. At the time, providing this system was a major pain and caused a great deal of stress, but it turned out to be a great opportunity for the fallback. The diagrams below show how we restored the pre-upgrade backup onto the additional Pre-Prod system so that, in the event of a fallback, its storage could be re-presented to return to the previous version.
If a fallback were invoked, the process would change the SAN presentation of the disks to re-present the restored database to the Production source, allowing a much shorter RTO than a straight restore from tape.
Now that we had the database backups and the fallback scenario in place, all that was left to do was to get ready for Production in a few weeks' time.