Some time ago I had a support case that centered on the question of whether a ‘shutdown abort’ has any consequences that could negatively impact the SAP system (apart from the fact that it is a shutdown, of course):
Because a ‘shutdown immediate’ was hanging, a ‘shutdown abort’ was considered as a quick way to resolve the situation, and the customer wanted to know whether this could harm his system in any way. SAP was stuck during a release upgrade, so this was a time-critical issue. Of course, the way Oracle works ensures that the DB is consistent after the following restart. Nevertheless, you should keep your fingers crossed every time you do this, for the reasons and mechanisms described here.
When the DB is restarting after the shutdown abort, two steps are done:
Application of the change vectors in the online redo log, starting at the last checkpoint. Usually this does not take long, so it is not a problem that it is done BEFORE the DB is opened. Of course, if the online redo log is corrupted, you now have a real problem. There are ways to skip the application of the redo entries in emergency cases, but this will cause corruptions in the DB which can take considerable effort to fix. Unfortunately, only the use of a standby DB can really ensure that the redo is fine.
Deferred Transaction Rollback takes place AFTER the DB was opened. This is actually a good thing, because if a transaction had been running for several hours when the abort happened, the rollback will take quite some time. In this case, the rollback will be done by SMON and, depending on the setting of the parameter FAST_START_PARALLEL_ROLLBACK, SMON will use multiple parallel processes for this.
What happens in case of corrupted undo? SMON will most likely crash, and immediately afterwards the DB instance will too. However, an event exists that suppresses the cleanup initiated by SMON, so that the DB can at least stay open and we can find strategies to deal with the affected objects. Of course, leaving the system in this state has some serious implications (discussed below).
Consequences of long running transaction rollback
Deferred Transaction Rollback sounds like a great feature. After the shutdown abort we can immediately resume productive operation of the SAP system, right? Well, we can try, but we have to be aware of the following consequences.
- CPU usage:
With the default setting of FAST_START_PARALLEL_ROLLBACK (LOW), the rollback can use up to 2 * CPU_COUNT processes, which can put quite a load on the system.
- Undo contention
The worst case is concurrent DML (for example by parallel processes) on the object whose transaction rollback is still in progress after the shutdown abort. We could then run into ‘enq: US’ waits.
- Deterioration of read performance
We also have a problem when we start reading massively against the object for which the changes are being rolled back. Those (consistent) reads will be directed to the undo tablespace, for which we might have fewer I/O resources allocated compared to the data files of the “regular” SAP tablespaces. Consequently, with a high number of concurrent reads we see the ‘read by other session’ wait event here.
These are also the reasons why you definitely do not want to have the SMON cleanup disabled by special events while the SAP system is in productive use.
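To get a feeling for the CPU usage point above, here is a minimal Python sketch of the upper bound on parallel rollback processes. The function name `max_rollback_processes` is my own and purely illustrative. The multipliers are the documented limits for FAST_START_PARALLEL_ROLLBACK (FALSE: no parallel rollback, LOW: 2 * CPU_COUNT, HIGH: 4 * CPU_COUNT); the number of processes actually started on a given system may well be lower.

```python
# Upper bound on parallel rollback processes for each value of
# FAST_START_PARALLEL_ROLLBACK. FALSE disables parallel rollback
# (SMON rolls back serially), LOW is the default.
_MULTIPLIER = {"FALSE": 0, "LOW": 2, "HIGH": 4}


def max_rollback_processes(cpu_count: int, setting: str = "LOW") -> int:
    """Return the maximum number of parallel rollback processes.

    A result of 0 means that no parallel processes are used at all.
    """
    key = setting.strip().upper()
    if key not in _MULTIPLIER:
        raise ValueError(f"unknown FAST_START_PARALLEL_ROLLBACK value: {setting!r}")
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return _MULTIPLIER[key] * cpu_count


if __name__ == "__main__":
    # Example: a host where CPU_COUNT = 8
    for value in ("FALSE", "LOW", "HIGH"):
        print(value, max_rollback_processes(8, value))
```

On a host with CPU_COUNT = 8 and the default setting, this gives an upper bound of 16 processes competing with the SAP work processes for CPU.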
Things to be considered BEFORE a shutdown immediate to avoid a shutdown abort and rollback problems
- Follow the steps outlined in Metalink notes 375935.1 and 117316.1 to check whether a rollback will occur at the time of the shutdown abort and how long it will take.
- If the rollback will potentially take some time, consider manually initiating it at a time of low system activity. Just kill the affected shadow process manually, and SMON will start the parallel rollback without the need for a ‘shutdown abort’. The progress of the parallel rollback can then be monitored via the statement
select undoblockstotal, undoblocksdone, state, undoblocksdone / undoblockstotal * 100 as percentage from v$fast_start_transactions;
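If you run the statement above twice, a few minutes apart, you can turn the two samples into a rough estimate of the remaining rollback time. The following Python sketch does this by simple linear extrapolation. The function names and the sample numbers are my own illustration, and the estimate assumes that the rollback rate stays constant, which is not guaranteed.

```python
from datetime import timedelta
from typing import Optional


def rollback_progress(total_blocks: int, done_blocks: int) -> float:
    """Percentage of undo blocks already rolled back."""
    if total_blocks <= 0:
        raise ValueError("total_blocks must be positive")
    return 100.0 * done_blocks / total_blocks


def estimate_remaining(total_blocks: int, done_first: int, done_second: int,
                       interval_seconds: float) -> Optional[timedelta]:
    """Extrapolate the remaining rollback time from two samples.

    done_first and done_second are the undoblocksdone values of two
    consecutive samples taken interval_seconds apart. Returns None if
    no progress was observed between the samples.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    rate = (done_second - done_first) / interval_seconds  # blocks per second
    if rate <= 0:
        return None
    remaining_blocks = max(total_blocks - done_second, 0)
    return timedelta(seconds=remaining_blocks / rate)


if __name__ == "__main__":
    # Hypothetical samples: 100000 blocks in total, 20000 done at the
    # first sample and 26000 done one minute later.
    print(rollback_progress(100_000, 26_000))
    print(estimate_remaining(100_000, 20_000, 26_000, 60))
```

Treat the result as a ballpark figure only and take more than two samples if the decision depends on it.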
The ‘shutdown immediate’ has a ‘shutdown timeout’ of 60 minutes, so you might be able to reconsider the situation after this period of time. However, in my experience this does not seem to work for all kinds of blocking situations, and you may still end up waiting ‘forever’ for the shutdown to finish.
To sum it up: after the DB is restarted, you should at least make sure that the processes that caused the long-running transactions are not immediately rescheduled, in order to avoid contention problems. From a technical point of view, only consistent redo and undo information can guarantee that the instance startup works as expected, but it might not always be possible (also because of time constraints) to verify this.