
Introduction

Last week I was working at a client site to implement a simplified Oracle database copy with RMAN Duplicate. The backup itself was performed in parallel with BR*Tools, RMAN and Tivoli Data Protection for mySAP. However, the third-party backup product does not matter in this case, because you can run into this issue with TDP for mySAP as well as TDP for Oracle. After I had configured and implemented the RMAN Duplicate scenario, I tried to duplicate the database in parallel too (a duplicate is basically a restore with further activities). But the restore procedure used only one channel, even though two or more were defined and allocated. So the restore was not running in parallel.

Unfortunately, the SAP BR*Tools do not consider the RMAN default configuration at all, so the configuration for each channel needs to be supplied with every backup or restore.

TSM (server) configuration

The TSM configuration basically looked like the following diagram.

/wp-content/uploads/2012/07/ab0ct137_121899.gif

There was a management class for the database backups which pointed to a disk pool first. After the disk pool had reached its migration threshold, the data was migrated to tape automatically. So the database backup runs in parallel and creates multiple backup pieces on the disk pool first, and after some time these backup pieces are located on tape. The migration itself is completely transparent for Oracle RMAN.

As long as you only perform backups, there is no problem at all. But what happens if some backups need to be restored?

The data migration kicks in once the threshold is reached, and the backup pieces (which were written in parallel) can end up on the same tape. From that point on, the data can no longer be restored in parallel if all (or most) of the backup pieces are stored on the same tape. RMAN still allocates multiple channels, but uses only one of them to restore the database.

The problem is that you get no notice of this: the BR*Tools hide the channel information completely, and even if you are using native RMAN, you do not get any error. You will simply see only one channel working.
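The effect can be sketched as a very simplified model: if a tape can be read by only one channel at a time, the number of channels that can actually work is limited by the number of distinct tapes holding the required backup pieces. The following Python sketch is only an illustration under that assumption; the function name, the piece names and the tape labels are hypothetical and do not come from TSM or RMAN.

```python
def effective_restore_parallelism(piece_to_tape, allocated_channels):
    """Upper bound on concurrently working channels in a simplified model.

    Assumption: one tape volume can serve only one channel at a time, so
    the number of distinct tapes caps the usable channels.
    """
    if allocated_channels < 1:
        raise ValueError("at least one channel must be allocated")
    distinct_tapes = set(piece_to_tape.values())
    return min(allocated_channels, len(distinct_tapes))


# Hypothetical layouts: backup piece name -> tape volume it is stored on.
after_migration = {"piece_1": "TAPE_A", "piece_2": "TAPE_A"}
spread_over_tapes = {"piece_1": "TAPE_A", "piece_2": "TAPE_B"}

print(effective_restore_parallelism(after_migration, 2))
print(effective_restore_parallelism(spread_over_tapes, 2))
```

With both pieces on the same tape the model yields one working channel, no matter how many channels were allocated. The real behavior also depends on the number of tape drives and mount points, which this sketch ignores.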

RMAN Duplicate (or restore)

Let’s take a look at the problem in practice. The following RMAN commands were used to duplicate the database, but the behavior is the same when performing a simple restore of the database with native RMAN or with BR*Tools.

shell> rman target / auxiliary SYS/<PASS>@<SID>
RMAN> run {
allocate auxiliary channel c1 device type SBT
 parms 'ENV=(XINT_PROFILE=/oracle/<SID>/112_64/dbs/init<SID>.utl,PROLE_PORT=57323,BR_CALLER=NONE,BR_BACKUP=NONE, BR_REQUEST=NONE,BR_RUN=none.svd)';
allocate auxiliary channel c2 device type SBT
parms 'ENV=(XINT_PROFILE=/oracle/<SID>/112_64/dbs/init<SID>.utl,PROLE_PORT=57323,BR_CALLER=NONE,BR_BACKUP=NONE, BR_REQUEST=NONE,BR_RUN=none.svd)';
set until time "to_date('23-07-2012 08:00:00','DD-MM-YYYY HH24:MI:SS')";
duplicate target database to <SID>; }

After the restore started, only one channel was working, even though two channels were specified and allocated. After some research, I enabled RMAN debugging to get more information about each channel and found the following.

RMAN> debug on
....
DBGMISC:                  channel c1 locked media [TSM_tape_<TAPE_NUMBER>] [16:08:45.112] (krmqgns)
DBGMISC:                  channel c2 could not lock media [TSM_tape_<TAPE_NUMBER>] [16:08:45.113] (krmqgns)
DBGMISC:                  channel c2 could not lock media [TSM_tape_<TAPE_NUMBER>] [16:08:45.113] (krmqgns)
DBGMISC:                  channel c2 could not lock media [TSM_tape_<TAPE_NUMBER>] [16:08:45.113] (krmqgns)
DBGMISC:                  channel c2 could not lock media [TSM_tape_<TAPE_NUMBER>] [16:08:45.232] (krmqgns)

You can now see the described scenario in the trace file. There are multiple backup pieces, but the second channel could not start its work due to a media (= tape) lock failure. The trace file also gives you the tape number <TAPE_NUMBER>, which can be verified on the TSM server itself.
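For longer trace files it can help to summarize the lock messages per tape instead of reading them line by line. The following Python sketch assumes that the relevant lines look exactly like the excerpt above ("channel <name> locked media [...]" and "channel <name> could not lock media [...]"). The regular expression and the function name are my own and not part of RMAN, so other RMAN versions may need an adjusted pattern.

```python
import re
from collections import defaultdict

# Pattern derived from the trace excerpt above (an assumption, not an
# official RMAN trace format description).
LOCK_RE = re.compile(
    r"channel\s+(?P<channel>\S+)\s+"
    r"(?P<result>locked|could not lock)\s+media\s+\[(?P<media>[^\]]+)\]"
)


def summarize_media_locks(lines):
    """Return (holders, blocked) built from RMAN debug trace lines.

    holders: media -> channel that locked it
    blocked: media -> {channel: number of failed lock attempts}
    """
    holders = {}
    blocked = defaultdict(lambda: defaultdict(int))
    for line in lines:
        match = LOCK_RE.search(line)
        if match is None:
            continue  # not a media lock message
        media = match.group("media")
        channel = match.group("channel")
        if match.group("result") == "locked":
            holders[media] = channel
        else:
            blocked[media][channel] += 1
    return holders, {media: dict(counts) for media, counts in blocked.items()}


# Sample lines taken from the excerpt above (tape number left as placeholder).
sample = [
    "DBGMISC:   channel c1 locked media [TSM_tape_<TAPE_NUMBER>] [16:08:45.112] (krmqgns)",
    "DBGMISC:   channel c2 could not lock media [TSM_tape_<TAPE_NUMBER>] [16:08:45.113] (krmqgns)",
    "DBGMISC:   channel c2 could not lock media [TSM_tape_<TAPE_NUMBER>] [16:08:45.232] (krmqgns)",
]
holders, blocked = summarize_media_locks(sample)
for media, channel in holders.items():
    print(media, "held by", channel, "blocked:", blocked.get(media, {}))
```

To run it against a real trace file, pass the open file object instead of the sample list, since the function only iterates over lines.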

** TDP for mySAP **
TSM>  select * from archives where node_name = '<NODENAME>' and ll_name = '<TDP_BACKUP_ID>';
TSM> show bfo <OBJECT_ID>
** TDP for Oracle **
TSM> select * from backups where node_name = '<NODENAME>' and ll_name = '<BACKUP_PIECE_NAME>';
TSM> show bfo 0 <OBJECT_ID>

Summary

As you can see, it is very important to test and validate the complete backup and restore process to avoid such problems in an emergency. Just think about the lost (restore) time for large databases.

There are of course various ways to fix this issue. Here are a few examples:

  1. Do not use the disk pools on the TSM server for database backups (back up in parallel directly to different tapes) or use a VTL (Virtual Tape Library)
  2. Specify a backup pool for each channel (not possible with BR*Tools due to their configuration limits, so you need to use native RMAN)
  3. Use the FRA as the primary backup location (it acts like a disk pool on the TSM server) and migrate your backups directly to tape afterwards (native RMAN)

If you have any further questions, please feel free to ask.
