Datafile without backups - how to restore?

Have you ever had to restore a datafile with no backups available? It's easy, provided you have all the archived logs generated since the datafile was created. Please check it here: Re-Creating Data Files When Backups Are Unavailable. Moreover, RMAN is clever enough to create an empty datafile automatically during the restore phase and then recover it using the archived logs. So far, so good, but...
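For reference, the manual variant of that procedure can be sketched roughly as follows. This is only an illustration run from SQL*Plus: both file names are hypothetical, and it assumes every archived log since the datafile's creation is still available.

```sql
-- Hypothetical paths: replace them with the original and the new datafile name.
-- Creates an empty datafile in place of the lost one, using control file information.
alter database create datafile '/u01/oradata/orcl/users01.dbf'
   as '/u02/oradata/orcl/users01.dbf';

-- SQL*Plus command: applies archived redo from the datafile's creation onwards.
recover datafile '/u02/oradata/orcl/users01.dbf';

-- Needed only if the datafile was taken offline while the database was open.
alter database datafile '/u02/oradata/orcl/users01.dbf' online;
```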

Some time ago, for a significant part of our backups, we changed the primary backup device from tapes to disks (located on a NAS cluster providing around 1.2 PB of storage for that purpose, compressed at a ratio of around 50%). Tapes are now used as a kind of secondary backup device: every fourth full backup and all archived logs are sent there. If we lose the disk backups, we should be able to restore and recover everything from tapes (which will of course take much more time). After introducing the change, we also modified our automatic recovery strategy: we recover from disks, except for every fourth automatic recovery, which is done from tapes. During such tape recoveries, for one database we started to get errors like:

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ==============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 12/16/2014 12:40:51
RMAN-06026: some targets not found - aborting restore
RMAN-06100: no channel to restore a backup or copy of datafile 11

I wondered why the datafile was not simply created empty and recovered using archived logs. After a short investigation I found the cause: there was already one backup of this datafile, but only on disk, because it had not yet been sent to tapes (remember, we send only every fourth full backup there). RMAN found that a backup exists (on disk), but it could not be restored, because we were using only tape channels. Instead of switching to the workaround described above, RMAN just failed.

I started looking for an elegant solution, for example some kind of SET command telling RMAN to consider only backups on specified device types rather than on all of them. Unfortunately, I couldn't find one. That's why I decided to use a workaround: look for the backup pieces causing the error, make them unavailable, do the job and make them available again. Simply uncataloging them would be more dangerous, especially since some of our datafile backups are taken on standby databases, with archived logs still being backed up from the primary. To restore such databases we need to be connected to the RMAN catalog, which means that we should not uncatalog any backups, and after making backup pieces temporarily unavailable we should always make them available again.

The query to find the problematic backup pieces looks like this:

SQL> select distinct bpd.bp_key
     from v$backup_datafile_details bdd, v$backup_piece_details bpd
     where bdd.file# in 
        (select file#
         from (select bdd.file#, bsd.device_type
               from v$backup_datafile_details bdd, v$backup_set_details bsd
               where bdd.btype_key = bsd.bs_key
               and bsd.completion_time > (select creation_time from v$datafile where file# = bdd.file#)
               group by bdd.file#, bsd.device_type)
         group by file#
         having count(*) = 1)
     and bdd.btype_key = bpd.bs_key
     and bpd.device_type = 'DISK';

In the opposite situation, when you want to restore from disk and some datafiles are backed up only to tapes (I've tested that the same error is generated in both cases), you'll have to change the device type in the query above.
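Before changing the device type, it may help to see which device types each datafile is actually backed up to. The query below is a minimal diagnostic sketch built from the same views and join as the query above; the column aliases are only illustrative. A datafile that shows up with a single device type is a candidate for the error, and the value displayed for tape backups (typically SBT_TAPE, but check your own output) is the one to use in the final predicate.

```sql
-- One row per datafile and device type, with the number of backups
-- and the most recent completion time on that device type.
select bdd.file#,
       bsd.device_type,
       count(*)                 as backups,
       max(bsd.completion_time) as latest_backup
from v$backup_datafile_details bdd, v$backup_set_details bsd
where bdd.btype_key = bsd.bs_key
group by bdd.file#, bsd.device_type
order by bdd.file#, bsd.device_type;
```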

After getting the list of problematic backup pieces, the final solution is to run this for each of them just before the restore:

RMAN> change backuppiece nnnn unavailable;

And then, after the end of your script (you have to allocate the proper channel type to do it):

RMAN> run {
         allocate channel d1 device type disk;
         change backuppiece nnnn available;
      }
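With more than a few backup pieces, typing these commands by hand gets tedious. One possible approach, shown here only as a sketch, is to let SQL generate the command text from the same selection logic as the query above. The subquery names (single_device_files, problem_pieces) and the column aliases are made up for this example; the output still has to be pasted or spooled into your RMAN script, with the second column placed inside a run block that allocates a disk channel.

```sql
-- Datafiles whose backups (taken after the file was created) exist on one device type only.
with single_device_files as (
   select file#
   from (select bdd.file#, bsd.device_type
         from v$backup_datafile_details bdd, v$backup_set_details bsd
         where bdd.btype_key = bsd.bs_key
         and bsd.completion_time > (select creation_time from v$datafile where file# = bdd.file#)
         group by bdd.file#, bsd.device_type)
   group by file#
   having count(*) = 1
),
-- Disk backup pieces holding those datafiles.
problem_pieces as (
   select distinct bpd.bp_key
   from v$backup_datafile_details bdd, v$backup_piece_details bpd
   where bdd.file# in (select file# from single_device_files)
   and bdd.btype_key = bpd.bs_key
   and bpd.device_type = 'DISK'
)
-- One pair of RMAN commands per problematic backup piece.
select 'change backuppiece ' || bp_key || ' unavailable;' as before_restore,
       'change backuppiece ' || bp_key || ' available;'   as after_restore
from problem_pieces
order by bp_key;
```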

Allocating both types of channels was not an option for us: we want to test the ability to restore from tapes only, and when both channel types are allocated, disk channels are given precedence, probably because they are considered (correctly, in most cases) to be faster than tape channels.

I hope you see that this is another case where regular, automatic recoveries show their power: they make it possible to detect a problem and analyse it without time pressure, which would not be the case if the error were seen for the first time during an emergency production restore/recovery...
