Datafile copy not found in control file during RMAN recovery

The Problem: database restore fails with ORA-19571: datafile copy RECID xxx STAMP yyy not found in control file

Our typical Oracle setup consists of a primary RAC cluster and a standby database, also in a RAC configuration. We take RMAN database backups from the standby, while archived logs are backed up from the primary. Typically we back up everything to DISK (NAS) and then transfer some backups to TAPE. We also run regular automated recoveries to test our backups.

For one of our systems we had so far configured only TAPE backups, and we recently migrated it to the standard DISK+TAPE solution. At the same time we started running automated recoveries from DISK, and got a surprise: the first recovery attempt from DISK-based backups failed with a rather unclear error:

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 03/31/2016 11:22:18
ORA-19571: datafile copy RECID 410580 STAMP 847973856 not found in control file
ORA-19600: input file is datafile-copy 410580 ()
ORA-19601: output file is datafile-copy 0 (/ORA/dbs03/oradata/ARTO/ARTO_STDBY/datafile/o1_mf_urt_test_%u_.dbf)

We checked the file mentioned in the error and it is perfectly OK in the control files (on both primary and standby): it is a read-only file that has not been touched since 2009. The file had also been backed up correctly to DISK a few days before this recovery, so what is going on? In this particular case neither MOS nor Google was very helpful, so we had to dig for the root cause ourselves...

The Cause: Old datafile copies were clogging the RMAN catalogue and control files for device type DISK

After some investigation we found that very old copies of certain datafiles were still registered in the RMAN catalogue, and similar old copies were also registered in the control files of both primary and standby! The copies themselves no longer existed, but their entries in the catalogue and control files were still present, and these stale entries caused the recovery to fail with the cryptic error message shown above.

The datafile copies can be listed in RMAN. It is important to connect twice, once with and once without the catalogue, because the catalogue and the control file each keep their own records:

rman target / catalog cat_schema@rmandb
RMAN> set backup files for device type disk to accessible;
RMAN> list copy of database device type disk;

rman target /
RMAN> set backup files for device type disk to accessible;
RMAN> list copy of database device type disk;
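
The two listings above can also be run without an interactive session, which is handy when the output should be saved for later comparison. The following is only a sketch under a few assumptions: the function names are made up for illustration, `cat_schema@rmandb` is the placeholder connect string used in this post, and the catalogue connection is assumed to authenticate without an interactive password prompt.

```shell
#!/bin/sh
# Sketch: list registered datafile copies non-interactively.
# RMAN_CATALOG is a placeholder connect string; adjust it to your environment.
RMAN_CATALOG="${RMAN_CATALOG:-cat_schema@rmandb}"

# Print the RMAN commands used for the listing (same commands as in the post).
list_copies_cmds() {
    cat <<'EOF'
set backup files for device type disk to accessible;
list copy of database device type disk;
EOF
}

# Run the listing against the local target database.
# $1 = "catalog" to connect to the recovery catalogue as well;
#      anything else reads from the control file only.
list_copies() {
    if [ "$1" = "catalog" ]; then
        list_copies_cmds | rman target / catalog "$RMAN_CATALOG"
    else
        list_copies_cmds | rman target /
    fi
}
```

For example, `list_copies catalog > copies_catalog.log` followed by `list_copies nocatalog > copies_controlfile.log` saves both views so they can be compared side by side.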

The Solution: Clean up the catalogue and control files by removing old datafile copy entries

Luckily, it is easy to remove the old entries from the RMAN catalogue and the control files!

For this we have to connect to both the primary and the standby database, with and without the catalogue each time, and run the crosscheck and delete operations:

# On Primary DB

rman target / catalog cat_schema@rmandb

RMAN> set backup files for device type disk to accessible;
RMAN> crosscheck copy of database device type disk;

allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=331 device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=346 device type=DISK
validation failed for datafile copy
datafile copy file name=+ARTO_RECODG1/ARTO/datafile/system.1509.746474995 RECID=406141 STAMP=773007377
validation failed for datafile copy
datafile copy file name=+ARTO_RECODG1/ARTO/datafile/undotbs1.2063.746467055 RECID=406372 STAMP=773007769
...
validation failed for datafile copy
datafile copy file name=+ARTO_RECODG1/ARTO/datafile/urt_ecal_laser_cond_2012_data.1012.771624669 RECID=402790 STAMP=772316084
Crosschecked 268 objects

RMAN> delete expired copy of database device type disk;

allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=331 device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=346 device type=DISK

List of Datafile Copies
=======================

Key     File S Completion Time      Ckp SCN    Ckp Time
------- ---- - -------------------- ---------- --------------------
406141  1    X 19-JAN-2012 20:16:17 6225426667688 17-JAN-2012 20:09:25
        Name: +ARTO_RECODG1/ARTO/datafile/system.1509.746474995
        Tag: ARTO

406372  2    X 19-JAN-2012 20:22:49 6225427503587 17-JAN-2012 20:15:32
        Name: +ARTO_RECODG1/ARTO/datafile/undotbs1.2063.746467055
        Tag: ARTO
...
406278  417  X 19-JAN-2012 20:19:28 6225427131862 17-JAN-2012 20:12:04
        Name: +ARTO_RECODG1/ARTO/datafile/urt_ecal_laser_cond_2012_data.2714.772198015
        Tag: ARTO

402790  417  X 11-JAN-2012 20:14:44 6223745220545 09-JAN-2012 20:11:42
        Name: +ARTO_RECODG1/ARTO/datafile/urt_ecal_laser_cond_2012_data.1012.771624669
        Tag: ARTO

Do you really want to delete the above objects (enter YES or NO)? YES

deleted datafile copy
datafile copy file name=+ARTO_RECODG1/ARTO/datafile/system.1509.746474995 RECID=406141 STAMP=773007377
deleted datafile copy
datafile copy file name=+ARTO_RECODG1/ARTO/datafile/undotbs1.2063.746467055 RECID=406372 STAMP=773007769
...
deleted datafile copy
datafile copy file name=+ARTO_RECODG1/ARTO/datafile/urt_ecal_laser_cond_2012_data.1012.771624669 RECID=402790 STAMP=772316084
Deleted 268 EXPIRED objects


rman target /
RMAN> set backup files for device type disk to accessible;
RMAN> crosscheck copy of database device type disk;
RMAN> delete expired copy of database device type disk;

# On Standby DB

rman target / catalog cat_schema@rmandb
RMAN> set backup files for device type disk to accessible;
RMAN> crosscheck copy of database device type disk;
RMAN> delete expired copy of database device type disk;

rman target /
RMAN> set backup files for device type disk to accessible;
RMAN> crosscheck copy of database device type disk;
RMAN> delete expired copy of database device type disk;
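
Since the same three commands are repeated four times (primary and standby, each with and without the catalogue), they can be wrapped in a small script that is run once on each database host. This is a sketch, not the exact procedure we used: the function names and log file names are invented for illustration, `cat_schema@rmandb` is the placeholder from this post, the catalogue connection is assumed not to prompt for a password, and `noprompt` is added so that the delete does not wait for the YES/NO confirmation.

```shell
#!/bin/sh
# Sketch: crosscheck and delete expired datafile copies, first with the
# recovery catalogue connected and then against the control file only.
# RMAN_CATALOG is a placeholder connect string; adjust it to your environment.
RMAN_CATALOG="${RMAN_CATALOG:-cat_schema@rmandb}"

# Print the RMAN clean-up commands. NOPROMPT skips the interactive
# confirmation, so review the crosscheck output before relying on it.
cleanup_cmds() {
    cat <<'EOF'
set backup files for device type disk to accessible;
crosscheck copy of database device type disk;
delete noprompt expired copy of database device type disk;
EOF
}

# Run both passes on the local database (primary or standby).
cleanup_copies() {
    # Pass 1: target plus recovery catalogue.
    cleanup_cmds | rman target / catalog "$RMAN_CATALOG" \
        > cleanup_catalog.log 2>&1 || return 1
    # Pass 2: target only, so the control file records are cleaned as well.
    cleanup_cmds | rman target / \
        > cleanup_controlfile.log 2>&1 || return 1
}
```

Calling `cleanup_copies` on the primary and then on the standby covers all four sessions shown above. Keeping the interactive `delete expired` for a first run is a reasonable precaution, since it shows the list of copies before anything is removed.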

And the recovery is now working again!
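
To confirm the clean-up before the next restore test, the crosscheck can be run once more and its output saved to a file; it should no longer report any failed copies. The helpers below are a small sketch for reading such a saved log. The function names are hypothetical, and the parsing relies only on the "validation failed for datafile copy" and "RECID=" lines as they appear in the output above.

```shell
#!/bin/sh
# Sketch: summarise a saved RMAN crosscheck log.

# Count the datafile copies that failed validation.
# $1 = path to the saved crosscheck output
count_failed() {
    grep -c '^validation failed for datafile copy' "$1"
}

# Print the RECID of every datafile copy that failed validation, one per line.
# A RECID is printed only when its line directly follows a "validation failed" line.
failed_recids() {
    awk '
        /^validation failed for datafile copy/ { failed = 1; next }
        failed && /RECID=/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^RECID=/) { sub(/^RECID=/, "", $i); print $i }
        }
        { failed = 0 }
    ' "$1"
}
```

With the crosscheck output saved as, say, `crosscheck.log`, `count_failed crosscheck.log` should print 0 after the clean-up. On a log taken before the clean-up, `failed_recids crosscheck.log | grep -x 410580` checks whether the RECID from the ORA-19571 message is among the stale entries.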

Special thanks to Miroslav Potocky for his help in diagnosing this problem!
