Emil Pilecki on
The Problem: database restore fails with ORA-19571: datafile copy RECID xxx STAMP yyy not found in control file
Our typical Oracle database setup consists of a primary RAC cluster and a standby database, also in a RAC configuration. We take RMAN database backups from the standby, while archived logs are backed up from the primary database. Typically we back up everything to DISK (NAS) and then transfer some backups to TAPE. We also run regular automated recoveries to test our backups.
For one of our systems we had so far configured only TAPE backups, and we recently migrated it to the standard DISK+TAPE solution. At the same time we started doing automated recoveries from DISK, but suddenly, a surprise: the first attempt at a recovery from DISK-based backups failed with a rather unclear error:
    RMAN-00571: ===========================================================
    RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
    RMAN-00571: ===========================================================
    RMAN-03002: failure of restore command at 03/31/2016 11:22:18
    ORA-19571: datafile copy RECID 410580 STAMP 847973856 not found in control file
    ORA-19600: input file is datafile-copy 410580 ()
    ORA-19601: output file is datafile-copy 0 (/ORA/dbs03/oradata/ARTO/ARTO_STDBY/datafile/o1_mf_urt_test_%u_.dbf)
We checked the file mentioned here and it is perfectly OK in the control files (on primary and standby): it is a read-only file not touched since 2009. The file was also correctly backed up to DISK a few days before this recovery, so what is going on? In this particular case neither MOS nor Google was very helpful, so we had to dig for the root cause ourselves...
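When the automated recovery runs unattended, it can help to pull the RECID and STAMP out of the log so they can be compared with what RMAN lists. The following is only a minimal sketch: the function name `parse_ora_19571` is hypothetical, and the pattern assumes the message is worded exactly as in the error stack above.

```python
import re

# Pattern for the ORA-19571 line, assuming the wording shown in the error stack.
ORA_19571 = re.compile(
    r"ORA-19571: datafile copy RECID (\d+) STAMP (\d+) not found in control file"
)


def parse_ora_19571(log_text):
    """Return (recid, stamp) from the first ORA-19571 line, or None if absent."""
    match = ORA_19571.search(log_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Two lines copied from the failed restore above.
sample = (
    "RMAN-03002: failure of restore command at 03/31/2016 11:22:18\n"
    "ORA-19571: datafile copy RECID 410580 STAMP 847973856 not found in control file\n"
)
print(parse_ora_19571(sample))  # prints (410580, 847973856)
```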
The Cause: Old datafile copies were clogging the RMAN catalogue and control files for device type DISK
After investigation we found that some very old copies of certain datafiles were registered in the RMAN catalogue, and similar old copies were also registered in the control files of both primary and standby! Those copies no longer existed, but their entries in the catalogue and control files were still present, causing the recovery to fail with the cryptic error message shown above.
The datafile copies can be listed in RMAN (it is important to connect twice, with and without the catalogue, so that both the catalogue and the control file entries are shown):
    rman target / catalog cat_schema@rmandb
    RMAN> set backup files for device type disk to accessible;
    RMAN> list copy of database device type disk;

    rman target /
    RMAN> set backup files for device type disk to accessible;
    RMAN> list copy of database device type disk;
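With hundreds of entries, the listing is easier to review once it is turned into records. The sketch below is an illustration under assumptions: the table layout is taken from the "List of Datafile Copies" output printed by the delete step in this post and is assumed to be the same for `list copy`, both time columns are assumed to print as a date followed by a time, and `parse_datafile_copies` is a hypothetical helper name. The status column S shows X only after a crosscheck has marked a copy as expired.

```python
import re

# One table row: Key, File, S(tatus), Completion Time, Ckp SCN, Ckp Time.
# Assumes each time column prints as two tokens ("date time").
ROW = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+([A-Z])\s+(\S+\s+\S+)\s+(\d+)\s+\S+\s+\S+\s*$"
)
NAME = re.compile(r"^\s*Name:\s*(\S+)")


def parse_datafile_copies(listing):
    """Return one dict per datafile copy found in the RMAN listing text."""
    copies = []
    for line in listing.splitlines():
        row = ROW.match(line)
        if row:
            copies.append({
                "key": int(row.group(1)),
                "file": int(row.group(2)),
                "status": row.group(3),
                "completed": row.group(4),
                "name": None,
            })
            continue
        name = NAME.match(line)
        if name and copies:
            # A "Name:" line belongs to the most recent table row.
            copies[-1]["name"] = name.group(1)
    return copies


# Sample rows copied from the output shown in this post.
sample = """\
List of Datafile Copies
=======================

Key     File S Completion Time      Ckp SCN    Ckp Time
------- ---- - -------------------- ---------- --------------------
406141  1    X 19-JAN-2012 20:16:17 6225426667688 17-JAN-2012 20:09:25
        Name: +ARTO_RECODG1/ARTO/datafile/system.1509.746474995
        Tag: ARTO
406372  2    X 19-JAN-2012 20:22:49 6225427503587 17-JAN-2012 20:15:32
        Name: +ARTO_RECODG1/ARTO/datafile/undotbs1.2063.746467055
        Tag: ARTO
"""

for copy in parse_datafile_copies(sample):
    print(copy["key"], copy["file"], copy["status"], copy["completed"], copy["name"])
```

Filtering the result on `status == "X"`, or on an old year in `completed`, gives a quick overview of the stale entries before anything is deleted.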
The Solution: Clean up the catalogue and control files by removing old datafile copy entries
Luckily it is easy to remove the old entries from the RMAN catalogue and the control files!
For this we have to connect to both the primary and the standby database, with and without the catalogue, and execute the crosscheck + delete operations:
# On Primary DB
    rman target / catalog cat_schema@rmandb
    RMAN> set backup files for device type disk to accessible;
    RMAN> crosscheck copy of database device type disk;

    allocated channel: ORA_DISK_1
    channel ORA_DISK_1: SID=331 device type=DISK
    allocated channel: ORA_DISK_2
    channel ORA_DISK_2: SID=346 device type=DISK
    validation failed for datafile copy
    datafile copy file name=+ARTO_RECODG1/ARTO/datafile/system.1509.746474995 RECID=406141 STAMP=773007377
    validation failed for datafile copy
    datafile copy file name=+ARTO_RECODG1/ARTO/datafile/undotbs1.2063.746467055 RECID=406372 STAMP=773007769
    ...
    validation failed for datafile copy
    datafile copy file name=+ARTO_RECODG1/ARTO/datafile/urt_ecal_laser_cond_2012_data.1012.771624669 RECID=402790 STAMP=772316084
    Crosschecked 268 objects

    RMAN> delete expired copy of database device type disk;

    allocated channel: ORA_DISK_1
    channel ORA_DISK_1: SID=331 device type=DISK
    allocated channel: ORA_DISK_2
    channel ORA_DISK_2: SID=346 device type=DISK

    List of Datafile Copies
    =======================

    Key     File S Completion Time      Ckp SCN    Ckp Time
    ------- ---- - -------------------- ---------- --------------------
    406141  1    X 19-JAN-2012 20:16:17 6225426667688 17-JAN-2012 20:09:25
            Name: +ARTO_RECODG1/ARTO/datafile/system.1509.746474995
            Tag: ARTO
    406372  2    X 19-JAN-2012 20:22:49 6225427503587 17-JAN-2012 20:15:32
            Name: +ARTO_RECODG1/ARTO/datafile/undotbs1.2063.746467055
            Tag: ARTO
    ...
    406278  417  X 19-JAN-2012 20:19:28 6225427131862 17-JAN-2012 20:12:04
            Name: +ARTO_RECODG1/ARTO/datafile/urt_ecal_laser_cond_2012_data.2714.772198015
            Tag: ARTO
    402790  417  X 11-JAN-2012 20:14:44 6223745220545 09-JAN-2012 20:11:42
            Name: +ARTO_RECODG1/ARTO/datafile/urt_ecal_laser_cond_2012_data.1012.771624669
            Tag: ARTO

    Do you really want to delete the above objects (enter YES or NO)? YES
    deleted datafile copy
    datafile copy file name=+ARTO_RECODG1/ARTO/datafile/system.1509.746474995 RECID=406141 STAMP=773007377
    deleted datafile copy
    datafile copy file name=+ARTO_RECODG1/ARTO/datafile/undotbs1.2063.746467055 RECID=406372 STAMP=773007769
    ...
    deleted datafile copy
    datafile copy file name=+ARTO_RECODG1/ARTO/datafile/urt_ecal_laser_cond_2012_data.1012.771624669 RECID=402790 STAMP=772316084
    Deleted 268 EXPIRED objects

    rman target /
    RMAN> set backup files for device type disk to accessible;
    RMAN> crosscheck copy of database device type disk;
    RMAN> delete expired copy of database device type disk;
# On Standby DB
    rman target / catalog cat_schema@rmandb
    RMAN> set backup files for device type disk to accessible;
    RMAN> crosscheck copy of database device type disk;
    RMAN> delete expired copy of database device type disk;

    rman target /
    RMAN> set backup files for device type disk to accessible;
    RMAN> crosscheck copy of database device type disk;
    RMAN> delete expired copy of database device type disk;
And the recovery is now working again!
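Since the same three commands are run four times (primary and standby, each with and without the catalogue), the sequence could be generated once and reused. This is a sketch, not the procedure we actually ran: the helper names and the file name `cleanup_copies.rman` are hypothetical, the script is passed to RMAN through its `cmdfile` command-line option, and the optional `noprompt` keyword is a variation on the interactive delete shown above that skips the YES/NO confirmation. The code only prints the script and the command lines; it does not invoke RMAN.

```python
def cleanup_script(noprompt=False):
    """Build the RMAN command sequence used above for stale datafile copies."""
    # "delete noprompt" is a variation on the interactive "delete expired".
    delete = "delete noprompt expired" if noprompt else "delete expired"
    lines = [
        "set backup files for device type disk to accessible;",
        "crosscheck copy of database device type disk;",
        f"{delete} copy of database device type disk;",
    ]
    return "\n".join(lines) + "\n"


def rman_invocations(catalog_connect, cmdfile):
    """Return the two command lines per host: with and without the catalogue."""
    return [
        f"rman target / catalog {catalog_connect} cmdfile={cmdfile}",
        f"rman target / cmdfile={cmdfile}",
    ]


# Print what would be run; cat_schema@rmandb is the connect string from this post.
print(cleanup_script(noprompt=True))
for host in ("primary", "standby"):
    for command in rman_invocations("cat_schema@rmandb", "cleanup_copies.rman"):
        print(f"[{host}] {command}")
```

Reviewing the crosscheck output before the delete step is still worthwhile, especially the first time.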
Special thanks to Miroslav Potocky for his help in diagnosing this problem!