I've already described how important is to test your backup strategy and restore/recovery procedures, but while doing so, you could of course encounter some problems, not really related with the recoverability as such. Recently, we've got such a problem on our recovery server, at the very beginning of an automatic restore (database name masked):
Running rman NOCATALOG @/ORA/dbs03/oradata/DBNAME/contrl_restore.rcv Recovery Manager: Release 11.2.0.4.0 - Production on Wed Apr 1 06:00:05 2015 Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved. RMAN> connect target * 2> set dbid=1064149547; 3> startup nomount pfile=/ORA/dbs03/oradata/DBNAME/initDBNAME.ora; 4> run { 5> set controlfile autobackup format for device type disk to '/backup/dbs01/DBNAME/DBNAME_%F'; 6> ALLOCATE CHANNEL d0 DEVICE TYPE disk; 7> restore controlfile from autobackup maxseq 130; 8> alter database mount; 9> } 10> exit; RMAN-00571: =========================================================== RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS =============== RMAN-00571: =========================================================== ORA-01092: ORACLE instance terminated. Disconnection forced Recovery Manager complete.
I've commenced the investigation by trying to start the database manually, which ended with the same error. Unfortunately, there were no traces at all, apart from not very informative message in alert.log:
ORA-1092 : opitsk aborting process
I've got an idea, that the reason could be some junk left after previous recovery attempts, after which resources were not properly deallocated (for example shared memory segments). As recoveries of other databases were running on this machine at that time, I needed a way to check which shared resources are used by an old instance (if any), because output of ipcs command looked like:
$ ipcs -tm ------ Shared Memory Attach/Detach/Change Times -------- shmid owner attached detached changed 3932162 root Mar 16 12:40:32 Mar 16 12:40:32 Mar 16 12:40:32 3964932 root Mar 16 12:40:32 Mar 16 12:40:32 Mar 16 12:40:32 3997701 root Mar 16 12:40:32 Mar 16 12:40:32 Mar 16 12:40:32 1035337734 oracle Apr 1 12:25:44 Apr 1 12:25:44 Mar 31 21:18:04 1035370503 oracle Apr 1 12:25:44 Apr 1 12:25:44 Mar 31 21:18:04 1031634952 oracle Apr 1 12:25:20 Apr 1 12:25:20 Mar 31 13:46:26 1031667721 oracle Apr 1 12:25:20 Apr 1 12:25:20 Mar 31 13:46:26 1035403274 oracle Apr 1 12:25:44 Apr 1 12:25:44 Mar 31 21:18:04 1036582923 oracle Apr 1 12:21:12 Apr 1 12:21:12 Apr 1 12:00:33 1036615692 oracle Apr 1 12:21:12 Apr 1 12:21:12 Apr 1 12:00:33 928677903 oracle Apr 1 12:22:27 Apr 1 12:22:27 Mar 25 09:25:12 928710672 oracle Apr 1 12:22:27 Apr 1 12:22:27 Mar 25 09:25:12 928743443 oracle Apr 1 12:22:27 Apr 1 12:22:27 Mar 25 09:25:12
So how to check which ones may belong to the database instance you want to start? After setting properly $ORACLE_HOME and $ORACLE_SID variables, one could use Oracle provided tool: sysresv.
$ sysresv IPC Resources for ORACLE_SID "DBNAME1" : Shared Memory: ID KEY 928677903 0x00000000 928710672 0x00000000 928743443 0x5b2fed54 Semaphores: ID KEY No semaphore resources used Oracle Instance not alive for sid "DBNAME1"
Worth to mention that it is not necessary to set $ORACLE_SID, one could use also -l parameter of sysresv with the list of all interesting SIDs. Going back to our problem - I just needed to remove proper shared memory segments:
$ ipcrm -m 928677903 $ ipcrm -m 928710672 $ ipcrm -m 928743443
The other possibility to do the cleanup is to use sysresv command again, with option -f. Of course you have to be very careful while doing so on production systems, but fortunately it should not remove resources for instances detected as being "alive":
Oracle Instance alive for sid "DBNAME1" SYSRESV-005: Warning Instance maybe alive - aborting remove for sid "DBNAME1"
Finally, after performing listed actions, I was able to restart the database and its restore/recovery without any problems.