It is common for a DBA to be left with a corrupted OCR disk and no good backup.
I ran into this situation a few days ago. One node of the RAC database showed the following:
NODE1:
<span style="font-family: arial,helvetica,sans-serif;"><strong>$ORA_CRS_HOME/bin/crs_stat -t</strong>
Name           Type         Target   State    Host
------------------------------------------------------------
ora.orcl.db    application  ONLINE   ONLINE   rac1
ora....11.inst application  ONLINE   ONLINE   rac1
ora....12.inst application  ONLINE   OFFLINE
ora....vice.cs application  OFFLINE  OFFLINE
ora....l11.srv application  ONLINE   OFFLINE
ora....l12.srv application  ONLINE   OFFLINE
ora....SM1.asm application  ONLINE   ONLINE   rac1
ora....DC.lsnr application  ONLINE   ONLINE   rac1
ora....abc.gsd application  ONLINE   ONLINE   rac1
ora....abc.ons application  ONLINE   ONLINE   rac1
ora....abc.vip application  ONLINE   ONLINE   rac1
ora....SM2.asm application  ONLINE   ONLINE   rac2
ora....C2.lsnr application  ONLINE   ONLINE   rac2
ora....bc2.gsd application  ONLINE   ONLINE   rac2
ora....bc2.ons application  ONLINE   ONLINE   rac2
ora....bc2.vip application  ONLINE   ONLINE   rac2</span>
The other node shows the following:
NODE2:
<span style="font-family: arial,helvetica,sans-serif;"><strong>/crs_stat -t</strong>
HA Resource                         Target   State
-----------                         ------   -----
ora.orcl.db                         OFFLINE  OFFLINE
ora.orcl.orcl11.inst                OFFLINE  OFFLINE
ora.orcl.orcl12.inst                OFFLINE  OFFLINE
ora.orcl.test_service.cs            ONLINE   OFFLINE
ora.orcl.test_service.orcl11.srv    OFFLINE  OFFLINE
ora.orcl.test_service.orcl12.srv    OFFLINE  OFFLINE
ora.rac1.ASM1.asm                   OFFLINE  OFFLINE
ora.rac1.LISTENER_RAC1.lsnr         OFFLINE  OFFLINE
ora.rac1.gsd                        OFFLINE  OFFLINE
ora.rac1.ons                        OFFLINE  OFFLINE
ora.rac1.vip                        OFFLINE  OFFLINE
ora.rac2.ASM2.asm                   OFFLINE  OFFLINE
ora.rac2.LISTENER_RAC2.lsnr         ONLINE   OFFLINE
ora.rac2.gsd                        ONLINE   OFFLINE
ora.rac2.ons                        ONLINE   OFFLINE
ora.rac2.vip                        ONLINE   OFFLINE</span>
The two nodes of the RAC report inconsistent data: NODE1 sees most resources ONLINE, while NODE2 sees every resource OFFLINE. Every srvctl and crsctl command was hanging on NODE 2.
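To spot resources whose target and actual state disagree without reading the whole listing, the output can be filtered. The following is a minimal sketch that assumes the five-column layout shown for NODE1 (Name, Type, Target, State, Host). The function name `list_down_resources` is hypothetical and not part of the Clusterware tooling.

```shell
# Hypothetical helper: reads `crs_stat -t` style output on stdin and prints
# the name of every resource whose Target is ONLINE but whose State is OFFLINE.
# Assumes whitespace-separated columns: Name Type Target State [Host].
list_down_resources() {
    awk '$3 == "ONLINE" && $4 == "OFFLINE" { print $1 }'
}
```

It could be used as `$ORA_CRS_HOME/bin/crs_stat -t | list_down_resources`. The three-column layout shown for NODE2 would need different field numbers.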
The usual fix is to restore the OCR from a backup. If no OCR backup is available, the following procedure can be used to recover from the corrupted OCR disk.
(These operations require complete downtime.)
1. Check the status of CRS from node 1:
# ps -eaf |grep d.bin
root 12873 1 0 Aug11 ? 00:11:07 /u01/app/crs/bin/crsd.bin reboot
oracle 13105 12846 0 Aug11 ? 00:00:45 /u01/app/crs/bin/evmd.bin
oracle 13226 13200 0 Aug11 ? 00:13:13 /u01/app/crs/bin/ocssd.bin
root 21458 19986 0 20:34 pts/4 00:00:00 grep d.bin
2. Shut down Oracle Clusterware on all nodes:
<span style="font-family: arial,helvetica,sans-serif;">[root@rac1 bin]# ./crsctl stop crs
Stopping resources.
Successfully stopped CRS resources
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.</span>
Check the status again:
[root@rac1 bin]# ps -eaf |grep d.bin
root 21927 19986 0 20:34 pts/4 00:00:00 grep d.bin
Only the grep process itself is listed, which shows that the cluster is stopped.
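The same check can be scripted so that it does not count the `grep` process itself. This is a minimal sketch; the function name `count_crs_daemons` is hypothetical, and the daemon names are taken from the `ps` listing in step 1.

```shell
# Hypothetical helper: reads `ps -eaf` output on stdin and prints how many
# lines belong to the crsd.bin, evmd.bin, or ocssd.bin daemons.
# Matching on the path separator keeps a `grep d.bin` process out of the count.
count_crs_daemons() {
    # grep -c exits non-zero when the count is 0, so mask that status.
    grep -E -c '/(crsd|evmd|ocssd)\.bin' || true
}
```

Running `ps -eaf | count_crs_daemons` on each node should print 0 once the stack is down.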
3. Run rootdelete.sh on all nodes.
The script is located at $ORA_CRS_HOME/install/rootdelete.sh
NODE1:
<span style="font-family: arial,helvetica,sans-serif;">[root@rac1 install]# <strong>./rootdelete.sh</strong>
Shutting down Oracle Cluster Ready Services (CRS):
Stopping resources.
Error while stopping resources. Possible cause: CRSD is down.
Stopping CSSD.
Unable to communicate with the CSS daemon.
Shutdown has begun. The daemons should exit soon.
Checking to see if Oracle CRS stack is down...
Oracle CRS stack is not running.
Oracle CRS stack is down now.
Removing script for Oracle Cluster Ready services
Updating ocr file for downgrade
Cleaning up SCR settings in '/etc/oracle/scls_scr'</span>
NODE 2:
<span style="font-family: arial,helvetica,sans-serif;"><strong>./rootdelete.sh</strong>
Shutting down Oracle Cluster Ready Services (CRS):
OCR initialization failed accessing OCR device: PROC-26: Error while accessing the physical storage Operating System error [No such file or directory] [2]
Shutdown has begun. The daemons should exit soon.
Checking to see if Oracle CRS stack is down...
Oracle CRS stack is not running.
Oracle CRS stack is down now.
Removing script for Oracle Cluster Ready services
Updating ocr file for downgrade
Cleaning up SCR settings in '/etc/oracle/scls_scr'</span>
The error “OCR initialization failed accessing OCR device” can occur for the following reasons:
1. ocrconfig_loc is not pointing to the correct OCR.
2. Incorrect permissions or ownership on the OCR devices.
3. A configuration problem in Oracle Cluster Synchronization Services.
Because the script goes on to clean up the SCR entries anyway, the PROC-26 error can be ignored here.
If the RAC has more than two nodes, run rootdelete.sh on all the other nodes as well.
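The first two causes of the PROC-26 error can be checked by hand. The sketch below assumes that the OCR location is recorded as an `ocrconfig_loc=` line in an `ocr.loc` file (on Linux this is typically `/etc/oracle/ocr.loc`; verify the path for your platform). The function names `ocr_location` and `check_ocr_device` are hypothetical.

```shell
# Print the OCR location recorded in an ocr.loc file.
# $1: path to ocr.loc (assumed to be /etc/oracle/ocr.loc on Linux).
ocr_location() {
    sed -n 's/^ocrconfig_loc=//p' "$1"
}

# Report whether the recorded OCR location exists; if it does, show its
# owner, group, and permissions so they can be compared across nodes.
check_ocr_device() {
    loc=$(ocr_location "$1")
    if [ -z "$loc" ]; then
        echo "no ocrconfig_loc entry in $1"
        return 1
    fi
    if [ -e "$loc" ]; then
        ls -l "$loc"
    else
        echo "OCR location $loc does not exist"
        return 1
    fi
}
```

For example, `check_ocr_device /etc/oracle/ocr.loc` run on each node shows whether all nodes point at the same device with the same ownership.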
4. Run rootdeinstall.sh on the node where the RAC installation was performed (usually node 1).
This clears the contents of the OCR disk.
<span style="font-family: arial,helvetica,sans-serif;">./rootdeinstall.sh</span>
<span style="font-family: arial,helvetica,sans-serif;">Removing contents from OCR device
2560+0 records in
2560+0 records out</span>
5. Run root.sh from the same node:
<span style="font-family: arial,helvetica,sans-serif;">./root.sh
WARNING: directory '/u01' is not owned by root
Checking to see if Oracle CRS stack is already configured</span>
<span style="font-family: arial,helvetica,sans-serif;">Setting the permissions on OCR backup directory
Setting up NS directories
Oracle Cluster Registry configuration upgraded successfully
WARNING: directory '/u01' is not owned by root
assigning default hostname rac1 for node 1.
assigning default hostname rac2 for node 2.
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node :</span>
node 1: rac1 rac1-priv rac1
node 2: rac2 rac2-priv rac2
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Now formatting voting device: /dev/raw/raw1
Format of 1 voting devices complete.
Startup will be queued to init within 90 seconds.
Adding daemons to inittab
Expecting the CRS daemons to be up within 600 seconds.
CSS is active on these nodes.
rac1
CSS is inactive on these nodes.
rac2
Local node checking complete.
Run root.sh on remaining nodes to start CRS daemons.
After it completes, run root.sh on each of the remaining nodes.
<span style="font-family: arial,helvetica,sans-serif;">./root.sh
Checking to see if Oracle CRS stack is already configured</span>
<span style="font-family: arial,helvetica,sans-serif;">Setting the permissions on OCR backup directory
Setting up NS directories
Oracle Cluster Registry configuration upgraded successfully
clscfg: EXISTING configuration version 3 detected.
clscfg: version 3 is 10G Release 2.
assigning default hostname rac1 for node 1.
assigning default hostname rac2 for node 2.
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node :</span>
node 1: rac1 rac1-priv rac1
node 2: rac2 rac2-priv rac2
clscfg: Arguments check out successfully.
<span style="font-family: arial,helvetica,sans-serif;">NO KEYS WERE WRITTEN. Supply -force parameter to override.
-force is destructive and will destroy any previous cluster configuration.
Oracle Cluster Registry for cluster has already been initialized
Startup will be queued to init within 90 seconds.
Adding daemons to inittab
Expecting the CRS daemons to be up within 600 seconds.
CSS is active on these nodes.
rac1
rac2
CSS is active on all nodes.
Oracle CRS stack installed and running under init(1M)
Running vipca(silent) for configuring nodeapps
The given interface(s), "eth0" is not public. Public interfaces should be used to configure virtual IPs.</span>
The silent-mode VIPCA configuration fails because of bug 4437727 in 10.2.0.1. To work around it, run
VIPCA manually as the root user on the last node, where this error occurred, and follow the instructions.
# $ORA_CRS_HOME/bin/vipca
6. The final step is to add the resources back to the OCR with the srvctl command.
Adding DATABASE to OCR:
<span style="font-family: arial,helvetica,sans-serif;">srvctl add database -d db_unique_name -o oracle_home
[oracle@rac1 ~]$ $ORA_CRS_HOME/bin/srvctl add database -d orcl -o /u01/app/oracle/product/10.2.0/db_1</span>
Adding INSTANCE to OCR:
<span style="font-family: arial,helvetica,sans-serif;">srvctl add instance -d db_unique_name -i inst_name -n node_name
[oracle@rac1 ~]$ $ORA_CRS_HOME/bin/srvctl add instance -d orcl -i orcl11 -n rac1
[oracle@rac1 ~]$ $ORA_CRS_HOME/bin/srvctl add instance -d orcl -i orcl12 -n rac2</span>
Adding SERVICES to OCR:
<span style="font-family: arial,helvetica,sans-serif;">srvctl add service -d db_unique_name -s service_name -r preferred_list
[oracle@rac1 ~]$ $ORA_CRS_HOME/bin/srvctl add service -d orcl -s test_service -r orcl11,orcl12</span>
Adding NODEAPPS to OCR:
srvctl add nodeapps -n node_name -o oracle_home -A addr_str
where addr_str is the node-level VIP address, given with its netmask as in the example.
This command must be run as the root user; otherwise you get the following error:
<span style="font-family: arial,helvetica,sans-serif;">[oracle@rac1 ~]$ $ORA_CRS_HOME/bin/srvctl add nodeapps -n rac1 -o /u01/app/oracle/product/10.2.0/db_1 -A 10.167.21.89/255.255.255.0
PRKO-2117 : This command should be executed as the system privilege user.
[oracle@rac1 ~]$
[oracle@rac1 ~]$ su -
Password:
[root@rac1 ~]# cd /u01/app/crs/bin
[root@rac1 bin]# ./srvctl add nodeapps -n rac1 -o /u01/app/oracle/product/10.2.0/db_1 -A 10.167.21.87/255.255.255.0
[root@rac1 bin]# ./srvctl add nodeapps -n rac2 -o /u01/app/oracle/product/10.2.0/db_1 -A 10.167.21.89/255.255.255.0</span>
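With more than a couple of instances, typing the database and instance registrations one by one is error-prone. The sketch below only prints the `srvctl add database` and `srvctl add instance` commands in the form used in this step, so they can be reviewed before being run. The function name `print_srvctl_adds` and the `instance:node` argument format are hypothetical conventions for this example.

```shell
# Hypothetical helper: print (do not execute) the srvctl commands that
# re-register a database and its instances in the OCR.
# $1: db_unique_name, $2: oracle_home, remaining arguments: instance:node pairs.
print_srvctl_adds() {
    db=$1
    home=$2
    shift 2
    echo "srvctl add database -d $db -o $home"
    for pair in "$@"; do
        inst=${pair%%:*}   # text before the first colon
        node=${pair#*:}    # text after the first colon
        echo "srvctl add instance -d $db -i $inst -n $node"
    done
}
```

For the cluster in this post the call would be `print_srvctl_adds orcl /u01/app/oracle/product/10.2.0/db_1 orcl11:rac1 orcl12:rac2`. Services and nodeapps are left out on purpose, since nodeapps must be added as root.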
This completes the OCR recreation. You can now verify the cluster status with cluvfy.