I was working on an issue where Clusterware was not coming up because the private interface was down. The following error was recorded in ocssd.log, which indicated that the problem was with the private interface:
2011-08-31 15:03:38.051: [ CSSD][1090451776]clssnmvDHBValidateNCopy: node 2, testrac2, has a disk HB, but no network HB, DHB has rcfg 205815745, wrtcnt, 4418998, LATS 4634324, lastSeqNo 4418997, uniqueness 1314797539, timestamp 1314803017/4632384
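To spot this condition quickly, something like the following could be used to list the nodes for which ocssd.log reports a disk heartbeat but no network heartbeat. It is only a sketch: the function name nodes_missing_network_hb is hypothetical, and the pattern assumes the message format shown in the log line above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Clusterware): reads ocssd.log text on stdin
# and prints each node name reported as having a disk heartbeat but no
# network heartbeat. Assumes the message format shown in the log line above.
nodes_missing_network_hb() {
    sed -n 's/.*clssnmvDHBValidateNCopy: node [0-9]*, \([^,]*\), has a disk HB, but no network HB.*/\1/p' | sort -u
}

# Example (the log path is whatever your installation uses):
#   nodes_missing_network_hb < ocssd.log
```

If this prints a node name, the interconnect to that node is the first thing to check.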
Checking the CRS status showed that the OHASD process was up and running, but the CRS, CSSD and EVMD processes were not.
[root@testrac1 cssd]# /oragrid/product/11.2/bin/crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
After fixing the interface issue, we tried starting CRS with the 'crsctl start crs' command, and it failed with the following errors:
[root@testrac1 cssd]# /oragrid/product/11.2/bin/crsctl start crs
CRS-4640: Oracle High Availability Services is already active
CRS-4000: Command Start failed, or completed with errors.
CRS-4640 is reported since OHASD is already running. In 11.2 OHASD is supposed to start the other dependent processes.
The crsctl stop crs command failed as well:
[root@testrac1 cssd]# /oragrid/product/11.2/bin/crsctl stop crs
CRS-2796: The command may not proceed when Cluster Ready Services is not running
CRS-4687: Shutdown command has completed with errors.
CRS-4000: Command Stop failed, or completed with errors.
Since ohasd was already running, I tried crsctl start cluster (this command requires ohasd to be up), and it brought the remaining components up. The output still ends with CRS-4000, apparently because some resources were reported as already running (CRS-5702), but the follow-up check shows everything online.
[root@testrac1 cssd]# /oragrid/product/11.2/bin/crsctl start cluster
CRS-2672: Attempting to start 'ora.cssd' on 'testrac1'
CRS-2676: Start of 'ora.cssd' on 'testrac1' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'testrac1'
CRS-2672: Attempting to start 'ora.ctssd' on 'testrac1'
CRS-2676: Start of 'ora.ctssd' on 'testrac1' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'testrac1'
CRS-2672: Attempting to start 'ora.evmd' on 'testrac1'
CRS-2676: Start of 'ora.crsd' on 'testrac1' succeeded
CRS-5702: Resource 'ora.crsd' is already running on 'testrac1'
CRS-2676: Start of 'ora.evmd' on 'testrac1' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'testrac1' succeeded
CRS-5702: Resource 'ora.cluster_interconnect.haip' is already running on 'testrac1'
CRS-4000: Command Start failed, or completed with errors.
[root@testrac1 ~]# /oragrid/product/11.2/bin/crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
Ideally, crsctl start crs should be used to start the Clusterware components. But if they fail to come up due to some issue (e.g. an inaccessible voting disk or an interface problem) and you end up in a situation where ohasd is up but the rest of the stack is not, you can use crsctl start cluster to start the remaining Clusterware processes after fixing the underlying issue. I believe the crsctl stop crs -f option can also be used, though I didn’t try it for this issue.
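As a rough sketch of that decision, the helper below looks at the text printed by crsctl check crs and suggests which command to try. The function name choose_start_command is hypothetical, and the logic relies only on the message codes seen in the output above (CRS-4638, CRS-4537, CRS-4529, CRS-4533). It only prints a suggestion and does not run anything.

```shell
#!/bin/sh
# Hypothetical helper (not part of Clusterware): given the output of
# `crsctl check crs` as its first argument, print the command to try next.
# Relies only on the message codes shown in the output above.
choose_start_command() {
    status="$1"

    # CRS-4638 means Oracle High Availability Services (ohasd) is online.
    case "$status" in
        *CRS-4638*) ;;
        *) echo "crsctl start crs"; return 0 ;;
    esac

    # ohasd is up: check that CRS (4537), CSS (4529) and EVM (4533) are online.
    for code in CRS-4537 CRS-4529 CRS-4533; do
        case "$status" in
            *"$code"*) ;;
            *) echo "crsctl start cluster"; return 0 ;;
        esac
    done

    # Everything is already online.
    echo "none"
}

# Example (run as root on the affected node, after fixing the underlying issue):
#   choose_start_command "$(/oragrid/product/11.2/bin/crsctl check crs)"
```

In the situation described here (ohasd online, the other three daemons unreachable), this would suggest crsctl start cluster.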
Great sharing of this experience……
Thanks ! Just got us out of a hole