CRS-4640 Error on Starting 11gR2 clusterware

I was working on an issue where Clusterware would not come up because the private interface was down. The following error in ocssd.log showed that the problem was with the private interface:

2011-08-31 15:03:38.051: [ CSSD][1090451776]clssnmvDHBValidateNCopy: node 2, testrac2, has a disk HB, but no network HB, DHB has rcfg 205815745, wrtcnt, 4418998, LATS 4634324, lastSeqNo 4418997, uniqueness 1314797539, timestamp 1314803017/4632384
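
When ocssd.log is large, it can help to pull out every node that shows this symptom. The following is a minimal sketch, assuming the message text always matches the line above; the function name and the regular expression are illustrative and are not part of any Oracle tool.

```python
import re

# Assumption: the message format is exactly as in the ocssd.log line quoted above.
NO_NETWORK_HB = re.compile(
    r"clssnmvDHBValidateNCopy: node (\d+), ([^,\s]+), "
    r"has a disk HB, but no network HB"
)


def nodes_missing_network_heartbeat(log_text):
    """Return sorted (node_number, node_name) pairs that have a disk
    heartbeat but no network heartbeat according to the log text."""
    found = set()
    for match in NO_NETWORK_HB.finditer(log_text):
        found.add((int(match.group(1)), match.group(2)))
    return sorted(found)
```

Calling it on the contents of ocssd.log (read with open(...).read()) returns one entry per affected node, which points at the node whose private interconnect should be checked first.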

Checking the CRS status showed that the OHASD process was up and running, but the CRS, CSSD and EVMD processes were not:

[root@testrac1 cssd]# /oragrid/product/11.2/bin/crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
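
For scripted health checks, the same output can be reduced to a per-daemon status. The sketch below is hedged: it only knows the seven message codes that appear in this post, silently ignores any other line, and the short names ohasd, crsd, cssd and evmd are labels chosen here for illustration.

```python
import re

# Message codes taken from the 'crsctl check crs' outputs shown in this post.
# Assumption: any other code is ignored rather than treated as an error.
STATUS_BY_CODE = {
    "CRS-4638": ("ohasd", True),   # Oracle High Availability Services is online
    "CRS-4537": ("crsd", True),    # Cluster Ready Services is online
    "CRS-4529": ("cssd", True),    # Cluster Synchronization Services is online
    "CRS-4533": ("evmd", True),    # Event Manager is online
    "CRS-4535": ("crsd", False),   # Cannot communicate with Cluster Ready Services
    "CRS-4530": ("cssd", False),   # Communications failure contacting CSS daemon
    "CRS-4534": ("evmd", False),   # Cannot communicate with Event Manager
}


def parse_check_crs(output):
    """Map each recognised daemon to True (online) or False (not reachable)."""
    status = {}
    for line in output.splitlines():
        match = re.match(r"\s*(CRS-\d+):", line)
        if match and match.group(1) in STATUS_BY_CODE:
            daemon, online = STATUS_BY_CODE[match.group(1)]
            status[daemon] = online
    return status
```

Fed the output above, the function reports ohasd as online and the other three daemons as down, which is exactly the state described here.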

After fixing the interface issue, we tried starting CRS with the 'crsctl start crs' command, and it failed with the following errors:

[root@testrac1 cssd]# /oragrid/product/11.2/bin/crsctl start crs
CRS-4640: Oracle High Availability Services is already active
CRS-4000: Command Start failed, or completed with errors.

CRS-4640 is reported because OHASD is already running. In 11.2, OHASD is supposed to start the other dependent processes.

The crsctl stop crs command also failed:

[root@testrac1 cssd]# /oragrid/product/11.2/bin/crsctl stop crs
CRS-2796: The command may not proceed when Cluster Ready Services is not running
CRS-4687: Shutdown command has completed with errors.
CRS-4000: Command Stop failed, or completed with errors.

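Since ohasd was already running, I tried crsctl start cluster (this command requires ohasd to be up), and it brought up the remaining daemons. The output still ends with CRS-4000, which appears to come from the CRS-5702 'already running' messages, but the status check that follows shows every component online: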

[root@testrac1 cssd]# /oragrid/product/11.2/bin/crsctl start cluster
CRS-2672: Attempting to start 'ora.cssd' on 'testrac1'
CRS-2676: Start of 'ora.cssd' on 'testrac1' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'testrac1'
CRS-2672: Attempting to start 'ora.ctssd' on 'testrac1'
CRS-2676: Start of 'ora.ctssd' on 'testrac1' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'testrac1'
CRS-2672: Attempting to start 'ora.evmd' on 'testrac1'
CRS-2676: Start of 'ora.crsd' on 'testrac1' succeeded
CRS-5702: Resource 'ora.crsd' is already running on 'testrac1'
CRS-2676: Start of 'ora.evmd' on 'testrac1' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'testrac1' succeeded
CRS-5702: Resource 'ora.cluster_interconnect.haip' is already running on 'testrac1'
CRS-4000: Command Start failed, or completed with errors.

[root@testrac1 ~]# /oragrid/product/11.2/bin/crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online

Ideally, crsctl start crs should be used to start the Clusterware components. But if they fail to come up due to some issue (e.g. voting disk inaccessible, interface issue) and you end up in a situation where ohasd is up, you can use crsctl start cluster to start the remaining Clusterware processes after fixing the underlying issue. I believe crsctl stop crs -f can also be used, though I didn't try it for this issue.
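
The rule of thumb above can be written down as a small decision helper. This is only a sketch of the reasoning in this post, not an Oracle-provided procedure: the status dictionary (daemon name mapped to True when online) and the function name are assumptions made for illustration, and the underlying problem still has to be fixed before running the suggested command.

```python
def suggest_start_command(status):
    """Suggest which crsctl command to run for a given daemon status.

    status maps the hypothetical labels 'ohasd', 'crsd', 'cssd' and 'evmd'
    to True when that daemon is online; missing keys count as offline.
    """
    if not status.get("ohasd", False):
        # Nothing is running: the normal full-stack start applies.
        return "crsctl start crs"
    if all(status.get(name, False) for name in ("crsd", "cssd", "evmd")):
        # Whole stack is already online, nothing to start.
        return None
    # ohasd is up but the rest of the stack is not: 'crsctl start crs'
    # would only report CRS-4640, so start the remaining daemons instead.
    return "crsctl start cluster"
```

For the situation described in this post (ohasd online, everything else down), the helper suggests crsctl start cluster, which is the command that worked here.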