CRS Fails to Start – 10.2.0.1 RAC Install on AIX

I was installing Oracle 10.2.0.1 RAC on IBM AIX 5L. While running root.sh on the first node (as part of the Clusterware installation), I got the following messages:

Now formatting voting device: /dev/voting_disk01
Now formatting voting device: /dev/voting_disk02
Now formatting voting device: /dev/voting_disk03
Format of 3 voting devices complete.
Startup will be queued to init within 30 seconds

I waited for quite some time and found that it was stuck. To check the status of CSS, I grepped the process list for CSS and found that the CSS script /etc/init.cssd startcheck was running. This indicated that Oracle was stuck trying to start CSS. The following error was recorded in /tmp/crsct.7459:

Failure in CSS initialization opening OCR.
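The process check described above can be sketched as a small filter. This is only an illustration: css_procs is a hypothetical helper name, not something shipped with Oracle, and it assumes the process list comes from ps -ef.

```shell
# Hypothetical helper (not part of Oracle Clusterware):
# reads `ps -ef` output on stdin and prints lines that mention CSS,
# dropping the grep command itself so it does not show up as a false hit.
css_procs() {
    grep -i 'css' | grep -v 'grep'
}

# Typical use on the node where root.sh appears to hang:
#   ps -ef | css_procs
```

If the only line that comes back is the init.cssd startcheck script and it never goes away, CSS is most likely still waiting on something rather than running.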

Metalink notes suggested checking the OCR disk permissions, but in my case they were correct: ownership oracle:dba and mode 660. To diagnose further, I looked in $ORA_CRS_HOME/log for errors. All the log files for CRS, CSS and EVMD are stored in $ORA_CRS_HOME/log/<hostname>.
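For the permission check, something along these lines can save some squinting at ls output. It is a rough sketch: check_perms is a hypothetical helper, and it relies only on ls -ld and awk so that it should also work where no stat command is available.

```shell
# Hypothetical helper: compare owner, group and permission bits of a
# file or device against expected values.
# Usage: check_perms PATH OWNER GROUP PERMS   (PERMS as ls shows them,
#        e.g. rw-rw---- for mode 660)
check_perms() {
    # -L follows a symlink to the real device; errors go to /dev/null
    ls -ldL "$1" 2>/dev/null | awk -v o="$2" -v g="$3" -v p="$4" '
        {
            found = 1
            perms = substr($1, 2, 9)   # skip the file-type character
            if ($3 == o && $4 == g && perms == p) { print "OK"; ok = 1 }
            else printf "MISMATCH: %s %s:%s\n", perms, $3, $4
        }
        END {
            if (!found) print "MISSING"
            exit (ok ? 0 : 1)
        }'
}

# Example (the device path is a placeholder for your own OCR device):
#   check_perms /dev/your_ocr_device oracle dba rw-rw----
```

It prints OK when everything matches, MISMATCH with the actual values otherwise, and MISSING if the path cannot be listed.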

/oracle/crs_base/app/product/crs10gR2/log>ls -ltr
total 0
drwxrwx---    2 oracle   dba             256 Jan 28 18:46 crs
drwx------    3 root     system          256 Jan 28 18:53 chd0196
drwxr-xr-t    8 root     dba             256 Jan 28 18:53 rac01

The hostname of the server was rac01, not chd0196. This was a new server and a fresh installation, so the directories could not have been left over from an earlier install. Oracle was picking up two hostnames, which was quite strange. I checked for HACMP filesets and found that they were present:

# lslpp -l |grep -i hacmp
  rsct.basic.hacmp           2.4.9.0  COMMITTED  RSCT Basic Function (HACMP/ES
  rsct.compat.basic.hacmp    2.4.9.0  COMMITTED  RSCT Event Management Basic
                                                 Function (HACMP/ES Support)
  rsct.compat.clients.hacmp  2.4.9.0  COMMITTED  RSCT Event Management Client
                                                 Function (HACMP/ES Support)
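To pull just the fileset names out of output like the above (for example, to pass them on to the system administrator), a small awk filter is enough. This is a sketch only; hacmp_filesets is a hypothetical helper name, and it assumes the fileset name is the first column, as in the lslpp -l listing shown here.

```shell
# Hypothetical helper: reads `lslpp -l` output on stdin and prints the
# names of filesets whose name contains "hacmp" (case-insensitive).
# Wrapped description lines are skipped because their first word is
# not a fileset name.
hacmp_filesets() {
    awk 'tolower($1) ~ /hacmp/ { print $1 }'
}

# Typical use on AIX:
#   lslpp -l | hacmp_filesets
```

An empty result means no fileset with hacmp in its name is installed.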

10g RAC does not require vendor clusterware, as Oracle provides its own clusterware called “Oracle Clusterware”. We had these packages uninstalled and both servers rebooted. After cleaning up the failed RAC installation, we restarted the installation. You can use Metalink Note 239998.1 – 10g RAC: How to Clean Up After a Failed CRS Install for the cleanup. On re-running root.sh, the installation went fine.