SRVCTL fails to start RAC resources: CRS-0215

After upgrading a RAC database to 10.2.0.4 and applying CRS bundle patch 1 to the 10.2.0.4 CRS home,
the srvctl command fails to start resources on the RAC nodes. When starting RAC resources with SRVCTL,
the following errors appear in the CRSD.log file:

$ srvctl start instance -d rac -i rac2

2009-04-09 13:45:22.091: [  CRSRES][2611477408][ALERT]0`ora...inst` on member `` has experienced an unrecoverable failure.
2009-04-09 13:45:22.091: [  CRSRES][2611477408]0Human intervention required to resume its availability.
2009-04-09 13:46:25.162: [  CRSRES][2611477408]0StopResource: setting CLI values
2009-04-09 13:46:25.174: [  CRSRES][2611477408]0Attempting to stop `ora...inst` on member ``
2009-04-09 13:46:25.206: [  CRSAPP][2611477408]0StopResource error for ora...inst error code = 1

To debug SRVCTL, SRVM_TRACE is set to TRUE and an strace is taken at the OS level:

$ script /tmp/srvm.log
$ export SRVM_TRACE=TRUE
$ srvctl start instance -d <db_name> -i <instance_name>
$ exit

This generates a trace file at /tmp/srvm.log.
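
Where an interactive script session is inconvenient, the same capture can be sketched as a small shell function. This is only a minimal sketch: it assumes that the trace enabled by SRVM_TRACE is printed to the command's standard output and standard error, as the script session above suggests. The function name run_with_srvm_trace is hypothetical and not part of Oracle's tooling.

```shell
# Hypothetical helper (not part of Oracle's tooling): run a command with
# SRVM_TRACE=TRUE set only for that command and save everything it prints.
run_with_srvm_trace() {
    # $1: file that receives the captured output
    # remaining arguments: the command to run, for example srvctl and its options
    log_file=$1
    shift
    # env sets the variable for the child process only, so the calling
    # shell's environment is left unchanged.
    env SRVM_TRACE=TRUE "$@" > "$log_file" 2>&1
}
```

For the case above, it could be called as: run_with_srvm_trace /tmp/srvm.log srvctl start instance -d rac -i rac2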

$ strace -f -o /tmp/strace.log srvctl start instance -d <db_name> -i <instance_name>

This generates a trace file at /tmp/strace.log.

srvm.log shows the following error:

[Thread-2] [11:57:59:774] [StreamReader.run:65]  OUTPUT>Attempting to start `ora.rac.rac2.inst` on member `node11`
[Thread-2] [11:58:0:862] [StreamReader.run:65]  OUTPUT>`ora.rac.rac2.inst` on member `node11` has experienced an unrecoverable failure.
[Thread-2] [11:58:0:862] [StreamReader.run:65]  OUTPUT>Human intervention required to resume its availability.
[Thread-2] [11:58:0:863] [StreamReader.run:65]  OUTPUT>nloz11:ora.rac.rac2.inst:/oac/app/oracle/product/10.2.0/db_1/bin/racgwrap: line 62: fg: no job control
[Thread-3] [11:58:0:865] [StreamReader.run:65]  ERROR>CRS-0215: Resource ora.rac.rac2.inst cannot be started.
[Thread-3] [11:58:0:865] [StreamReader.run:65]  ERROR>
[Worker 0] [11:58:0:865] [RuntimeExec.runCommand:133]  runCommand: process returns 115
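
In a long trace, the relevant lines can be pulled out with grep. The following is a minimal sketch, assuming the trace was saved as plain text. The function name srvm_errors is hypothetical, and the pattern simply matches the ERROR> marker and the racgwrap script name seen in the excerpt above.

```shell
# Hypothetical helper: print only the lines of a saved SRVM trace that
# carry error output or mention the racgwrap script.
srvm_errors() {
    # $1: path to the saved trace, for example /tmp/srvm.log
    grep -E 'ERROR>|racgwrap' "$1"
}
```

Calling srvm_errors /tmp/srvm.log on the trace above would leave the racgwrap message and the two ERROR> lines.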

The strace.log file shows the following:

rt_sigprocmask(SIG_SETMASK, [], NULL, 8 ) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8 ) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8 ) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGINT, {0x8075d8b, [], SA_RESTORER, 0xb7ee5908}, {SIG_IGN}, 8 ) = 0
waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 2}], 0) = 18699
rt_sigprocmask(SIG_SETMASK, [], NULL, 8 ) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
waitpid(-1, 0xbfffe9bc, WNOHANG) = -1 ECHILD (No child processes)
sigreturn() = ? (mask now [])
rt_sigaction(SIGINT, {SIG_IGN}, {0x8075d8b, [], SA_RESTORER, 0xb7ee5908}, 8 ) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8 ) = 0
read(255, "exit $?\n", 6261) = 8
rt_sigprocmask(SIG_SETMASK, [], NULL, 8 ) = 0
exit_group(2) = ?

The SRVM trace points to a problem at line 62 of the racgwrap script, which contains the following:

$ORACLE_HOME/bin/racgmain "$@"

I could not find anything wrong with this line itself, but at the beginning of the script, from line 1, the entry for ORACLE_HOME was missing:

ORACLE_HOME=<%ORACLE_HOME%>
export ORACLE_HOME
I added the correct ORACLE_HOME location at this place.
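
One plausible reading of the "fg: no job control" message can be reproduced outside the cluster. This is a sketch under two assumptions that the post does not confirm: that racgwrap runs under bash, and that ORACLE_HOME still held a literal, unsubstituted %ORACLE_HOME% placeholder when line 62 ran. The function name reproduce_fg_error and the racgwrap-demo label are hypothetical.

```shell
# Hypothetical reproduction, not taken from racgwrap itself.
# ORACLE_HOME is set to a literal placeholder, so the command word expands
# to %ORACLE_HOME%/bin/racgmain. Bash treats a command word that starts
# with % as a job specification and hands it to the fg builtin, which fails
# in a non-interactive shell because job control is switched off there.
reproduce_fg_error() {
    bash -c 'ORACLE_HOME="%ORACLE_HOME%"; $ORACLE_HOME/bin/racgmain "$@"' racgwrap-demo 2>&1
}
```

Running reproduce_fg_error should print a message ending in "fg: no job control", with a prefix that depends on the bash version. Once ORACLE_HOME holds a real path, the command word no longer starts with % and the shell looks for racgmain as an ordinary program.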

Also, on checking the srvctl script in the database home, the "OHOME" and "CHOME" entries were missing.
I added the correct entries for OHOME and CHOME (copied from the node where srvctl was working fine).
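
A quick way to spot the same gap on another node is to check the script for both entries. The following is a minimal sketch, assuming the entries appear as plain NAME=value assignments at the start of a line. The exact layout of the srvctl script is not shown in this post, and the function name missing_homes is hypothetical.

```shell
# Hypothetical check: print the name of each expected entry (OHOME, CHOME)
# that has no non-empty NAME=value assignment in the given file.
missing_homes() {
    # $1: path to the srvctl script, or a copy of it
    for name in OHOME CHOME; do
        grep -Eq "^[[:space:]]*${name}=.+" "$1" || echo "$name"
    done
}
```

Empty output means both entries are present. Running the function against the script on a working node and on the failing node shows which entries need to be copied over.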

After making these two changes, SRVCTL worked fine.

Cheers!!!!
Saurabh Sood