After upgrading a RAC database to 10.2.0.4 and applying CRS bundle patch 1 to the 10.2.0.4 CRS home,
the srvctl command failed to start resources on the RAC nodes. Starting a RAC resource with SRVCTL
logged the following error in the crsd.log file:
$ srvctl start instance -d rac -i rac2
2009-04-09 13:45:22.091: [ CRSRES][2611477408][ALERT]0`ora...inst` on member `` has experienced an unrecoverable failure.
2009-04-09 13:45:22.091: [ CRSRES][2611477408]0Human intervention required to resume its availability.
2009-04-09 13:46:25.162: [ CRSRES][2611477408]0StopResource: setting CLI values
2009-04-09 13:46:25.174: [ CRSRES][2611477408]0Attempting to stop `ora...inst` on member ``
2009-04-09 13:46:25.206: [ CRSAPP][2611477408]0StopResource error for ora...inst error code = 1
To debug SRVCTL, SRVM_TRACE was set to TRUE and an strace was taken at the OS level:
$ script /tmp/srvm.log
$ export SRVM_TRACE=TRUE
$ srvctl start instance -d -i
$ exit
It will generate a trace file at /tmp/srvm.log.
$ strace -aef -o /tmp/strace.log srvctl start instance -d -i
It will generate a trace file at /tmp/strace.log
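The SRVM trace can also be captured without an interactive `script` session. The following is only a sketch: `run_traced` is a hypothetical helper name, and it assumes the trace is written to the command's standard output and standard error, which is why plain redirection is enough here.

```shell
# run_traced LOGFILE COMMAND [ARGS...]
# Run COMMAND with SRVM_TRACE=TRUE set only for that one command, and send
# everything it prints (stdout and stderr) to LOGFILE.
# The exit status of COMMAND is passed back to the caller.
run_traced() {
    log=$1
    shift
    SRVM_TRACE=TRUE "$@" > "$log" 2>&1
}
```

For example, `run_traced /tmp/srvm.log srvctl start instance -d rac -i rac2` would leave the trace in /tmp/srvm.log without changing SRVM_TRACE in the calling shell.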
The srvm.log shows the following error:
[Thread-2] [11:57:59:774] [StreamReader.run:65] OUTPUT>Attempting to start `ora.rac.rac2.inst` on member `node11`
[Thread-2] [11:58:0:862] [StreamReader.run:65] OUTPUT>`ora.rac.rac2.inst` on member `node11` has experienced an unrecoverable failure.
[Thread-2] [11:58:0:862] [StreamReader.run:65] OUTPUT>Human intervention required to resume its availability.
[Thread-2] [11:58:0:863] [StreamReader.run:65] OUTPUT>nloz11:ora.rac.rac2.inst:/oac/app/oracle/product/10.2.0/db_1/bin/racgwrap: line 62: fg: no job control
[Thread-3] [11:58:0:865] [StreamReader.run:65] ERROR>CRS-0215: Resource ora.rac.rac2.inst cannot be started.
[Thread-3] [11:58:0:865] [StreamReader.run:65] ERROR>
[Worker 0] [11:58:0:865] [RuntimeExec.runCommand:133] runCommand: process returns 115
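In a long trace the useful lines are easy to miss. As a rough sketch, the hypothetical helper below pulls out only the lines tagged `ERROR>` and any shell error of the form `<script>: line N: ...`, which is the form of the racgwrap message above. The patterns are taken from this one trace and may not cover every SRVM trace.

```shell
# srvm_errors FILE
# Print the lines of an SRVM trace that carry an ERROR> tag or a shell
# error message such as "racgwrap: line 62: fg: no job control".
srvm_errors() {
    grep -E 'ERROR>|: line [0-9]+: ' "$1"
}
```

Calling `srvm_errors /tmp/srvm.log` on the trace above would print the racgwrap line and the two `ERROR>` lines.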
The strace.log file shows the following:
rt_sigprocmask(SIG_SETMASK, [], NULL, 8 ) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8 ) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8 ) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGINT, {0x8075d8b, [], SA_RESTORER, 0xb7ee5908}, {SIG_IGN}, 8 ) = 0
waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 2}], 0) = 18699
rt_sigprocmask(SIG_SETMASK, [], NULL, 8 ) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
waitpid(-1, 0xbfffe9bc, WNOHANG) = -1 ECHILD (No child processes)
sigreturn() = ? (mask now [])
rt_sigaction(SIGINT, {SIG_IGN}, {0x8075d8b, [], SA_RESTORER, 0xb7ee5908}, 8 ) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8 ) = 0
read(255, "exit $?\n", 6261) = 8
rt_sigprocmask(SIG_SETMASK, [], NULL, 8 ) = 0
exit_group(2) = ?
The SRVM trace pointed to a problem at line 62 of the racgwrap script, which reads:
$ORACLE_HOME/bin/racgmain "$@"
Nothing looked wrong with this line itself, but at the beginning of the script (line 1) the entry for ORACLE_HOME was missing:
ORACLE_HOME=<%ORACLE_HOME%>
export ORACLE_HOME
Added the correct ORACLE_HOME location here.
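A quick way to look for this kind of entry is to search the wrapper script for tokens that still have the `<%NAME%>` form shown above. This is a minimal sketch: `find_placeholders` is a hypothetical helper name, and it assumes an unset entry always appears in that `<%NAME%>` form.

```shell
# find_placeholders FILE
# Print, with line numbers, every line of FILE that still contains a token
# of the form <%NAME%>. Exit status is 0 if at least one was found and
# non-zero otherwise (grep's own convention).
find_placeholders() {
    grep -n '<%[A-Za-z_][A-Za-z0-9_]*%>' "$1"
}
```

For example, `find_placeholders "$ORACLE_HOME/bin/racgwrap"` lists any entries in racgwrap that still need a real value.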
Also, on checking the srvctl script in the database home, the "OHOME" and "CHOME" entries were found to be missing.
Added the correct entries for OHOME and CHOME (copied from the node where srvctl was working fine).
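The same check can be run on every node before copying entries across. The sketch below assumes the entries are plain `OHOME=...` and `CHOME=...` shell assignments in the srvctl script, which this post does not show, and `check_home_entries` is a hypothetical helper name.

```shell
# check_home_entries FILE
# Report whether FILE assigns a non-empty value to OHOME and to CHOME.
# Returns 0 only when both entries are present.
check_home_entries() {
    status=0
    for name in OHOME CHOME; do
        # Match "NAME=" at the start of a line, followed by a non-blank value.
        if grep -q "^[[:space:]]*${name}=[^[:space:]]" "$1"; then
            echo "$name: set"
        else
            echo "$name: missing"
            status=1
        fi
    done
    return "$status"
}
```

Running `check_home_entries "$ORACLE_HOME/bin/srvctl"` on the working node and on the failing node shows at a glance which entries need to be copied over.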
After making these two changes SRVCTL worked fine.
Cheers!!!!
Saurabh Sood