srvctl

SRVCTL fails to start RAC resources: CRS-0215

After upgrading a RAC database to 10.2.0.4 and applying CRS bundle patch 1 to the 10.2.0.4 CRS home,
the srvctl command fails to start resources on the RAC nodes. When a RAC resource is started with SRVCTL,
the following errors appear in the crsd.log file:

$ srvctl start instance -d rac -i rac2

2009-04-09 13:45:22.091: [  CRSRES][2611477408][ALERT]0`ora...inst` on member `` has experienced an unrecoverable failure.
2009-04-09 13:45:22.091: [  CRSRES][2611477408]0Human intervention required to resume its availability.
2009-04-09 13:46:25.162: [  CRSRES][2611477408]0StopResource: setting CLI values
2009-04-09 13:46:25.174: [  CRSRES][2611477408]0Attempting to stop `ora...inst` on member ``
2009-04-09 13:46:25.206: [  CRSAPP][2611477408]0StopResource error for ora...inst error code = 1

To debug SRVCTL, SRVM_TRACE is set to TRUE and an strace is taken at the OS level:

$script /tmp/srvm.log
$export SRVM_TRACE=TRUE
$srvctl start instance -d  -i
$exit

It will generate a trace file at /tmp/srvm.log.

$ strace -aef -o /tmp/strace.log srvctl start instance -d -i

It will generate a trace file at /tmp/strace.log

The srvm.log shows the following error:

[Thread-2] [11:57:59:774] [StreamReader.run:65]  OUTPUT>Attempting to start `ora.rac.rac2.inst` on member `node11`
[Thread-2] [11:58:0:862] [StreamReader.run:65]  OUTPUT>`ora.rac.rac2.inst` on member `node11` has experienced an unrecoverable failure.
[Thread-2] [11:58:0:862] [StreamReader.run:65]  OUTPUT>Human intervention required to resume its availability.
[Thread-2] [11:58:0:863] [StreamReader.run:65]  OUTPUT>nloz11:ora.rac.rac2.inst:/oac/app/oracle/product/10.2.0/db_1/bin/racgwrap: line 62: fg: no job control
[Thread-3] [11:58:0:865] [StreamReader.run:65]  ERROR>CRS-0215: Resource ora.rac.rac2.inst cannot be started.
[Thread-3] [11:58:0:865] [StreamReader.run:65]  ERROR>
[Worker 0] [11:58:0:865] [RuntimeExec.runCommand:133]  runCommand: process returns 115
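
The SRVM trace can be long, so it may help to pull out only the failure lines. The following is a minimal sketch, assuming errors are tagged ERROR> as in the excerpt above and that shell failures such as "no job control" arrive on OUTPUT> lines; the function name srvm_errors is hypothetical.

```shell
#!/bin/sh
# Print only the lines of an SRVM trace that look like failures.
# Assumption: errors carry an "ERROR>" tag followed by text, and wrapper
# script failures show up as "no job control" on OUTPUT> lines.
srvm_errors() {
    trace_file=${1:-/tmp/srvm.log}
    # "ERROR>.+" skips the empty ERROR> lines that carry no message.
    grep -E 'ERROR>.+|no job control' "$trace_file"
}
```

It could be called as srvm_errors /tmp/srvm.log after the traced srvctl run has finished.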

The strace.log file shows the following:

rt_sigprocmask(SIG_SETMASK, [], NULL, 8 ) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8 ) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8 ) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGINT, {0x8075d8b, [], SA_RESTORER, 0xb7ee5908}, {SIG_IGN}, 8 ) = 0
waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 2}], 0) = 18699
rt_sigprocmask(SIG_SETMASK, [], NULL, 8 ) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
waitpid(-1, 0xbfffe9bc, WNOHANG) = -1 ECHILD (No child processes)
sigreturn() = ? (mask now [])
rt_sigaction(SIGINT, {SIG_IGN}, {0x8075d8b, [], SA_RESTORER, 0xb7ee5908}, 8 ) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8 ) = 0
read(255, "exit $?\n", 6261) = 8
rt_sigprocmask(SIG_SETMASK, [], NULL, 8 ) = 0
exit_group(2) = ?

The SRVM trace showed a problem at line 62 of the racgwrap script, which reads:

$ORACLE_HOME/bin/racgmain "$@"

Nothing much could be found wrong with this line itself, but at the beginning of the script, i.e. line 1, the entry for ORACLE_HOME was missing.

ORACLE_HOME=<%ORACLE_HOME%>
export ORACLE_HOME
- Added the correct ORACLE_HOME location at this place.

Also, on checking the srvctl file in the database home, the "OHOME" and "CHOME" entries were found to be missing:
- Added the correct entries for OHOME and CHOME (copied from the node where srvctl was working fine).

After making these two changes SRVCTL worked fine.
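
A quick way to spot this kind of damage on other nodes is to scan the wrapper scripts for home variables that are empty or still hold a template placeholder. This is only a sketch: it assumes the variables appear as plain NAME=value shell assignments, and the function name check_homes is hypothetical.

```shell
#!/bin/sh
# Report lines where ORACLE_HOME, OHOME or CHOME is assigned an empty value
# or an unsubstituted <%...%> placeholder.
# Assumption: the entries are plain NAME=value assignments in the script.
# Exit status follows grep: 0 if a suspicious line was found, 1 if none.
check_homes() {
    script_file=$1
    grep -nE '^[[:space:]]*(ORACLE_HOME|OHOME|CHOME)=([[:space:]]*$|.*<%.*%>)' "$script_file"
}
```

For example, check_homes $ORACLE_HOME/bin/racgwrap and check_homes $ORACLE_HOME/bin/srvctl print the offending lines with their line numbers. An entry that is missing altogether will not be reported, so a comparison against a working node is still worthwhile.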

Cheers!!!!
Saurabh Sood

UNKNOWN State Of RAC Resources

While checking the status of the database resources, ASM was shown as UNKNOWN on one node of a two-node RAC.

$ crs_stat -t

Name           Type           Target    State     Host
------------------------------------------------------------
ora.orcl.db    application    ONLINE    ONLINE    rac1
ora....11.inst application    ONLINE    ONLINE    rac1
ora....SM1.asm application    ONLINE    ONLINE    rac1
ora....DC.lsnr application    ONLINE    ONLINE    rac1
ora....idc.gsd application    ONLINE    ONLINE    rac1
ora....idc.ons application    ONLINE    ONLINE    rac1
ora....idc.vip application    ONLINE    ONLINE    rac1
ora....SM2.asm application    ONLINE    UNKNOWN   rac2
ora....C2.lsnr application    ONLINE    ONLINE    rac2
ora....dc2.gsd application    ONLINE    ONLINE    rac2
ora....dc2.ons application    ONLINE    ONLINE    rac2
ora....dc2.vip application    ONLINE    ONLINE    rac2

The following error was reported while trying to start the +ASM2 instance with SRVCTL:

$srvctl start asm -n rac2

PRKS-1009 : Failed to start ASM instance "+ASM2" on node "rac2",
[CRS-0223: Resource 'ora.rac2.ASM2.asm' has placement error.]

While trying to start the same resource with crs_start:

$ crs_start -f ora.rac2.ASM2.asm

CRS-1028: Dependency analysis failed because of:
'Resource in UNKNOWN state: ora.rac2.ASM2.asm'
CRS-0223: Resource 'ora.rac2.ASM2.asm' has placement error

There are two ways to get resources out of this UNKNOWN state:
1. Start the resource from sqlplus
2. Use crs_stop -f to clear the state of database resources.

$ export ORACLE_SID=+ASM2
$ sqlplus "/ as sysdba"
SQL>startup 
Diskgroup mounted

The startup will go fine and the +ASM2 instance will be started.

$ crs_stop -f ora.rac2.ASM2.asm

This will clear the UNKNOWN state and mark the resource as OFFLINE.

Now start the resource as:

$ srvctl start asm -n rac2

After this, check the status:

$ crs_stat -t
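
On a cluster with many resources, the status listing can be filtered down to the entries that are still UNKNOWN. A minimal sketch, assuming the five whitespace-separated columns (Name, Type, Target, State, Host) shown in the listing earlier; the function name unknown_resources is hypothetical.

```shell
#!/bin/sh
# Read `crs_stat -t` output on stdin and print "name host" for every
# resource whose State column (4th field) is UNKNOWN.
# Assumption: columns are Name Type Target State Host, split on whitespace.
unknown_resources() {
    awk 'NF >= 4 && $4 == "UNKNOWN" { print $1, $5 }'
}
```

It would be used as crs_stat -t | unknown_resources. The names printed are the abbreviated ones from the tabular listing, so an empty result simply means that no resource is in the UNKNOWN state.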

In the case of a listener resource, starting the listener using srvctl may result in the following error:

CRS-0215: Could not start resource 'ora.dev-101.LISTENER_DEV-101.lsnr'.

This can be resolved by removing the listener resource and adding it back. Perform the following as the root user:

#crs_unregister ora.dev-101.LISTENER_DEV-101.lsnr
#crs_unregister ora.dev-102.LISTENER_DEV-102.lsnr

Then recreate the listener in silent mode as the oracle user:

$netca /silent /responsefile $ORACLE_HOME/network/install/netca_typ.rsp /nodeinfo dev-101,dev-102

The above command can fail with an error like the one below:

Exception in thread "main" java.lang.UnsatisfiedLinkError: /home/oracle/product/10.2/jdk/jre/lib/i386/libawt.so: libXp.so.6: cannot open shared object file: No such file or directory
	at java.lang.ClassLoader$NativeLibrary.load(Native Method)
	at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1586)
	at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1503)
	at java.lang.Runtime.loadLibrary0(Runtime.java:788)
	at java.lang.System.loadLibrary(System.java:834)
	at sun.security.action.LoadLibraryAction.run(LoadLibraryAction.java:50)
	at java.security.AccessController.doPrivileged(Native Method)
	at sun.awt.NativeLibLoader.loadLibraries(NativeLibLoader.java:38)
	at sun.awt.DebugHelper.(DebugHelper.java:29)
	at java.awt.Component.(Component.java:506)

This can be resolved by installing the xorg-x11-deprecated-libs rpm (yum install xorg-x11-deprecated-libs).
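
When the stack trace is buried in a longer log, the name of the missing library can be extracted from the loader message. This is a sketch only, assuming the message has the form "<library>: cannot open shared object file" as in the error above; the function name missing_libs is hypothetical.

```shell
#!/bin/sh
# Read error text on stdin and print each shared library that the dynamic
# loader reported as missing, one name per line, without duplicates.
# Assumption: the message format is "<lib>: cannot open shared object file".
missing_libs() {
    sed -n 's/.*[[:space:]]\([^[:space:]:]*\.so[^[:space:]:]*\): cannot open shared object file.*/\1/p' | sort -u
}
```

Feeding the netca error output to missing_libs gives the library name to look for. After installing the rpm, ldconfig -p | grep libXp should list the library if it is now visible to the loader.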