RAC

MGMTDB: Grid Infrastructure Management Repository

MGMTDB is a new database instance used for storing Cluster Health Monitor (CHM) data. In 11g this data was stored in a Berkeley DB database, but starting with Oracle Database 12c it is configured as an Oracle database instance.
In 11g, the .bdb files were stored in $GRID_HOME/crf/db/hostname and could take up a lot of space (>100 GB) due to a bug in 11.2.0.2.

During the 12c Grid Infrastructure installation, there is an option to configure the Grid Infrastructure Management Repository.

grid_management_db

If you choose YES, you will see an instance named -MGMTDB running on one of the nodes in your cluster.

[oracle@oradbdev02]~% ps -ef|grep mdb_pmon
oracle    7580     1  0 04:57 ?        00:00:00 mdb_pmon_-MGMTDB

This is an Oracle single-instance database managed by Grid Infrastructure; it fails over to a surviving node if its current node crashes. You can identify the current master using the command below:

-bash-4.1$ oclumon manage -get MASTER

Master = oradbdev02

This DB instance can be managed using srvctl commands. The current master can also be identified using the status command:

$srvctl status mgmtdb 
Database is enabled
Instance -MGMTDB is running on node oradbdev02

We can look at the mgmtdb configuration using:

$srvctl config mgmtdb
Database unique name: _mgmtdb
Database name: 
Oracle home: /home/oragrid
Oracle user: oracle
Spfile: +VDISK/_mgmtdb/spfile-MGMTDB.ora
Password file: 
Domain: 
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Database instance: -MGMTDB
Type: Management

Replace config with start or stop in the command above to start or stop the database.
Database files for the repository database are stored in the same location as the OCR/voting disk:

SQL> select file_name from dba_data_files union select member file_name from V$logfile;

FILE_NAME
------------------------------------------------------------
+VDISK/_MGMTDB/DATAFILE/sysaux.258.819384615
+VDISK/_MGMTDB/DATAFILE/sysgridhomedata.261.819384761
+VDISK/_MGMTDB/DATAFILE/sysmgmtdata.260.819384687
+VDISK/_MGMTDB/DATAFILE/system.259.819384641
+VDISK/_MGMTDB/DATAFILE/undotbs1.257.819384613
+VDISK/_MGMTDB/ONLINELOG/group_1.263.819384803
+VDISK/_MGMTDB/ONLINELOG/group_2.264.819384805
+VDISK/_MGMTDB/ONLINELOG/group_3.265.819384807

We can verify the same using the oclumon command:

-bash-4.1$ oclumon manage -get reppath

CHM Repository Path = +VDISK/_MGMTDB/DATAFILE/sysmgmtdata.260.819384687

Since the repository is stored in the same location as the voting disk, if you have opted to configure the Management Database you will need a voting disk larger than 5 GB (3.2 GB+ is used by MGMTDB). During GI installation I had tried adding a 2 GB voting disk, but it failed with an insufficient-size error. The error did not indicate that the space was needed for the Management Repository, but I now believe that was the cause, since the repository shares its location with the OCR/voting disk.
The default (and minimum) size for the CHM repository is 2048 MB. We can increase the repository size by issuing the following command:

-bash-4.1$ oclumon manage -repos changerepossize 4000
The Cluster Health Monitor repository was successfully resized.The new retention is 266160 seconds.

This command internally runs a resize on the datafile, and we can see that it changed the datafile size from 2 GB to 4 GB:

SQL> select file_name,bytes/1024/1024,maxbytes/1024/1024,autoextensible from dba_data_files;

FILE_NAME					   BYTES/1024/1024 MAXBYTES/1024/1024 AUT
-------------------------------------------------- --------------- ------------------ ---
+VDISK/_MGMTDB/DATAFILE/sysmgmtdata.260.819384687	      4000		    0 NO

If we try to reduce the size from 4 GB to 3 GB, it will warn that all repository data will be deleted and, upon user confirmation, drop it:

-bash-4.1$ oclumon manage -repos changerepossize 3000
Warning: Entire data in Cluster Health Monitor repository will be deleted.Do you want to continue(Yes/No)?
Yes
The Cluster Health Monitor repository was successfully resized.The new retention is 199620 seconds.
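Notice that the reported retention values scale linearly with the repository size, roughly 66.54 seconds per MB. A quick sanity check of that ratio (this is an observation from the two outputs above, not a documented formula):

```python
# Retention appears proportional to repository size:
# 266160 s / 4000 MB = 199620 s / 3000 MB = 66.54 s per MB.
# The ratio is inferred from the oclumon outputs above, not documented.
def estimated_retention_seconds(size_mb, secs_per_mb=66.54):
    return round(size_mb * secs_per_mb)

print(estimated_retention_seconds(4000))  # 266160
print(estimated_retention_seconds(3000))  # 199620
```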

Trace files for the database are stored under DIAG_HOME/_mgmtdb/-MGMTDB/trace, and the instance alert log can also be found at this location. Since the file names start with -MGMTDB, we need to prefix them with ./ to access them, e.g.

[oracle@oradbdev02]~/diag/rdbms/_mgmtdb/-MGMTDB/trace% vi -MGMTDB_mmon_7670.trc
VIM - Vi IMproved 7.2 (2008 Aug 9, compiled Feb 17 2012 10:23:31)
Unknown option argument: "-MGMTDB_mmon_7670.trc"
More info with: "vim -h"
[oracle@oradbdev02]~/diag/rdbms/_mgmtdb/-MGMTDB/trace% vi ./-MGMTDB_mmon_7670.trc
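This is generic shell behaviour, not Oracle-specific: any argument starting with - is parsed as an option. A ./ prefix (or a -- argument to end option parsing) sidesteps it; a throwaway sketch:

```shell
# Demonstrate accessing a file whose name begins with '-'
cd "$(mktemp -d)"
touch -- -MGMTDB_mmon_7670.trc   # '--' tells touch to stop option parsing
cat ./-MGMTDB_mmon_7670.trc      # './' makes the name an unambiguous path
ls -- -MGMTDB_mmon_7670.trc
```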

Sample output from a 3-node RAC setup:

[oracle@oradbdev02]~% oclumon dumpnodeview -allnodes

----------------------------------------
Node: oradbdev02 Clock: '13-07-23 07.19.00' SerialNo:1707 
----------------------------------------

SYSTEM:
#pcpus: 4 #vcpus: 4 cpuht: N chipname: Dual-Core cpu: 5.15 cpuq: 1 physmemfree: 469504 physmemtotal: 7928104 mcache: 5196464 swapfree: 8191992 swaptotal: 8191992 hugepagetotal: 0 hugepagefree: 0 hugepagesize: 2048 ior: 0 iow: 51 ios: 9 swpin: 0 swpout: 0 pgin: 134 pgout: 140 netr: 223.768 netw: 176.523 procs: 461 rtprocs: 25 #fds: 24704 #sysfdlimit: 779448 #disks: 6 #nics: 3 nicErrors: 0

TOP CONSUMERS:
topcpu: 'oraagent.bin(7090) 2.59' topprivmem: 'java(7247) 149464' topshm: 'ora_mman_snowy1(7783) 380608' topfd: 'ocssd.bin(6249) 273' topthread: 'crsd.bin(6969) 42' 

----------------------------------------
Node: oradbdev03 Clock: '13-07-23 07.19.02' SerialNo:47 
----------------------------------------

SYSTEM:
#pcpus: 4 #vcpus: 4 cpuht: N chipname: Dual-Core cpu: 3.65 cpuq: 2 physmemfree: 1924468 physmemtotal: 7928104 mcache: 4529232 swapfree: 8191992 swaptotal: 8191992 hugepagetotal: 0 hugepagefree: 0 hugepagesize: 2048 ior: 1 iow: 83 ios: 17 swpin: 0 swpout: 0 pgin: 45 pgout: 55 netr: 67.086 netw: 55.042 procs: 373 rtprocs: 22 #fds: 21280 #sysfdlimit: 779448 #disks: 6 #nics: 3 nicErrors: 0

TOP CONSUMERS:
topcpu: 'osysmond.bin(19281) 1.99' topprivmem: 'ocssd.bin(19323) 83528' topshm: 'ora_mman_snowy2(20306) 261508' topfd: 'ocssd.bin(19323) 249' topthread: 'crsd.bin(19617) 40' 

----------------------------------------
Node: oradbdev04 Clock: '13-07-23 07.18.58' SerialNo:1520 
----------------------------------------

SYSTEM:
#pcpus: 4 #vcpus: 4 cpuht: N chipname: Dual-Core cpu: 3.15 cpuq: 1 physmemfree: 1982828 physmemtotal: 7928104 mcache: 4390440 swapfree: 8191992 swaptotal: 8191992 hugepagetotal: 0 hugepagefree: 0 hugepagesize: 2048 ior: 0 iow: 25 ios: 4 swpin: 0 swpout: 0 pgin: 57 pgout: 27 netr: 81.148 netw: 41.761 procs: 355 rtprocs: 24 #fds: 20064 #sysfdlimit: 779450 #disks: 6 #nics: 3 nicErrors: 0

TOP CONSUMERS:
topcpu: 'ocssd.bin(6745) 2.00' topprivmem: 'ocssd.bin(6745) 83408' topshm: 'ora_mman_snowy3(8168) 381768' topfd: 'ocssd.bin(6745) 247' topthread: 'crsd.bin(7202) 40'
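The SYSTEM lines in the dump are flat sequences of "key: value" pairs, so they are easy to post-process. A hypothetical helper (the key names are taken from the output above; values in these lines contain no spaces):

```python
# Parse an oclumon SYSTEM line into a dict. Tokens simply alternate
# between "key:" and "value" since values contain no spaces.
def parse_system_line(line):
    tokens = line.split()
    return {tokens[i].rstrip(':'): tokens[i + 1]
            for i in range(0, len(tokens) - 1, 2)}

sample = ("#pcpus: 4 #vcpus: 4 cpuht: N chipname: Dual-Core "
          "cpu: 5.15 cpuq: 1 physmemfree: 469504 physmemtotal: 7928104")
metrics = parse_system_line(sample)
print(metrics['cpu'])           # '5.15'
print(metrics['physmemfree'])   # '469504'
```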

You can learn more about oclumon usage by referring to the Oclumon Command Reference.

In my setup I faced an ORA-28000 error while using the oclumon command. I tried unlocking the account, but that did not resolve it.

oclumon dumpnodeview

dumpnodeview: Node name not given. Querying for the local host
CRS-9118-Grid Infrastructure Management Repository connection error 
 ORA-28000: the account is locked

SQL> alter user chm account unlock;

User altered.

dumpnodeview: Node name not given. Querying for the local host
CRS-9118-Grid Infrastructure Management Repository connection error 
 ORA-01017: invalid username/password; logon denied

This issue occurred because post-configuration tasks had failed during the GI installation. The solution is to run mgmtca from the Grid home, which fixed the issue by unlocking the users and setting their passwords. A wallet is configured so that oclumon can access the repository without hard-coding the password.

[main] [ 2013-07-23 05:32:41.619 UTC ] [Mgmtca.main:102]  Running mgmtca
[main] [ 2013-07-23 05:32:41.651 UTC ] [Mgmtca.execute:192]  Adding internal user1
[main] [ 2013-07-23 05:32:41.653 UTC ] [Mgmtca.execute:194]  Adding internal user2
[main] [ 2013-07-23 05:32:42.028 UTC ] [Mgmtca.isMgmtdbOnCurrentNode:306]  Management DB is running on blr-devdb-003local node is blr-devdb-003
[main] [ 2013-07-23 05:32:42.074 UTC ] [MgmtWallet.createWallet:54]  Wallet created
[main] [ 2013-07-23 05:32:42.084 UTC ] [Mgmtca.execute:213]  MGMTDB Wallet created
[main] [ 2013-07-23 05:32:42.085 UTC ] [Mgmtca.execute:214]  Adding user/passwd to MGMTDB Wallet
[main] [ 2013-07-23 05:32:42.210 UTC ] [MgmtWallet.terminate:122]  Wallet closed
[main] [ 2013-07-23 05:32:42.211 UTC ] [Mgmtca.execute:227]  Unlocking user and setting password in database
[main] [ 2013-07-23 05:32:42.211 UTC ] [Mgmtjdbc.connect:66]  Connection String=jdbc:oracle:oci:@(DESCRIPTION=(ADDRESS=(PROTOCOL=beq)(PROGRAM=/home/oragrid/bin/oracle)(ARGV0=oracle-MGMTDB)(ARGS='(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))')(ENVS='ORACLE_HOME=/home/oragrid,ORACLE_SID=-MGMTDB')))
[main] [ 2013-07-23 05:32:42.823 UTC ] [Mgmtjdbc.connect:72]  Connection Established

These are the two internal users referred to above:

select username,account_status from dba_users where username like 'CH%';

USERNAME		       ACCOUNT_STATUS
------------------------------ --------------------------------
CHM			       OPEN
CHA			       OPEN

Openfiler on Virtualbox and 12c Oracle Flex ASM

Last week Oracle released the 12c database, and the Oracle blogosphere is bustling with people posting details about the new version and their setups.
I too decided to take the plunge and set up my own RAC cluster. I had 4 machines with me but no shared storage 🙁
Long back I had done an installation using Openfiler (following Jeff Hunter's article on OTN), but in that case we installed the Openfiler software on a physical machine. This time I decided to try installing it on VirtualBox.
I checked the Openfiler website and found that they provide VM templates, which meant that installation on a VM is supported. (Anyway, this is my test setup 🙂 )

This was a 64-bit machine, so I downloaded the 64-bit VirtualBox rpm.

When you try to install it, it will fail with the following error:

rpm -i VirtualBox-4.2-4.2.14_86644_el6-1.x86_64.rpm 
warning: VirtualBox-4.2-4.2.14_86644_el6-1.x86_64.rpm: Header V4 DSA/SHA1 Signature, key ID 98ab5139: NOKEY
error: Failed dependencies:
	libSDL-1.2.so.0()(64bit) is needed by VirtualBox-4.2-4.2.14_86644_el6-1.x86_64

Solution: install the SDL package. You can either install SDL through yum first, or install the same rpm using yum, which will resolve the dependent SDL package and install it for you.

#yum install SDL
# rpm -i VirtualBox-4.2-4.2.14_86644_el6-1.x86_64.rpm 
warning: VirtualBox-4.2-4.2.14_86644_el6-1.x86_64.rpm: Header V4 DSA/SHA1 Signature, key ID 98ab5139: NOKEY

Creating group 'vboxusers'. VM users must be member of that group!

No precompiled module for this kernel found -- trying to build one. Messages
emitted during module compilation will be logged to /var/log/vbox-install.log.

WARNING: Deprecated config file /etc/modprobe.conf, all config files belong into /etc/modprobe.d/.
Stopping VirtualBox kernel modules [  OK  ]
Recompiling VirtualBox kernel modules [  OK  ]
Starting VirtualBox kernel modules [  OK  ]

Add the oracle user to the vboxusers group to allow it to manage VMs. Use the -a flag, since usermod -G on its own replaces the user's existing supplementary groups:

usermod -a -G vboxusers oracle

Once done, you can download the Openfiler software from http://www.openfiler.com/community/download/. A direct link for the 64-bit binary is available at Sourceforge.net.

Let's build the virtual machine for our setup.

1. Start the VirtualBox GUI by issuing virtualbox on the command line
2. Click on New VM and choose the OS (Linux) and Version (Red Hat 64-bit; change this based on your OS)
3. Allocate memory to the machine. I opted for 3 GB
4. Create a 30 GB virtual drive in .VDI format. Even though actual usage is less than 8 GB, the Openfiler install fails if you create an 8 GB disk
5. Once done, click Finish and then click on Settings
6. Choose Network as eth0 with a Bridged adapter. We are using a single adapter here
7. Modify the boot order and remove floppy. Keep the hard disk as the first option
8. In the Storage settings, add an additional hard disk (say 100 GB) which will be used to set up the ASM devices

You can use the screenshots from the oracle-base 12c install article. Note that we are using a single, bridged adapter.

9. When you start the VM, it will ask you to choose a start-up disk. Choose your openfiler.iso image
10. Press Enter on the boot screen and then click Next

openfiler1
11. Choose the install language
12. Next, choose the hard disk which will be used for installing the software. Choose the 30 GB disk we allocated earlier and uncheck the second disk

openfiler2
13. The next screen is network configuration; this is the most important screen. I used a static IP configuration: click Edit for eth0 and enter all the required information, i.e. IP, subnet, and gateway. Also add the hostname and DNS information here

openfiler3

14. Next select the timezone and set the root password. Once done, you will get a success message and the machine will reboot. If everything is successful, you can reach your setup at

https://hostname:446/ (note it's https, not http)

If this doesn't work, look at your IP settings and ensure that your IP is pingable from outside. Further GUI troubleshooting can be done by restarting the openfiler and httpd services; if they give errors, you can troubleshoot from there:

service openfiler restart
service httpd restart

You can configure the Openfiler ASM volumes by following Jeff Hunter's article.
In case you are on RHEL6, the udev rules mentioned in that article will not work; Frits Hoogland's article will help here (http://fritshoogland.wordpress.com/2012/07/23/using-udev-on-rhel-6-ol-6-to-change-disk-permissions-for-asm/)
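For reference, an RHEL6-style rule following that approach looks roughly like the sketch below. The scsi_id RESULT string and the device, owner, and group names are placeholders for your environment; check Frits Hoogland's article for the exact syntax on your release:

```shell
# /etc/udev/rules.d/99-oracle-asmdevices.rules (RHEL6 udev syntax sketch;
# RESULT must be the scsi_id of YOUR disk -- the value below is a placeholder)
KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/sbin/scsi_id --whitelisted --replace-whitespace --device=/dev/$parent", RESULT=="SATA_VBOX_HARDDISK_placeholder", OWNER="oracle", GROUP="dba", MODE="0660"
```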

You should now be ready with ASM storage and can proceed with the RAC install. There is not much difference in the 12c install except for the new Oracle Flex ASM feature. I am documenting screenshots for it here.

1. Choose Standard Cluster here. If you choose Flex Cluster, it forces you to use GNS, as that option can't be unchecked.

flexcluster1

2.  Choose Advanced Install

3. When choosing network interfaces, select ASM and Private for the private interface

flexcluster2

4. On screen 10, choose Oracle Flex ASM

flexcluster3

 

I did two Flex cluster setups, one with a 3-node RAC and one with a 2-node RAC, and it worked in both places. Let's see the Flex cluster in action.

You can verify whether your ASM is enabled in Flex mode using the command below:

[oracle@oradbdev02]~% asmcmd showclustermode
ASM cluster : Flex mode enabled

The crsctl command can be used to enable Flex mode later. The command below shows the ASM configuration:

[oracle@oradbdev02]~% srvctl config asm
ASM home: /home/oragrid
Password file: +VDISK/orapwASM
ASM listener: LISTENER
ASM instance count: 3
Cluster ASM listener: ASMNET1LSNR_ASM

[oracle@oradbdev02]~% srvctl status asm -detail
ASM is running on oradbdev02,oradbdev03,oradbdev04
ASM is enabled.

Let's reduce ASM to run on only 2 nodes. This will stop ASM on one node:

[oracle@oradbdev02]~% srvctl modify asm -count 2

[oracle@oradbdev02]~% srvctl status asm -detail 
ASM is running on oradbdev02,oradbdev03
ASM is enabled.

There is no ASM on node oradbdev04, but the database is still running:

[oracle@oradbdev04]~% ps -ef|grep pmon
oracle    3949     1  0 10:27 ?        00:00:00 ora_pmon_snowy3
oracle   18728     1  0 07:59 ?        00:00:01 apx_pmon_+APX3

If you now try to start ASM on the 3rd node, it will give an error:

[oracle@oradbdev02]~% srvctl config asm
ASM home: /home/oragrid
Password file: +VDISK/orapwASM
ASM listener: LISTENER
ASM instance count: 2
Cluster ASM listener: ASMNET1LSNR_ASM

[oracle@oradbdev02]~% srvctl start asm -n oradbdev04
PRCR-1013 : Failed to start resource ora.asm
PRCR-1064 : Failed to start resource ora.asm on node oradbdev04
CRS-2552: There are no available instances of resource 'ora.asm' to start.

Let's set the count back to 3, and ASM will start now:

[oracle@oradbdev02]~% srvctl modify asm -count 3
[oracle@oradbdev02]~% srvctl start asm -n oradbdev04
[oracle@oradbdev02]~% srvctl status asm -detail     
ASM is running on oradbdev02,oradbdev03,oradbdev04
ASM is enabled.

Now to explore some of the other new features.

portmap: unrecognized service on RHEL6

A quick note for people using NFS for shared storage on a RAC database. Up to RHEL5, we had to ensure the nfs, nfslock, and portmap services were running.
These services are required; otherwise you will get the following errors while mounting the database:

ORA-00210: cannot open the specified control file
ORA-00202: control file: '/u01/oradata/orcl/control01.ctl'
ORA-27086: unable to lock file - already in use

These services can be auto-enabled on boot using the chkconfig command. While working on a similar issue today, I found that the portmap service is not present in RHEL6:

# service portmap status
portmap: unrecognized service

The portmap service was used to map RPC program numbers to IP address/port combinations in earlier versions of Red Hat Enterprise Linux.
As per the RHEL6 docs, the portmap service has been replaced by rpcbind in Red Hat Enterprise Linux 6 to enable IPv6 support.
So the following command will work:

# service rpcbind status
rpcbind (pid  1587) is running...

You can read about NFS and its associated processes in the RHEL6 docs.

11gR2:Listener Startup Issues

In this blog post I will discuss listener startup issues faced in 11gR2 RAC. I will keep updating this post based on my experiences and any comments on it.

Let's get started. You will see the following errors while starting the listener using srvctl:

[oracle@prod01]~% srvctl start listener -n prod01
PRCR-1013 : Failed to start resource ora.LISTENER.lsnr
PRCR-1064 : Failed to start resource ora.LISTENER.lsnr on node prod01
CRS-5016: Process "/oragrid/product/11.2.0.2/bin/lsnrctl" spawned by agent "/oragrid/product/11.2.0.2/bin/oraagent.bin" for action "start" failed: details at "(:CLSN00010:)" in "/home/oragrid/product/11.2.0.2/log/prod01/agent/crsd/oraagent_oracle/oraagent_oracle.log"
CRS-5016: Process "/oragrid/product/11.2.0.2/bin/lsnrctl" spawned by agent "/oragrid/product/11.2.0.2/bin/oraagent.bin" for action "start" failed: details at "(:CLSN00010:)" in "/home/oragrid/product/11.2.0.2/log/prod01/agent/crsd/oraagent_oracle/oraagent_oracle.log"
CRS-2674: Start of 'ora.LISTENER.lsnr' on 'prod01' failed

 

Issue 1: Incorrect ORACLE_HOME entry in listener.ora

This can be verified by attempting to start the listener with the lsnrctl utility. Note that you need to use the $GRID_HOME/bin/lsnrctl utility to manage the listener in 11gR2 RAC:

[oracle@prod01]~% lsnrctl start

LSNRCTL for Linux: Version 11.2.0.2.0 - Production on 31-JAN-2012 08:09:52

Copyright (c) 1991, 2010, Oracle.  All rights reserved.

Starting /oragrid/product/11.2.0.2/bin/tnslsnr: please wait...

TNSLSNR for Linux: Version 11.2.0.2.0 - Production
System parameter file is /oragrid/product/11.2.0.2/network/admin/listener.ora
Log messages written to /oracle/diag/tnslsnr/prod01/listener/alert/log.xml
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.160)(PORT=1521)))
TNS-01201: Listener cannot find executable /oracle/product/11.2/bin/oracle for SID orcl01

Listener failed to start. See the error message(s) above..

The above error indicates that the listener cannot find the oracle executable in the specified path, but that is not our correct ORACLE_HOME. Checking listener.ora, we found that this home comes from the ORACLE_HOME entry:

SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (SID_NAME =orcl01 )
      (ORACLE_HOME = /oracle/product/11.2)
    )
    (SID_DESC =
      (SID_NAME = PLSExtProc)
      (ORACLE_HOME = /oracle/product/11.2)
      (PROGRAM = extproc)
    )
  )

The correct ORACLE_HOME is /oracle/product/11.2.0.2, which needs to be updated in listener.ora. After fixing it, we were able to start the listener.

Issue 2: CRS resource ora.[node_name].ons is down

While debugging the listener startup issue, you find that the resource ora.[node_name].ons is not starting. The listener (including the SCAN listener) depends on the ora.ons resource.
Checking $GRID_HOME/opmn/logs/ons.log[node_name], we see the following messages:

12/01/12 05:32:48 [ons-listener] Could not get address information for localhost 6100.
12/01/12 05:32:49 [internal] getaddrinfo(localhost, 6100, 1) failed (Name or service not known):
12/01/12 05:33:05 [internal] getaddrinfo(localhost, 6100, 1) failed (Name or service not known):
12/01/12 05:33:15 [internal] getaddrinfo(localhost, 6100, 1) failed (Name or service not known):
12/01/12 05:33:15 [internal] getaddrinfo(localhost, 6100, 1) failed (Name or service not known):
12/01/12 05:33:15 [internal] getaddrinfo(localhost, 6100, 1) failed (Name or service not known):

The issue is that your host is not able to resolve localhost. You can verify this by issuing a ping localhost command. The issue can be resolved by adding the following entry to /etc/hosts or to DNS:

127.0.0.1 localhost
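A quick way to reproduce the lookup the ONS daemon is attempting, sketched in Python (any language with a resolver call works; ping localhost is equivalent):

```python
# Resolve 'localhost' as the failing getaddrinfo() in ons.log tries to;
# on a correctly configured host this yields a 127.x loopback address.
import socket

addr = socket.gethostbyname("localhost")
print(addr)
assert addr.startswith("127."), "localhost does not resolve to loopback"
```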

Issue 3: VIP and ora.[node_name].ons are not starting on one node

For this issue, check that the Bcast and Mask settings for the public interface are the same on all nodes.

e.g ifconfig eth0
inet addr:192.168.1.4  Bcast:192.168.0.255  Mask:255.255.255.0

In our case we found that the second node had a Mask setting of 255.255.254.0. Correcting it and restarting the interface resolved the issue.
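To see why a mask mismatch matters: the broadcast address is computed from the IP and netmask, so nodes with different masks can disagree on it. A small illustration (the addresses below are made up for the example):

```python
# Broadcast address = network address with all host bits set; the same IP
# under a /24 vs /23 mask can yield different broadcasts, which breaks
# broadcast-based communication between nodes.
import ipaddress

def broadcast(ip, mask):
    return str(ipaddress.ip_interface(f"{ip}/{mask}").network.broadcast_address)

print(broadcast("192.168.0.4", "255.255.255.0"))  # 192.168.0.255
print(broadcast("192.168.0.4", "255.255.254.0"))  # 192.168.1.255
```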

11gR1 CRS start failing with ORA-29702

This post was actually a comment from Sasi on the earlier article, 10.2 CRS startup issue. I am converting it to a post so that we can get feedback on this issue from other users. I suspect it was caused by a RHEL5 bug (fixed in RHEL5u6) related to a NIC going down when multiple interface cards are used.

We had a similar error but the problem was different and thought of sharing it here.

We recently installed an 11gR1 two-node RAC, and all was fine until last week, when we suddenly saw the same error, "ORA-29702: error occurred in Cluster Group Service operation". CRS was not starting; some of the CRS processes were running, and it was refusing to stop.

root@node1> /u01/app/crs/bin/crsctl stop crs
Stopping resources.
This could take several minutes.
Error while stopping resources. Possible cause: CRSD is down.
Stopping Cluster Synchronization Services.
Unable to communicate with the Cluster Synchronization Services daemon.

The ASM alert log and database alert log had the following to say:

ASM Alert Log:

Errors in file /u02/app/asm/diag/asm/+asm/+ASM1/trace/+ASM1_lmon_2185.trc:
ORA-29702: error occurred in Cluster Group Service operation
LMON (ospid: 2185): terminating the instance due to error 29702
Mon Nov 22 20:02:16 2011
ORA-1092 : opitsk aborting process

Oracle database Alert Log:

ERROR: LMON (ospid: 3721) detects hung instances during IMR reconfiguration
Tue Nov 22 22:10:37 2011
Error: KGXGN polling error (16)
Errors in file /u03/app/oracle/diag/rdbms/ccbdrpd/ccbdrpd1/trace/ccbdrpd1_lmon_3721.trc:
ORA-29702: error occurred in Cluster Group Service operation
LMON (ospid: 3721): terminating the instance due to error 29702

Not much info in the trace files.

Looked at metalink note : Diagnosing ‘ORA-29702: error occurred in Cluster Group Service operation’ [ID 848622.1]
But the problems mentioned in it were not applicable to our site.

Looked at the CRS alert log, CRSD logs, and CSSD logs; there was plenty of information but nothing that helped nail down the issue. Could not see any error messages.

Also, looked at

RAC ASM instances crash with ORA-29702 when multiple ASM instances start(ed) [ID 733262.1]

It mentioned that when multiple NICs are used for the cluster interconnect and they are not bonded properly, it can cause issues, and this could be seen in the alert logs.

In our case NIC bonding was done properly. We had configured and bonded as below:
• eth0 and eth1 bonded as bond0 – for public and
• eth2 and eth3 bonded as bond1 – for cluster interconnect

and alert log showed they were configured fine.

Interface type 1 bond1 192.xxx.x.x configured from OCR for use as a cluster interconnect
Interface type 1 bond0 xx.x.x.x configured from OCR for use as a public interface

If NIC bonding is not done properly, you would see multiple entries for the cluster interconnect in the alert log.

Well, though this was not the issue in our case, it gave me a lead to identify the root cause of the problem. Since bonding was mentioned, I wanted to check both the channel bonding interface configurations (ifcfg-bond0 & ifcfg-bond1) and the Ethernet interface configurations (ifcfg-eth0, ifcfg-eth1, ifcfg-eth2 & ifcfg-eth3).

All the configuration files were good except for ifcfg-bond1, whose entries were as below:

root@node1>cat ifcfg-bond1

DEVICE=bond1
IPADDR=xxx.xxx.xx.x
NETMASK=255.xxx.x.x
USERCTL=no
BOOTPROTO=none
ONBOOT=yes
TYPE=ethernet

At first look they seem to be fine, but when compared to ifcfg-bond0 the problem was obvious. The ifcfg-bond0 entries were as below:

root@node1> cat ifcfg-bond0
DEVICE=bond0
BOOTPROTO=none
ONBOOT=yes
NETMASK=255.xxx.x.x
IPADDR=xx.x.x.x
GATEWAY=xx.x.x.x
USERCTL=no
TYPE=BOND

If you look at the TYPE entry, it is "TYPE=ethernet" in ifcfg-bond1 and "TYPE=BOND" in ifcfg-bond0.

Bingo… changed the configuration file, rebooted the server, and all components came up fine. CRS, ASM, and the DB started and are working fine.

But I am still trying to find out why it worked fine during the installation and then suddenly stopped working.

Changing CRS/Database timezone in 11.2.0.2 post install

I had installed an 11.2.0.2 RAC setup a few days back with an incorrect timezone. It should have been PDT, but I installed with UTC.
Starting/stopping clusterware after setting the correct timezone did not solve the issue.

In 11.2.0.2, Oracle stores timezone information in the file $GRID_HOME/crs/install/s_crsconfig_(hostname).txt. In my case the file looked like this:

cd /oragrid/product/11.2/crs/install
cat s_crsconfig_prod1.txt
TZ=UTC
NLS_LANG=AMERICAN_AMERICA.AL32UTF8
TNS_ADMIN=
ORACLE_BASE=

To resolve the issue, we need to change TZ to US/Pacific on all nodes and restart clusterware, so the entry would be:

TZ=US/Pacific

On restarting clusterware, the database and clusterware start with the correct timezone.
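The edit itself is a one-line substitution per node. Sketched below against a throwaway copy of the file; on a real cluster the file lives under $GRID_HOME/crs/install/ and every node must be changed before the clusterware restart:

```shell
# Simulate the TZ change on a scratch copy of s_crsconfig_<hostname>.txt
f=$(mktemp)
printf 'TZ=UTC\nNLS_LANG=AMERICAN_AMERICA.AL32UTF8\nTNS_ADMIN=\nORACLE_BASE=\n' > "$f"
sed -i 's|^TZ=.*|TZ=US/Pacific|' "$f"   # same edit as described above
grep '^TZ=' "$f"                        # TZ=US/Pacific
rm -f "$f"
```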

In case you wish to have a different timezone only for the Oracle database, that is possible using the srvctl command, e.g.:

srvctl setenv database -d orcl -t TZ=US/Pacific

You can confirm this with the getenv command:

[oracle@prod1]~% srvctl getenv database -d orcl
orcl:
TZ=US/Pacific

This requires a database bounce. Also note that if the database is started manually, it will not start with the correct timezone. To unset the parameter, use the following commands:

[oracle@prod1]~% srvctl unsetenv database -d orcl -t TZ
[oracle@prod1]~% srvctl getenv database -d orcl
orcl:

Hope this helps.