RAC

Enabling Flashback On RAC Database

Enabling Flashback/ARCHIVELOG mode on a single-instance database is quite straightforward. On a RAC database, a few additional steps are needed.

The requirements for enabling Flashback Database are:

  • Your database must be running in ARCHIVELOG mode, because archived logs are used in the Flashback Database operation.
  • You must have a flash recovery area enabled, because flashback logs can only be stored in the flash recovery area.
  • For Real Application Clusters databases, the flash recovery area must be stored in a clustered file system or in ASM.

First, configure the flash recovery area by setting db_recovery_file_dest_size and db_recovery_file_dest. The size must be set before the destination:

ALTER SYSTEM SET DB_RECOVERY_FILE_DEST_SIZE = 20G SCOPE=BOTH;
ALTER SYSTEM SET DB_RECOVERY_FILE_DEST = '+DG1' SCOPE=BOTH;

We are using an ASM diskgroup here, which is shared and available to both nodes. The next step is to enable ARCHIVELOG mode and then turn on flashback. For this, the database needs to be in mount mode.

We can use srvctl to stop any associated database services and then stop the database:

/home/oracle>srvctl stop service -d TESTDB
/home/oracle>srvctl stop database -d TESTDB

Now set cluster_database=false so that ARCHIVELOG mode can be enabled. This is an additional step required for a RAC database; a single instance does not need it.

/home/oracle>sqlplus "/ as sysdba"

Connected to an idle instance.

SQL> startup nomount
ORACLE instance started.

Total System Global Area 1073741824 bytes
Fixed Size                  1271564 bytes
Variable Size             314575092 bytes
Database Buffers          750780416 bytes
Redo Buffers                7114752 bytes
SQL> alter system set cluster_database=false scope=spfile;

System altered.

SQL> shutdown immediate
ORA-01507: database not mounted

ORACLE instance shut down.
SQL> startup mount
ORACLE instance started.

Total System Global Area 1073741824 bytes
Fixed Size                  1271564 bytes
Variable Size             314575092 bytes
Database Buffers          750780416 bytes
Redo Buffers                7114752 bytes
Database mounted.
SQL> alter database archivelog;
SQL> alter database flashback on;

Database altered.

Set the cluster_database parameter back to true:

SQL> alter system set cluster_database=true scope=spfile;
System altered.
SQL> shutdown immediate

We will again use srvctl to start the database and the associated services:

/home/oracle>srvctl start database -d TESTDB
/home/oracle>srvctl start service -d TESTDB

We can confirm that ARCHIVELOG mode and Flashback are enabled by querying V$DATABASE:

SQL> SELECT LOG_MODE,FLASHBACK_ON FROM V$DATABASE;
LOG_MODE     FLASHBACK_ON
------------ ------------------
ARCHIVELOG   YES
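The whole sequence above can be sketched as one script. This is only an illustration under assumptions: the DB_NAME and DRY_RUN variables and the run, run_sql and enable_flashback_rac helpers are names made up here, and with DRY_RUN=1 (the default) the script only prints what it would execute. The real execution path assumes ORACLE_SID points at the local instance and has not been verified against a live cluster.

```shell
#!/bin/sh
# Sketch of the RAC flashback procedure. DB_NAME and DRY_RUN are
# illustrative assumptions; DRY_RUN=1 (default) prints instead of running.
DB_NAME="${DB_NAME:-TESTDB}"
DRY_RUN="${DRY_RUN:-1}"

# Run an OS command, or just print it in dry-run mode.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Send each argument as one line to SQL*Plus, or print it in dry-run mode.
run_sql() {
    if [ "$DRY_RUN" = "1" ]; then
        printf 'SQL> %s\n' "$@"
    else
        printf '%s\n' "$@" | sqlplus -S "/ as sysdba"
    fi
}

enable_flashback_rac() {
    run srvctl stop service -d "$DB_NAME"
    run srvctl stop database -d "$DB_NAME"
    run_sql "startup nomount" \
            "alter system set cluster_database=false scope=spfile;" \
            "shutdown immediate" \
            "startup mount" \
            "alter database archivelog;" \
            "alter database flashback on;" \
            "alter system set cluster_database=true scope=spfile;" \
            "shutdown immediate"
    run srvctl start database -d "$DB_NAME"
    run srvctl start service -d "$DB_NAME"
}

enable_flashback_rac
```

Reviewing the dry-run output first, and only then setting DRY_RUN=0, keeps an outage-causing sequence like this from being run by accident.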

DBConsole Issue on RAC -Part II

Continuing with the DBConsole issue: we were finally able to get cluvfy to return success for the nodes.

Oracle Support advised that ssh between the nodes should not print any banner, e.g.

[oracle@PROD01 ~]$ ssh PROD02 date
Fri Jun 13 02:00:41 IST 2008

In our case, however, ssh displayed a banner with a warning message whenever someone logged in to the server.

As we are on Linux, we renamed the file /etc/issue.net to something else and ran cluvfy again. This time it was successful:

[oracle@PROD01 ~]$ cluvfy comp nodecon -n all

Verifying node connectivity

Checking node connectivity...

Node connectivity check passed for subnet "10.X.X.X" with node(s) PROD02,PROD01.
Node connectivity check passed for subnet "192.X.X.X" with node(s) PROD02,PROD01

Interfaces found on subnet "192.X.X.X" that are likely candidates for VIP:
PROD02 eth3:192.X.X.X
PROD01 eth3:192.X.X.X

Interfaces found on subnet "10.X.X.X" that are likely candidates for a private interconnect:
PROD02 eth2:10.X.X.X eth2:10.X.X.X
PROD01 eth2:10.X.X.X eth2:10.X.X.X

Node connectivity check passed.


Verification of node connectivity was successful.
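The banner check described above can be automated in a small way. The sketch below assumes that a clean `ssh <node> date` prints exactly one line; the function name is_clean_ssh_output is made up for illustration, and the commented loop uses the node names from this post.

```shell
# Succeed only if the captured output of `ssh <node> date` is exactly one
# non-empty line, i.e. nothing such as a login banner was printed with it.
is_clean_ssh_output() {
    [ -n "$1" ] && [ "$(printf '%s\n' "$1" | grep -c '')" -eq 1 ]
}

# Hypothetical usage (2>&1 because ssh shows the banner on stderr):
# for node in PROD01 PROD02; do
#     out=$(ssh -o BatchMode=yes "$node" date 2>&1)
#     is_clean_ssh_output "$out" || echo "$node: extra output besides the date"
# done
```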

But the DBConsole issue still remains. It is still unable to find the hostname. Now waiting for Oracle 🙁

DBConsole Issue on RAC -Part I

Currently I am working on an issue where DBConsole does not start on our 2-node RAC system. When I try to start it, I get the following errors:

[oracle@PROD01 ~]$ emctl status dbconsole
TZ set to US/Pacific
Exception in getting local host
java.net.UnknownHostException: PROD01: PROD01
        at java.net.InetAddress.getLocalHost(InetAddress.java:1191)
        at oracle.sysman.emSDK.conf.TargetInstaller.getLocalHost(TargetInstaller.java:5561)
        at oracle.sysman.emSDK.conf.TargetInstaller.main(TargetInstaller.java:4126)
Exception in getting local host

I tried recreating the DBConsole configuration, but that also failed with the following error:

[oracle@PROD01 ~]$ emca -config dbcontrol db  -cluster

STARTED EMCA at Jun 12, 2008 3:29:40 AM
EM Configuration Assistant, Version 10.2.0.1.0 Production
Copyright (c) 2003, 2005, Oracle.  All rights reserved.

Jun 12, 2008 3:29:40 AM oracle.sysman.emcp.util.ClusterUtil getHostName
SEVERE: Error getting hostname for the cluster node PROD01. This node may not be configured correctly
Enter the following information:
Database unique name: testdb1
Jun 12, 2008 3:29:42 AM oracle.sysman.emcp.ParamsManager getInaccessibleNodeList
WARNING: The following cluster nodes are unavailable: [PROD01, PROD02].
Jun 12, 2008 3:29:42 AM oracle.sysman.emcp.ParamsManager getInaccessibleSidList
WARNING: The requested operation will not be performed for the following instances: [testdb11, testdb12].
No cluster nodes found when configuring the RAC database for EM

The error above says that the nodes are not available, but if we check the status, they are indeed running:

[oracle@PROD01 ~]$ crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.testdb1.db    application    ONLINE    ONLINE    PROD01
ora....omp1.cs application    ONLINE    ONLINE    PROD01
ora....11.inst application    ONLINE    ONLINE    PROD01
ora....12.inst application    ONLINE    ONLINE    PROD02
ora....SM1.asm application    ONLINE    ONLINE    PROD01
ora....01.lsnr application    ONLINE    ONLINE    PROD01
ora....d01.gsd application    ONLINE    ONLINE    PROD01
ora....d01.ons application    ONLINE    ONLINE    PROD01
ora....d01.vip application    ONLINE    ONLINE    PROD01
ora....dM2.asm application    ONLINE    ONLINE    PROD02
ora....02.lsnr application    ONLINE    ONLINE    PROD02
ora....d02.gsd application    ONLINE    ONLINE    PROD02
ora....d02.ons application    ONLINE    ONLINE    PROD02
ora....d02.vip application    ONLINE    ONLINE    PROD02

At this point I searched Metalink for known issues and came across

Note 388440.1 – Problem Emca Fails To Configure DB Control For RAC Database Error Getting Hostname For The Cluster Node

According to this note, we need to confirm that SSH is set up and that the “cluvfy comp nodecon -n all” command returns success. In our case SSH was already configured, so I ran the command, but it was unsuccessful:

[oracle@PROD01 ~]$ cluvfy comp nodecon -n all

Verifying node connectivity

Verification of node connectivity was unsuccessful on all the nodes.

At this point we opened an SR with Oracle Support. We were then asked to check

Note 549667.1 – Cluvfy returns “Unsuccessful” for most commands, with no other details

We verified that this note did not apply to us, as the permissions on the files discussed in Note 549667.1 were set correctly. We now have one more SR open with the RAC team to resolve the cluvfy issue.

It has been a long wait and, despite the SR being escalated, we still haven’t had a response from the analyst. I will keep you posted and share the solution. Meanwhile, if you have faced this situation and resolved it, do let me know.

Adding new ASM disk to RAC database fails

Many times I have come across a common problem in RAC databases where adding an ASM disk fails with errors like

ORA-15075: disk(s) are not visible cluster-wide

ORA-15020: discovered duplicate ASM disk "DISK1"

ORA-15054: disk "ORCL:DISK1" does not exist in diskgroup "DG1"

Rebalancing the diskgroup and trying to add the disk with the FORCE option does not help in this case either.

I will discuss how to get out of a situation like this, i.e. when you try to add an ASM disk in a cluster environment and it reports that the disk is already added, yet when you try to drop the same disk it reports that the disk is not present in the diskgroup.

Let's start from the very beginning:
I decided to add an ASM disk to an already existing diskgroup, DATA1, in a RAC environment.

Log in to the ASM instance with "/ as sysdba":
SQL > ALTER DISKGROUP DATA1 ADD DISK '/dev/rdsk/c1t2d3s4';

But it failed with the following error:

ALTER DISKGROUP DATA1 ADD DISK '/dev/rdsk/c1t2d3s4'
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15075: disk(s) are not visible cluster-wide

This happens because the physical disk partition is not visible from all RAC nodes. I contacted the sysadmins to make sure that the disk is visible from all RAC nodes and accessible by Oracle. They fixed the problem, and the disk /dev/rdsk/c1t2d3s4 could then be seen from all RAC nodes. I then tried to add the disk again using the FORCE option:

SQL > ALTER DISKGROUP DATA1 ADD DISK '/dev/rdsk/c1t2d3s4' FORCE;
But it failed with the following error:

ORA-15020: discovered duplicate ASM disk "/dev/rdsk/c1t2d3s4"

This suggests that a disk with the same name is already present in the diskgroup.

Since the disk was reported as already present in the diskgroup, I tried to drop it, and got the following error:

SQL> alter diskgroup DATA1 drop disk '/dev/rdsk/c1t2d3s4';
alter diskgroup DATA1 drop disk '/dev/rdsk/c1t2d3s4'
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15054: disk "/dev/rdsk/c1t2d3s4" does not exist in diskgroup "DATA1"

Now I was stuck, as neither adding nor dropping the disk was possible. I decided to check the status of the disk in v$asm_disk on all RAC nodes with the following query:

SQL > col name format a15
SQL > col path format a20
SQL > select GROUP_NUMBER, DISK_NUMBER, HEADER_STATUS, MOUNT_STATUS, STATE, NAME, PATH from v$asm_disk;

We obtained the following result on all the nodes:

G#   D#   HEADER_STATU MOUNT_S STATE    NAME         PATH
---- ---- ------------ ------- -------- ------------ -------------------------
0    0    MEMBER       IGNORED NORMAL                /dev/rdsk/c1t2d3s4

Header_status=MEMBER means that, on all RAC nodes, the disk header marks it as a valid ASM disk belonging to a diskgroup.
Mount_status=IGNORED means that the disk is present in the system but is ignored by ASM.

Group_number=0 is the number shown when a disk is not part of a mounted diskgroup.
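The interpretation above can be written down as a small helper. This is a sketch under the assumption that the three column values are passed in by hand or from a script around the query; the function name describe_asm_disk and the wording of its messages are made up for illustration.

```shell
# Interpret one v$asm_disk row using the meanings given above.
# Arguments: group_number header_status mount_status
describe_asm_disk() {
    group_number=$1
    header_status=$2
    mount_status=$3
    if [ "$header_status" = "MEMBER" ] && [ "$mount_status" = "IGNORED" ] \
        && [ "$group_number" -eq 0 ]; then
        # The combination seen in this post.
        echo "partially added: header says MEMBER but ASM ignores the disk"
    elif [ "$header_status" = "MEMBER" ] && [ "$group_number" -gt 0 ]; then
        echo "in use: member of mounted diskgroup number $group_number"
    else
        echo "other: header=$header_status mount=$mount_status group=$group_number"
    fi
}

describe_asm_disk 0 MEMBER IGNORED
```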

Next, I dumped the start of the disk with dd:

$dd if=/dev/rdsk/c1t2d3s4 of=/tmp/disk.out bs=4096 count=1096

$ vi /tmp/disk.out

In the dump I found the diskgroup name and the disk number allocated to this disk, which confirms that the disk header already marks it as part of diskgroup DATA1.

But from header_status, mount_status and group_number it is clear that the disk was only partially added to the RAC ASM instances. To correct this, we have to clear the disk header and add the disk again:

# dd if=/dev/zero of=/dev/rdsk/c1t2d3s4 bs=4096 count=5000

This command cleared the disk header, after which the disk was added successfully.

Note: using dd this way wipes the ASM header, so use it only after confirming that you have the right disk. Running it against the wrong disk can cause a diskgroup to dismount and lead to data loss.
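When zeroing a header with dd, the input has to be /dev/zero: /dev/null returns end-of-file immediately, so nothing is copied. The sketch below shows the difference on a scratch file created with mktemp, never on a real device. The function name wipe_header_demo and the FAKEHDR1 marker are made up for illustration, and conv=notrunc is used only so the scratch file behaves like a device that is not truncated.

```shell
# Compare dd from /dev/null and from /dev/zero on a scratch file.
wipe_header_demo() {
    scratch=$(mktemp) || return 1
    printf 'FAKEHDR1-demo-header' > "$scratch"   # stand-in for a disk header

    # /dev/null yields no data, so the "header" is left untouched.
    dd if=/dev/null of="$scratch" bs=4096 count=1 conv=notrunc 2>/dev/null
    after_null=$(dd if="$scratch" bs=8 count=1 2>/dev/null)

    # /dev/zero yields zero bytes, so the first 4096 bytes are overwritten.
    dd if=/dev/zero of="$scratch" bs=4096 count=1 conv=notrunc 2>/dev/null
    nonzero=$(dd if="$scratch" bs=8 count=1 2>/dev/null | tr -d '\000' | wc -c | tr -d ' ')

    rm -f "$scratch"
    echo "first 8 bytes after /dev/null pass: $after_null"
    echo "non-zero bytes in first 8 after /dev/zero pass: $nonzero"
}

wipe_header_demo
```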

Transparent Application Failover – TAF

Transparent Application Failover (TAF) is a client-side feature that allows clients to reconnect to a surviving instance when a database instance fails.

Note that TAF is not meant for load balancing or for connect-time failover.

TAF operates in two modes:

 – Session failover, which recreates lost connections and sessions

 – Select failover, which also replays queries that were in progress. After failover the query is re-executed; rows that were already fetched are discarded and fetching resumes with the remaining rows.

TAF can be implemented at

 – Client side

 – Server side

Client Side

This can be done by creating an entry in tnsnames.ora as follows:

TESTDB10_basic=
 (DESCRIPTION=
 (LOAD_BALANCE=on)
 (FAILOVER=on)
 (ADDRESS=(PROTOCOL=tcp)(HOST=prod01-vip)(PORT=1521))
 (ADDRESS=(PROTOCOL=tcp)(HOST=prod02-vip)(PORT=1521))
 (CONNECT_DATA=
 (SERVICE_NAME=TESTDB10)
 (FAILOVER_MODE=(TYPE=select)
 (METHOD=basic))))

Here the FAILOVER_MODE parameter is what implements TAF.

With the above configuration, a new connection is created at failover time. This is determined by the METHOD parameter, which is set to BASIC here.
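Entries like the one above are easy to break with a missing or extra parenthesis. A minimal sketch of a balance check follows; the function name check_parens is made up for illustration, and it only counts parentheses without validating any Oracle Net keywords.

```shell
# Report whether every "(" in the given text has a matching ")".
check_parens() {
    printf '%s\n' "$1" | awk '
        {
            n = length($0)
            for (i = 1; i <= n; i++) {
                c = substr($0, i, 1)
                if (c == "(") depth++
                else if (c == ")") { depth--; if (depth < 0) bad = 1 }
            }
        }
        END {
            if (bad)            print "unbalanced: extra )"
            else if (depth > 0) print "unbalanced: " depth " unclosed ("
            else                print "balanced"
        }'
}

check_parens "(ADDRESS=(PROTOCOL=tcp)(HOST=prod01-vip)(PORT=1521))"
```

Running it on the text of a single alias before copying the entry to clients catches the most common typing mistakes.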

You can also have a backup connection established in advance. Since creating a new connection takes time, this results in faster failover. To do this, use

TESTDB_prod01=
 (DESCRIPTION=
 (LOAD_BALANCE=on)
 (FAILOVER=on)
 (ADDRESS=(PROTOCOL=tcp)(HOST=prod01-vip)(PORT=1521))
 (CONNECT_DATA=
 (SERVICE_NAME=TESTDB)
 (INSTANCE_NAME=TESTDB1)
 (FAILOVER_MODE=(BACKUP=TESTDB_prod02)
 (TYPE=select)(METHOD=PRECONNECT))))

TESTDB_prod02=
 (DESCRIPTION=
 (LOAD_BALANCE=on)
 (FAILOVER=on)
 (ADDRESS=(PROTOCOL=tcp)(HOST=prod02-vip)(PORT=1521))
 (CONNECT_DATA=
 (SERVICE_NAME=TESTDB)
 (INSTANCE_NAME=TESTDB2)
 (FAILOVER_MODE=(BACKUP=TESTDB_prod01)
 (TYPE=select)(METHOD=PRECONNECT))))

The following query can be used to monitor failed-over sessions:

SELECT MACHINE, FAILOVER_TYPE, FAILOVER_METHOD, FAILED_OVER, COUNT(*)
FROM V$SESSION GROUP BY MACHINE, FAILOVER_TYPE, FAILOVER_METHOD, FAILED_OVER;

Server Side Configuration


This can be done using server-side service attributes. Please note that if both client-side and server-side configurations are present, the server-side settings take precedence.

You can refer to the following note for details:

Note 404644.1 – Configuration of Transparent Application Failover(TAF) works with server side service

Features of TAF

 – If a session is executing an insert/update/delete statement when the failure occurs, the statement is rolled back.

 – If a command had already completed successfully before the failure and changed the database state, TAF does not resend it.

 – With select failover, only the rows that had not yet been fetched before the failure are returned to the client after failover.

CRSCTL CheatSheet

Below are various commands that can be used to administer Oracle Clusterware with crsctl, collected for easy reference.

Start Oracle Clusterware

#crsctl start crs

Stop Oracle Clusterware

#crsctl stop crs

Enable Oracle Clusterware

#crsctl enable crs

It enables automatic startup of Clusterware daemons

Disable Oracle Clusterware

#crsctl disable crs

It disables automatic startup of Clusterware daemons. This is useful when you are performing operations such as OS patching and do not want Clusterware to start the daemons automatically.

Checking Voting disk Location

$crsctl query css votedisk

0. 0 /dev/sda3
1. 0 /dev/sda5
2. 0 /dev/sda6
Located 3 voting disk(s).

Note: any command that only queries information can be run as the oracle user, but anything that alters Oracle Clusterware requires root privileges.

Add Voting disk

#crsctl add css votedisk path

Remove Voting disk

#crsctl delete css votedisk path

Check CRS Status

$crsctl check crs

Cluster Synchronization Services appears healthy

Cluster Ready Services appears healthy

Event Manager appears healthy

You can also see particular daemon status

$crsctl check cssd

Cluster Synchronization Services appears healthy

$crsctl check crsd

Cluster Ready Services appears healthy

$crsctl check evmd

Event Manager appears healthy
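For scripted monitoring, the output of these checks can be reduced to an exit status. The sketch below assumes the "appears healthy" wording shown above (the pre-11.2 format); the function name crs_all_healthy is made up for illustration.

```shell
# Read `crsctl check crs` output on stdin. Succeed only if there is at least
# one non-blank line and every non-blank line ends with "appears healthy".
crs_all_healthy() {
    awk '
        NF { n++ }
        NF && $0 !~ /appears healthy[ \t]*$/ { bad = 1; print "NOT HEALTHY: " $0 }
        END { exit (bad || n == 0) }'
}

# Hypothetical usage on a cluster node:
# crsctl check crs | crs_all_healthy || echo "Clusterware needs attention"
```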

You can also check Clusterware status on both the nodes using

$crsctl check cluster

prod01 ONLINE

prod02 ONLINE

Checking Oracle Clusterware Version

To determine software version (binary version of the software on a particular cluster node) use

$crsctl query crs softwareversion

Oracle Clusterware version on node [prod01] is [11.1.0.6.0]

For checking active version on cluster, use

$ crsctl query crs activeversion

Oracle Clusterware active version on the cluster is [11.1.0.6.0]

As per the documentation, the two can differ during a rolling upgrade: the active version stays at the older release until all nodes have been upgraded.
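To compare the two values in a script, the version can be pulled out of the bracketed field at the end of each line. This is a sketch that assumes the output format shown above; the function name extract_version is made up, and the two sample lines are copied from this post rather than read from a live cluster.

```shell
# Print the last [...] field of a line if it holds a dotted version number.
extract_version() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([0-9.]*\)\][^[]*$/\1/p'
}

sw_line='Oracle Clusterware version on node [prod01] is [11.1.0.6.0]'
active_line='Oracle Clusterware active version on the cluster is [11.1.0.6.0]'

sw=$(extract_version "$sw_line")
active=$(extract_version "$active_line")

if [ "$sw" = "$active" ]; then
    echo "software and active version match: $sw"
else
    echo "software $sw differs from active $active (upgrade in progress?)"
fi
```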

There are other options for CRSCTL too which can be seen using

$crsctl

Or

$crsctl help

11.2 Reference

11.2 introduced a few changes to crsctl usage. The most important is the set of clusterized commands, which let you perform operations on remote nodes. They are

  • crsctl check cluster
  • crsctl start cluster
  • crsctl stop cluster

All these commands accept the following options (described here for stop):

Default      Stop local server
-all         Stop all servers
-n           Stop named servers
server […]   One or more blank-separated server names
-f           Force option

Let’s see usage

% crsctl check cluster -all
**************************************************************
prod01:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
prod02:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
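Output in this shape is easy to summarise per node. The sketch below assumes the format shown above: rows of asterisks as separators, a "node:" line, then CRS- messages ending in "is online" when all is well. The function name summarize_cluster_check is made up for illustration.

```shell
# Read `crsctl check cluster -all` output on stdin and print "<node> OK" or
# "<node> PROBLEM" for each node, in the order the nodes appear.
summarize_cluster_check() {
    awk '
        /^[*]+$/ { next }
        /:$/ {
            node = $0
            sub(/:$/, "", node)
            if (!(node in status)) { order[++n] = node; status[node] = "OK" }
            next
        }
        /^CRS-/ && node != "" && $0 !~ /is online[ \t]*$/ { status[node] = "PROBLEM" }
        END { for (i = 1; i <= n; i++) print order[i], status[order[i]] }'
}

# Hypothetical usage on a cluster node:
# crsctl check cluster -all | summarize_cluster_check
```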

crsctl pin css associates a node name with a node number, i.e. if olsnodes shows prod01 as 1, that mapping will persist. This is helpful if you intend to run a pre-11.2 database on the cluster.

#crsctl pin css -n prod01
#crsctl pin css -n prod02

To check daemon status, use the following commands:

Check crsd – crsctl check crs

Check cssd – crsctl check css

Check evmd – crsctl check evm

crs_unregister is replaced by crsctl delete resource <resource_name>

crs_stat has been deprecated (though it still works); use the following instead:

$crsctl stat res -t
e.g.

--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
               ONLINE  ONLINE       prod01
               ONLINE  ONLINE       prod02
ora.FLASH.dg
               ONLINE  ONLINE       prod01
               ONLINE  ONLINE       prod02
ora.LISTENER.lsnr
               ONLINE  ONLINE       prod01
               ONLINE  ONLINE       prod02
ora.asm
               ONLINE  ONLINE       prod01               Started
               ONLINE  ONLINE       prod02               Started
ora.gsd
               OFFLINE OFFLINE      prod01
               OFFLINE OFFLINE      prod02
ora.net1.network
               ONLINE  ONLINE       prod01
               ONLINE  ONLINE       prod02
ora.ons
               ONLINE  ONLINE       prod01
               ONLINE  ONLINE       prod02
ora.registry.acfs
               ONLINE  ONLINE       prod01
               ONLINE  ONLINE       prod02
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       prod01
ora.cvu
      1        ONLINE  ONLINE       prod01
ora.oc4j
      1        ONLINE  ONLINE       prod01
ora.prod01.vip
      1        ONLINE  ONLINE       prod01
ora.prod02.vip
      1        ONLINE  ONLINE       prod02
ora.scan1.vip
      1        ONLINE  ONLINE       prod01
ora.tintin.db
      1        ONLINE  ONLINE       prod01               Open
      2        ONLINE  ONLINE       prod02               Open

I wrote the following awk pipeline to print each resource instance on one line:

crsctl status res |grep -v "^$"|awk -F "=" 'BEGIN {print " "} {printf("%s",NR%4 ? $2"|" : $2"\n")}'|sed -e 's/  *, /,/g' -e 's/, /,/g'|\
awk -F "|" 'BEGIN { printf "%-40s%-35s%-20s%-50s\n","Resource Name","Resource Type","Target ","State" }{ split ($3,trg,",") split ($4,st,",")}{for (i in trg) {printf "%-40s%-35s%-20s%-50s\n",$1,$2,trg[i],st[i]}}'

Output:

Resource Name                           Resource Type                      Target              State                                             

ora.DATA.dg                             ora.diskgroup.type                 ONLINE              ONLINE on prod01                              
ora.DATA.dg                             ora.diskgroup.type                 ONLINE              ONLINE on prod02                              
ora.FLASH.dg                            ora.diskgroup.type                 ONLINE              ONLINE on prod01                              
ora.FLASH.dg                            ora.diskgroup.type                 ONLINE              ONLINE on prod02                              
ora.LISTENER.lsnr                       ora.listener.type                  ONLINE              ONLINE on prod01                              
ora.LISTENER.lsnr                       ora.listener.type                  ONLINE              ONLINE on prod02                              
ora.LISTENER_SCAN1.lsnr                 ora.scan_listener.type             ONLINE              ONLINE on prod01                              
ora.asm                                 ora.asm.type                       ONLINE              ONLINE on prod01                              
ora.asm                                 ora.asm.type                       ONLINE              ONLINE on prod02                              
ora.cvu                                 ora.cvu.type                       ONLINE              ONLINE on prod01                              
ora.gsd                                 ora.gsd.type                       OFFLINE             OFFLINE                                           
ora.gsd                                 ora.gsd.type                       OFFLINE             OFFLINE                                           
ora.net1.network                        ora.network.type                   ONLINE              ONLINE on prod01                              
ora.net1.network                        ora.network.type                   ONLINE              ONLINE on prod02                              
ora.oc4j                                ora.oc4j.type                      ONLINE              ONLINE on prod01                              
ora.ons                                 ora.ons.type                       ONLINE              ONLINE on prod01                              
ora.ons                                 ora.ons.type                       ONLINE              ONLINE on prod02                              
ora.prod01.vip                          ora.cluster_vip_net1.type          ONLINE              ONLINE on prod01
ora.prod02.vip                          ora.cluster_vip_net1.type          ONLINE              ONLINE on prod02
ora.registry.acfs                       ora.registry.acfs.type             ONLINE              ONLINE on prod01                              
ora.registry.acfs                       ora.registry.acfs.type             ONLINE              ONLINE on prod02                              
ora.scan1.vip                           ora.scan_vip.type                  ONLINE              ONLINE on prod01                              
ora.snowy.db                            ora.database.type                  OFFLINE             OFFLINE                                           
ora.snowy.db                            ora.database.type                  ONLINE              OFFLINE                                           
ora.tintin.db                           ora.database.type                  ONLINE              ONLINE on prod01                              
ora.tintin.db                           ora.database.type                  ONLINE              ONLINE on prod02                              
ora.tintin.tintin_db_svc.svc            ora.service.type                   ONLINE              ONLINE on prod02                              
ora.tintin.tintin_ggate_svc.svc         ora.service.type                   ONLINE              ONLINE on prod01
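The same report can be produced with a single awk program that is easier to read and change. This is a sketch that assumes `crsctl status res` prints NAME=, TYPE=, TARGET= and STATE= lines for each resource, with comma-separated values per server, which is what the pipeline above relies on. The function name format_crs_resources is made up for illustration.

```shell
# Read `crsctl status res` output on stdin and print one line per
# resource instance: name, type, target, state.
format_crs_resources() {
    awk -F= '
        BEGIN {
            fmt = "%-40s%-35s%-20s%-50s\n"
            printf fmt, "Resource Name", "Resource Type", "Target", "State"
        }
        $1 == "NAME"   { name = $2 }
        $1 == "TYPE"   { rtype = $2 }
        $1 == "TARGET" { target = $2 }
        $1 == "STATE"  {
            # Split the comma lists, dropping the padding around each comma.
            nt = split(target, trg, /[ \t]*,[ \t]*/)
            ns = split($2, st, /[ \t]*,[ \t]*/)
            for (i = 1; i <= ns; i++)
                printf fmt, name, rtype, (i <= nt ? trg[i] : ""), st[i]
        }'
}

# Hypothetical usage on a cluster node:
# crsctl status res | format_crs_resources
```

Iterating with a numeric index keeps the target and state of each server paired and in the order crsctl printed them.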