June 2008

We are here to Stay !!

I had actually planned to write this post last week. It’s our Third web address till date with first being http://askoracledba.blogspot.com and second one being http://askoracledba.wordpress.com . Now we are on http://askdba.org/weblog/ and hope to continue this.

We decided to buy our own domain so that we are not dependent on Free service providers like Blogger, WordPress.com. Both of them have their advantages/disadvantages though WordPress Blogs score more over blogspot. This can also be seen by the number of users moving towards WordPress. We chose for independent domain as it gives us all advantages of WordPress with additional functionalities like option to choose Design templates/CSS editing and also display Ads. Anyways, from now on we will be posting on this site. Hope we can dish out interesting and helpful articles!! You can also visit our Homepage (currently it is not 100% complete). You will soon see articles/presentations related to Oracle and SQL Server there. You can also ask any questions or clarify any doubts on our Forum

[poll id=”2″]

DBConsole Issue on RAC -Part II

Continuing with DBConsole issue, we were able to get the cluvfy return success for the nodes.

Basically Oracle suggested us that when we use ssh we should not get any banner. E.g

[oracle@PROD01 ~]$ ssh PROD02 date
Fri Jun 13 02:00:41 IST 2008

But in our case it was displaying a banner which displayed a warning message when someone logged in Server.

As we are using Linux, we renamed file /etc/issue.net to something else and tried running cluvfy again. It was successful this time

[oracle@PROD01 ~]$ cluvfy comp nodecon -n all

Verifying node connectivity

Checking node connectivity...

Node connectivity check passed for subnet "10.X.X.X" with node(s) PROD02,PROD01.
Node connectivity check passed for subnet "192.X.X.X" with node(s) PROD02,PROD01

Interfaces found on subnet "192.X.X.X" that are likely candidates for VIP:
PROD02 eth3:192.X.X.X
PROD01 eth3:192.X.X.X

Interfaces found on subnet "10.X.X.X" that are likely candidates for a private interconnect:
PROD02 eth2:10.X.X.X eth2:10.X.X.X
PROD01 eth2:10.X.X.X eth2:10.X.X.X

Node connectivity check passed.


Verification of node connectivity was successful.

But the DBConsole issue still remains. It is still unable to find the hostname. Now waiting for Oracle 🙁

DBConsole Issue on RAC -Part I

Currently I am working on issue where DBConsole is not starting on our 2 Node RAC system. When I try to start, I get following errors

[oracle@PROD01 ~]$ emctl status dbconsole
TZ set to US/Pacific
Exception in getting local host
java.net.UnknownHostException: PROD01: PROD01
        at java.net.InetAddress.getLocalHost(InetAddress.java:1191)
        at oracle.sysman.emSDK.conf.TargetInstaller.getLocalHost
(TargetInstaller.java:5561)
        at oracle.sysman.emSDK.conf.TargetInstaller.main
(TargetInstaller.java:4126)
Exception in getting local host

I tried recreating the DBConsole but that also failed and gave following error

[oracle@PROD01 ~]$ emca -config dbcontrol db  -cluster

STARTED EMCA at Jun 12, 2008 3:29:40 AM
EM Configuration Assistant, Version 10.2.0.1.0 Production
Copyright (c) 2003, 2005, Oracle.  All rights reserved.

Jun 12, 2008 3:29:40 AM oracle.sysman.emcp.util.ClusterUtil getHostName
SEVERE: Error getting hostname for the cluster node PROD01. This node may not be configured correctly
Enter the following information:
Database unique name: testdb1
Jun 12, 2008 3:29:42 AM oracle.sysman.emcp.ParamsManager getInaccessibleNodeList
WARNING: The following cluster nodes are unavailable: [PROD01, PROD02].
Jun 12, 2008 3:29:42 AM oracle.sysman.emcp.ParamsManager getInaccessibleSidList
WARNING: The requested operation will not be performed for the following instances: [testdb11, testdb12].
No cluster nodes found when configuring the RAC database for EM

Above error informs that the nodes are not available, but if we check the status, they are indeed running.

[oracle@PROD01 ~]$ crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.testdb1.db    application    ONLINE    ONLINE    PROD01
ora....omp1.cs application    ONLINE    ONLINE    PROD01
ora....11.inst application    ONLINE    ONLINE    PROD01
ora....12.inst application    ONLINE    ONLINE    PROD02
ora....SM1.asm application    ONLINE    ONLINE    PROD01
ora....01.lsnr application    ONLINE    ONLINE    PROD01
ora....d01.gsd application    ONLINE    ONLINE    PROD01
ora....d01.ons application    ONLINE    ONLINE    PROD01
ora....d01.vip application    ONLINE    ONLINE    PROD01
ora....dM2.asm application    ONLINE    ONLINE    PROD02
ora....02.lsnr application    ONLINE    ONLINE    PROD02
ora....d02.gsd application    ONLINE    ONLINE    PROD02
ora....d02.ons application    ONLINE    ONLINE    PROD02
ora....d02.vip application    ONLINE    ONLINE    PROD02

At this moment I searched Metalink for any known issues. I came across

Note.388440.1 – Problem Emca Fails To Configure DB Control For RAC Database Error Getting Hostname For The Cluster Node

According to this we need to confirm that SSH is set and output of “cluvfy comp nodecon -n all” command should return Sucess. In our case SSH was already set. So I tried using the command but it was Unsucessful

[oracle@PROD01 ~]$ cluvfy comp nodecon -n all

Verifying node connectivity

Verification of node connectivity was unsuccessful on all the nodes.

At this moment we created a SR with Oracle Support. We were asked to then check

Note 549667.1 – Cluvfy returns “Unsuccessful” for most commands, with no other details

We verified that this note was not applicable to us as file permissions for files (Discussed in Note 549667.1) were correctly set. Now we have one more SR which has been created with RAC team to resolve the “Cluvfy” issue.

It’s been long wait and despite of SR being Escalated, still haven’t got a response from Analyst. Will keep you all posted about the issue and will share the solution. Meanwhile if someone else has also faced this situation and resolved it, then do let me know.

Easy and Free X-Windowing Setup with Cygwin

Today I had to carry out Oracle client Installation on one of the server. Normally I use vncviewer to connect to one of my Linux machine (having vncserver setup) and then get the display on to that machine.

But my colleague suggested me about one more approach, which I thought of sharing with all of you. This is using a X –Windowing software which is Free 🙂 Surprised!!

This can be done using Cygwin which is a freeware so you can trash your Reflection X or Hummingbird Installations. You can check various X-Windows software available here

Lets come back to Cygwin. You can download the software from http://x.cygwin.com/

To perform the Cygwin setup, you need to follow below steps

1) Start the XWindows server on your PC. This could be done by running startxwin-multiwindow file present on Desktop (Created during installation) or running /usr/X11R6/bin/startxwin.sh from cygwin

$ sh /usr/X11R6/bin/startxwin.sh

You can also use startx. For configuration check this link

http://x.cygwin.com/docs/ug/using.html

3) Connect to the Unix/Linux machine and set the display to your machine. Use ipconfig command to get the IP address for your machine. E.g

C:\Documents and Settings\amit >ipconfig
Windows IP Configuration
Ethernet adapter Local Area Connection:
        Connection-specific DNS Suffix  . : xyz.com
        IP Address. . . . . . . . . . . . : <strong>192.168.4.47</strong>
        Subnet Mask . . . . . . . . . . . : X.X.X.X
        Default Gateway . . . . . . . . . : X.X.X.X

Now set the DISPLAY variable as

$export DISPLAY=192.168.4.47:0.0

4) Test it by typing any application like xcalc,xclock,etc.

If X11 Forwarding is enabled on the remote machine, you can also directly login to server using ssh

$ssh oracle@ip –X

Now you can carry out oracle Installation using runInstaller 🙂

Adding new ASM disk to RAC database fails

Many times i came across a common problem in RAC databases where trying to add an asm disk is not possible due to errors like

ORA-15075 “disk(s) are not visible cluster-wide”

ORA-15020 “discovered duplicate ASM disk “DISK1” and

ORA-15054 “disk “ORCL:DISK1” does not exist in diskgroup “DG1”.

Rebalancing the diskgroup and trying to add the disk with “FORCE” option also does not help in this case.

I will be discussing how to come out of a situation like this i.e When you are trying to add an asm disk in cluster environment and it says that disk is already added and when trying to drop the same disk it says that disk is not present in the diskgroup.

Lets start from the very begining:
I have decided to add an asm disk in RAC environment to an already existing diskgroup DATA1.

Login to asm instance “/ as sysdba”
SQL > ALTER DISKGROUP DATA1 ADD DISK ‘/dev/rdsk/c1t2d3s4’;

But it failed with following error:

ALTER DISKGROUP DATA1 ADD DISK '/dev/rdsk/c1t2d3s4';*
ERROR at line 1:
<strong>ORA-15032: not all alterations performed
ORA-15075: disk(s) are not visible cluster-wide</strong>

This is due to the fact that the physical disk partition is not visible from all RAC nodes. Then i contacted the sysadmins to make sure that the disk is visible from all RAC nodes and accessible by ORACLE. They have fixed the problem and now the disk /dev/rdsk/c1t2d3s4 can be seen from all RAC nodes. Then i tried to add the disk again using force option as:

SQL > ALTER DISKGROUP DATA1 ADD DISK ‘/dev/rdsk/c1t2d3s4’ force;
But it failed with following error:

ORA-15020: discovered duplicate ASM disk “/dev/rdsk/c1t2d3s4”

It shows that disk with same name is already present in the diskgroup.

As it shows that the disk is already present in the diskgroup, while trying to drop the disk i got following error:

SQL&gt; alter diskgroup DATA1 drop disk '/dev/rdsk/c1t2d3s4';
alter diskgroup DATA1 drop disk '/dev/rdsk/c1t2d3s4'
*
ERROR at line 1:
<strong>ORA-15032 : not all alterations performed
ORA-15054 : disk "/dev/rdsk/c1t2d3s4" does not exist in diskgroup "DATA1"</strong>

Now I cannot move further as adding and dropping the disk is not possible here. Then I decided to check the status of the disk from v$asm_disk from all RAC nodes, to do this issue following query:

SQL > col name format a15
SQL > col path format a20
SQL > select GROUP_NUMBER,DISK_NUMBER,MOUNT_STATUS,HEADER_STATUS,NAME,PATH from v$asm_disk;

We Obain following results from all the nodes :

G# D# HEADER_STATU MOUNT_S STATE NAME PATH
—- —- ———— ——- ——– ———— ————————-
0 0 MEMBER IGNORED NORMAL /dev/rdsk/c1t2d3s4

Header_status=MEMBER means that the disk is a valid asm disk on all RAC nodes.
Mount_status=IGNORED means that Disk is present in the system, but is ignored by ASM.

Group_number=0 This is the number used when a disk is not mounted by a diskgroup.

Now by checking the dd output of the disk as :

$dd if=/dev/rdsk/c1t2d3s4 of=/tmp/disk.out bs=4096 count=1096

$ vi /tmp/disk.out

I found that the diskgroup name and disk number allocated to this disk, which confirms that the disk is now a part of diskgroup DATA1.

But from the results of the header_status,mount_status and group_number it is clear that the disk is partially added to RAC asm instances. To correct this we will have to clear the disk header to add it again:

# dd if=/dev/null of=/dev/rdsk/c1t2d3s4 bs=4096 count=5000

This command cleared the disk header and after that disk was added successfully.

Note: – Please note that using dd will clear the ASM header and should be used only after confirming the disk. Using it on a wrong disk can cause Diskgroup to dismount and lead to Data Loss.

Transparent Application Failover – TAF

Transparent Application Failover (TAF) is a client-side feature that allows for clients to reconnect to surviving databases in the event of a failure of a database instance.

Note that this is not used for purpose of load balancing or for Connect Time Failover.

 

TAF operates in two modes

 – Session Failover which will recreate lost connections and sessions

– Select Failover which will replay queries that were in progress. It will discard the old rows which have been fetched earlier and will fetch the rest of rows.

 TAF can be implemented at

 – Client Side

Server Side

 Client Side

 This can be done by creating a entry in tnsnames.ora as follows

 

TESTDB10_basic=
(DESCRIPTION=
<strong>(LOAD_BALANCE=on)
 (FAILOVER=on) </strong>
(ADDRESS= (PROTOCOL=tcp) HOST=prod01-vip)(PORT=1521))
(ADDRESS= (PROTOCOL=tcp)(HOST= prod02-vip)PORT=1521))
(CONNECT_DATA=
 (SERVICE_NAME=TESTDB10)
<strong>(FAILOVER_MODE= (TYPE=select)
(METHOD=basic</strong>))))

Here FAILOVER_MODE parameter is used to implement TAF.

Above configuration is for connections which will allow new connections to be created at failover time. This is determined by parameter METHOD which is set to BASIC here.

You can also have a session established before. This helps as new connection can take time and will thus result in faster failover. To do this use

TESTDB_prod01=
(DESCRIPTION=
(LOAD_BALANCE=on)
(FAILOVER=on)
(ADDRESS= (PROTOCOL=tcp) HOST=prod01-vip)(PORT=1521))
(CONNECT_DATA=
(SERVICE_NAME=TESTDB)
(INSTANCE_NAME=TESTDB1)
(FAILOVER_MODE= <strong>(BACKUP= TESTDB_prod02)
(TYPE=select) (METHOD=PRECONNECT)</strong>)))

 TESTDB_prod02=
(DESCRIPTION=
(LOAD_BALANCE=on)
 (FAILOVER=on)
 (ADDRESS= (PROTOCOL=tcp) HOST=prod01-vip)(PORT=1521))
 (CONNECT_DATA=
 (SERVICE_NAME=TESTDB10)
(INSTANCE_NAME=TESTDB2)
 (FAILOVER_MODE=
 <strong>(BACKUP= TESTDB_prod01)
(TYPE=select)
(METHOD=PRECONNECT)</strong>)))

Following query can be used to monitor information about failed over sessions

SELECT MACHINE, FAILOVER_TYPE, FAILOVER_METHOD, FAILED_OVER, COUNT(*)
FROM V$SESSION GROUP BY MACHINE, FAILOVER_TYPE, FAILOVER_METHOD, FAILED_OVER;

Server Side Configuration

 

This can be done using server-side service attributes. Please note that in case both (Client side and server side) configuration are used then service-side settings will be used.

 

You can refer to following note for details

 

Note 404644.1 – Configuration of Transparent Application Failover(TAF) works with server side service

Features of TAF

 – Sessions executing any insert/update/delete statement fail then statement will be rollback.

A command which has been successfully completed upon failure and has changed the database state, then TAF does not resend the command.

When using select failover, we will retrieve only the rows which have not been fetched by earlier sessions.