Grid Control Fails to Start

The link for Grid Control was not Working and it failed to show the login page. In this situation  checked the following things:

1. The repository database was up and running
2. The listener was up.
3. OMS server was down.

Tried to stratup the OMS server using opmnctl startall command, but it showed following error in ons.log file:

09/04/08 06:06:18 [4] ONS server initiated
09/04/08 06:06:18 [2] BIND (Address already in use)
09/04/08 06:06:18 [2] 127.0.0.1:6104 - listener BIND failed
09/04/08 06:06:18 [4] Listener thread 98311: 127.0.0.1:6104 (0x442)  terminating
09/04/08 06:06:18 [1] Local listener terminated

As it showed that the port is busy, now the obvious choice is to check which application is still holding the port. It can be done by using command:

netstat -a|grep <port_number>

i.e netstat -a |grep 6104

and the result was, no application was using this port.

But still this port was shown busy to OMS server, hence failed startup of OMS.

Then I changed the port to a different unused port in opmn.xml file and started the OMS server, it got started with new port number.

After that I stopped the OMS server and again changed the port number back to the original port number which was giving error earlier.

This time OMS started with the  old port number as well.

Cheers!!!!!

Saurabh Sood

Connections to DataBase Hang Including “/ as sysdba”

Recently I faced one issue where all the connection to database hung and it was also not possible to login to database using “/ as sysdba”.
To get access of sqlplus I used the following syntax:

$ sqlplus -prelim / as sysdba

With “prelim” option we can run some commands which will help in collection useful information about the problem.

This will work only in Oracle 10g and higher version.

After successfully getting connected run the following commands to generate Hanganalyze and systemstate traces:

SQL> oradebug setmypid

SQL> oradebug unlimit

SQL> oradebug dump systemstate 266

SQL> oradebug tracefile_name

— This will give you the name of the tracefile generated.

SQL > oradebug dump hanganalyze 2

SQL > oradebug tracefile_name

To analyze these trace files one should be aware of Metalink Note: 215858.1.

After analyzing these files I found that following event was active and causing the hang:

<span style="font-family: arial,helvetica,sans-serif;"><span style="font-size: small;">"resmgr:cpu quantum"
Cmd: PL/SQL Execute

It means that the sessions are waiting for their turn on CPU.

This event occurs when resource manage is active and controls the allocation of CPU to processes.

We can also see the command which is causing all this: i.e some PL/SQL code was executing and spnning on for CPU.

After finding out this, checked with “TOP” command, got the PID of the process consuming all the cpu and killed that process with “kill -9”

After killing that process the users were able to connect.

So the cause of the Hang was found i.e PL/SQL, but it is still unknown why  PL/SQL caused problems. 🙂

Cheers!!!

Saurabh Sood

OUI-67124 – Copy failed from ‘location 1’ to ‘location 2’

Just a short note to discuss a problem faced by me while applying CPU Jan patch to clusterware on AIX 5L. I was getting following errors

UtilSession failed: ApplySession failed in system modification phase... 'ApplySession::apply failed: Copy failed from '/archive/oracle/soft/Patch/6980307/6756433/files/lib/libhasgen10.so' to '/oracle/crs_base/app/product/crs10gR2/lib/libhasgen10.so'...
Copy failed from '/archive/oracle/soft/Patch/6980307/6756433/files/lib/libocr10.so' to '/oracle/crs_base/app/product/crs10gR2/lib/libocr10.so'...
Copy failed from '/archive/oracle/soft/Patch/6980307/6756433/files/lib/libocrb10.so' to '/oracle/crs_base/app/product/crs10gR2/lib/libocrb10.so'...
Copy failed from '/archive/oracle/soft/Patch/6980307/6756433/files/lib/libocrutl10.so' to '/oracle/crs_base/app/product/crs10gR2/lib/libocrutl10.so'..

I had followed all the  pre-requsites for this patch installation i.e

1)Stopped the database instance and ASM instance on the node

2)Stopped the nodeapps services

3)Stopped the clusterware

4) Executed /usr/sbin/slibclean as root

I searched over metalink and found a note recommending renaming the files and  retrying the patching process. One more suggestion was to copy the files manually. I thought of debugging this issue (also wanted to have clean installation), so I checked for processes being run by ‘oracle’ user. I found that listener was running

oracle 1982506       1   0 00:30:13      -  0:00 /oracle/ora_base/app/product/db10gR2/bin/tnslsnr LISTENER_TAF_PRODDB1 -inherit

This was a listener which was created manually (not using netca) and not registered in the OCR. As a result, it did not stop when we stopped the nodeapps services. I then stopped the listener and executed /usr/sbin/slibclean (as root) and re-initiated the patching process. This time it went fine.

One more easier way would have been to use ‘fuser’ command to identify the pid’s for the processes accessing the file.

In the end I realized that before proceeding to apply patch ,it is better to check for if any Instance or listener or any other process (RMAN, sqlplus or sqlloader utilities too) is running from Oracle Home being patched even though you have followed all the steps mentioned in Patch readme.

Cheers

Amit

Upgrade to WordPress 2.7.1

WordPress 2.7.1 is out..While using upgrade button, it will be stuck at downloading files and will not proceed further. This issue will be faced by people using “Automatic Upgrade” plugin available in prior release. You need to disable the plugin and then re-try the operation. Also note that you have to take a “Backup ” of database as the plugin does not take automatic backup.

WordPress Automatic Upgrade plugin did not work for me , as this gave message for my 2.7 installation

“Congratulations!!! Your WordPress version is already up to date”

So you can easily go ahead and disable this plugin and use the Upgrade option available with WordPress 2.7

Cheers
Amit

Scheduling Job through Cron??

In case you came across this link searching for Cron syntax, then you can refer to following link

http://www.adminschoice.com/docs/crontab.htm

Other’s can continue reading the article …. 😉

This was a problem which I faced yesterday. We had a mview refresh which was lagging on one of the server. On checking master site, I found that there were many pending requests coming from the second materialized view site . Checking second site revealed that there were multiple sessions running in database which were waiting on ‘enq :JI Contention’ wait event.

These mviews were being refreshed by job scheduled through a cron. Doing a grep for the refresh script gave following result

 $ps -ef|grep scott|grep -v grep
    oracle 11725 11714  0 13:00:00 ?         0:00 /usr/bin/sh /home/oracle/cron/Refresh_scott.sh
  oracle 19981 19970  0  Feb  7  ?         0:00 /usr/bin/sh /home/oracle/cron/Refresh_scott.sh
  oracle 24794 24781  0 06:30:00 ?         0:00 /usr/bin/sh /home/oracle/cron/Refresh_scott.sh
    oracle 10538 10527  0  Feb  6  ?         0:00 /usr/bin/sh /home/oracle/cron/Refresh_scott.sh
    oracle 13972 13935  0  Feb  6  ?         0:00 /usr/bin/sh /home/oracle/cron/Refresh_scott.sh
  oracle  2601  2592  0 19:00:00 ?         0:00 /usr/bin/sh /home/oracle/cron/Refresh_scott.sh
  oracle 17274 17246  0  Feb  6  ?         0:00 /usr/bin/sh /home/oracle/cron/Refresh_scott.sh
  oracle  8308  8294  0 13:30:00 ?         0:00 /usr/bin/sh /home/oracle/cron/Refresh_scott.sh

This was definitely wrong. I killed the database sessions and also killed the OS processes

$ ps -ef|grep Refresh_scott.sh|grep -v grep|awk ‘{print $2}’|xargs kill -9

This issue had occurred as one more site was added for mview refresh and due to locking issues (while deleting records from Mlog$), job could not complete before next refresh schedule. While scheduling scripts through cron, care should be taken to put a check if script is already running or not. This is required as Cron will spawn the new job (as per job schedule) irrespective of fact that whether earlier execution has completed or not. I used following logic to implement the same check

cnt=`ps -ef|grep Refresh_scott.sh|grep -v grep |wc -l`
if [ $cnt -eq 1 ]
then
echo "Running Fast Refresh now " > $DIR/Check_refresh_scott.log
sqlplus /nolog <<EOF > $DIR/Refresh_scott.log
conn scott/tiger
exec dbms_mview.refresh ...
...
exit
EOF
else
echo "Fast refresh already running " > $DIR/Check_refresh_scott.log
date >> $DIR/Check_refresh_scott.log
exit
fi

There could be other logics too which can be used to implement the same. e.g Other could be

filename=$DIR/Check_refresh_scott.log
if [ -e $filename  ]
then
echo "Fast Refresh already running " > refresh_runnning.log
exit
else
sqlplus /nolog <<EOF > $DIR/Refresh_scott.log
conn scott/tiger
exec dbms_mview.refresh ...
...
exit
EOF
rm $filename
fi

Other way could be to use DBMS_JOB and DBMS_SCHEDULER to schedule the database job.

Creating ASM devices on AIX

I thought of sharing few tips on creating ASM devices on AIX which I will be helpful to Oracle DBA’s. Suppose SysAdmin gives you list of Serial numbers for LUN instead of device Name

pcmpath query device
DEV#:  33  DEVICE NAME: hdisk33  TYPE: 2107900  ALGORITHM:  Load Balance
SERIAL: 75DM011<span style="color: #ff0000;"><strong>1101</strong></span>
===========================================================================
Path#      Adapter/Path Name          State     Mode     Select     Errors
    0           fscsi0/path0          CLOSE   NORMAL          9          0
    1           fscsi1/path1          CLOSE   NORMAL          8          0

In case there are lot many disks, then it could be a tiring task of running above command and finding each device. You can use below code which will list name of  devices and size (In MB) of disk.

for i in 1000 1100    1018    1118    1030    1130    104C    114C    1068    1168    1080    1180
do
j=`pcmpath query device|grep -p $i"$"|grep DEVICE|awk -F ":" '{print }'|awk '{print }`
k=`bootinfo -s $j`
echo $i $j $k
done

This would return following output

1000 hdisk4 65536
1100 hdisk10 65536
1018 hdisk5 65536
1118 hdisk11 65536
1030 hdisk6 65536
1130 hdisk12 65536
104C hdisk7 65536
114C hdisk13 65536
1068 hdisk8 65536
1168 hdisk14 65536
1080 hdisk9 65536
1180 hdisk15 65536

Now if you need to create new device name, you need to use mknod command and pass on major and minor numbers. Following code can be used to perform same

#export m=0
 # for i in hdisk4  hdisk10 hdisk5  hdisk11 hdisk6  hdisk12 hdisk7  hdisk13 hdisk8  hdisk14 hdisk9  hdisk15
 do
 j=`ls -la /dev/$i |awk '{print }'|awk -F "," '{print }'`
  k=`ls -la /dev/$i |awk '{print }'`
 m=`expr $m + 1` ;echo "mknod /dev/asm_disk"$m "c "$j $k
 done

 mknod /dev/asm_disk1 c 21 4
 mknod /dev/asm_disk2 c 21 12
 mknod /dev/asm_disk3 c 21 13
 mknod /dev/asm_disk4 c 21 15
 mknod /dev/asm_disk5 c 21 5
 mknod /dev/asm_disk6 c 21 6
 mknod /dev/asm_disk7 c 21 8
 mknod /dev/asm_disk8 c 21 7
 mknod /dev/asm_disk9 c 21 14
 mknod /dev/asm_disk10 c 21 10
 mknod /dev/asm_disk11 c 21 9
 mknod /dev/asm_disk12 c 21 11

Now you can change the ownership to oracle:dba and permission to 660. I have 12 disks , so using list of 12 variables. In case you have more disks , then you can add more variables

# for i in 1 2 3 4 5 6 7 8 9 10 11 12
do
chown oracle:dba /dev/asm_disk$i
chmod 660 /dev/asm_disk$i
done

crw-rw----    1 oracle   dba          21, 11 Jan 28 17:10 /dev/asm_disk12
crw-rw----    1 oracle   dba          21,  9 Jan 28 17:10 /dev/asm_disk11
crw-rw----    1 oracle   dba          21, 10 Jan 28 17:10 /dev/asm_disk10
crw-rw----    1 oracle   dba          21, 14 Jan 28 17:04 /dev/asm_disk9
crw-rw----    1 oracle   dba          21,  7 Jan 28 17:04 /dev/asm_disk8
crw-rw----    1 oracle   dba          21,  8 Jan 28 17:04 /dev/asm_disk7
crw-rw----    1 oracle   dba          21,  6 Jan 28 17:04 /dev/asm_disk6
crw-rw----    1 oracle   dba          21,  5 Jan 28 17:04 /dev/asm_disk5
crw-rw----    1 oracle   dba          21, 15 Jan 28 17:04 /dev/asm_disk4
crw-rw----    1 oracle   dba          21, 13 Jan 28 17:04 /dev/asm_disk3
crw-rw----    1 oracle   dba          21, 12 Jan 28 17:04 /dev/asm_disk2
crw-rw----    1 oracle   dba          21,  4 Jan 28 17:04 /dev/asm_disk1

In case you need to use same logic for creating OCR and Voting disks on RAC system, replace /dev/asm with /dev/ocr or /dev/voting . I hope this would save some time and also prevent errors 🙂

In case you have disks in ordered number,say 53 to 62 then you can also use for loop as below.

#bash
bash-3.00#
#export m=0
#for ((i=53;i<=62;i++))
do
 j=`ls -la /dev/hdisk$i |awk '{print }'|awk -F "," '{print }'`
  k=`ls -la /dev/hdisk$i |awk '{print }'`
   m=`expr $m + 1` ;echo "mknod /dev/asm_disk"$m "c "$j $k
 done
#for ((i=1;i<=10;i++))
do
chown oracle:dba /dev/asm_disk$i
chmod 660 /dev/asm_disk$i
done

I would suggest anyone using the scripts to first check in a test environment.