ASM

New ASM Background Processes in 11G

A few hours back I installed Oracle Database 11g (the database itself is yet to be created), so I started playing with the ASM instance. The first thing I did was check the ASM alert.log, using ADRCI (new in 11g) to view it:

adrci> show incident

ADR Home = /u03/app/oracle/diag/asm/+asm/+ASM:
*********************************************************
0 rows fetched

adrci> show alert

ADR Home = /u03/app/oracle/diag/asm/+asm/+ASM:
**********************************************************
Output the results to file: /tmp/alert_9572_1_+ASM_1.ado
"/tmp/alert_9572_1_+ASM_1.ado" 48 lines, 1964 characters
PMON started with pid=2, OS id=3672
DIAG started with pid=4, OS id=3678
<strong>VKTM started with pid=3, OS id=3674
VKTM running at (100ms) precision</strong>
2008-06-24 15:24:12.425000 +05:30
PSP0 started with pid=5, OS id=3680
<strong>DSKM started with pid=6, OS id=3682</strong>
<strong>DIA0 started with pid=7, OS id=3684</strong>
MMAN started with pid=6, OS id=3686
DBW0 started with pid=8, OS id=3689
LGWR started with pid=9, OS id=3691
CKPT started with pid=10, OS id=3694
SMON started with pid=11, OS id=3700
RBAL started with pid=12, OS id=3702
GMON started with pid=13, OS id=3705
ORACLE_BASE from environment = /u03/app/oracle
<strong>Spfile /u03/app/oracle/product/11.1.0/db_1/dbs/spfile+ASM.ora is in old pre-11 format and compatible >= 11.0.0; converting to  new H.A.R.D. compliant format.</strong>

I have highlighted the entries that were not present in 10g. According to the docs:

DIA0 (diagnosability process 0) (only 0 is currently being used) is responsible for hang detection and deadlock resolution.

VKTM (virtual keeper of time) is responsible for providing a wall-clock time (updated every second) and reference-time counter (updated every 20 ms and available only when running at elevated priority)

Those are the definitions from the docs. Oracle could have been more generous and also documented the following:

DIA0 – Does that mean we will have auto SystemState/Hanganalyze generated during hang? Will ORA-60 be handled by this process?

VKTM – What does this mean to us? Will this timer be used in 10046 timing information? Will it ensure that Oracle Scheduler runs jobs on time 🙂

DSKM – This is still not documented.

The last message suggests that the spfile has also been made H.A.R.D. compliant, so it should offer some protection against corruption.
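To compare alert logs between versions without reading them line by line, the "started with pid=" lines can be pulled out mechanically. The following is only a minimal sketch: the function names are made up for illustration, and the baseline file (one known process name per line, for example collected from a 10g alert log) is something you would have to build yourself.

```shell
# Print the name of every background process that an alert log reports
# as started, one per line, in the order they appear.
# Reads alert log text on stdin. Lines look like:
#   VKTM started with pid=3, OS id=3674
list_started_processes() {
    # Drop any leftover <strong> style markup, then keep the first word
    # of each "started with pid=" line.
    sed -e 's/<[^>]*>//g' |
        awk '/ started with pid=[0-9]+, OS id=[0-9]+/ { print $1 }'
}

# Print the started processes that are NOT listed in a baseline file.
# $1 = baseline file with one process name per line (user-maintained,
#      hypothetical; it is not something Oracle ships)
new_processes() {
    list_started_processes | grep -v -x -F -f "$1"
}
```

With a saved copy of the alert text, `list_started_processes < saved_alert.txt` prints the process names, and `new_processes baseline.txt < saved_alert.txt` prints only the ones missing from the baseline (both file names are placeholders).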

I have probably created more confusion than I have cleared up about what these processes actually do 🙂 Anyway, if you have any information, it would be really nice if you could share it. Thanks for reading!

11.1.0.6 ASM installation on Solaris fails -II

Some time back, I wrote about the CSS service not starting in my post 11.1.0.6 ASM installation on Solaris fails -I.

After doing some research, I came across this Metalink document:

Note:397238.1 – How to Convert init.cssd as a SMF service for Solaris 10

This document describes using the Service Management Facility (SMF), which was introduced in Solaris 10. To configure it, we have to download a zip file from the note and copy two files from it: initcssd goes to /lib/svc/method/initcssd and initcssd.xml goes to /var/svc/manifest/site. A few other steps follow (as listed in the note):

1) Install Oracle Software on Solaris 10

2) Download files from Note: 397238.1. Once done, modify the files accordingly and copy to the required location.

3) Do the configuration and then enable the service.

<strong># svcadm -v enable initcssd</strong>

Even after doing this, the service still does not start.

# ps -ef|grep css
    root 29137  3793   0 14:54:39 ?           0:00 /bin/sh /lib/svc/method/initcssd run
    root 29188 26874   0 14:54:50 pts/9       0:00 grep css

The note talks about checking the content of the file /var/opt/oracle/scls_scr/<Your-hostname>/root/cssrun.

But when I tried to check, I found that the directory does not exist.

# cd /var/opt/oracle/scls_scr/
 cd: /var/opt/oracle/scls_scr/: No such file or directory
# cd /u03/app/oracle/product/11.1.0/db_1/bin/
<strong># ./crsctl start crs</strong>
Attempting to start Oracle Clusterware stack
Failure at scls_scr_create with code 1
Segmentation Fault (core dumped)

This directory is actually created when we run ‘localconfig add’, which configures the socket files and directories.
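Since crsctl dumped core when the directory was missing, a quick existence check before starting anything can save some head scratching. This is just a sketch: the function name is hypothetical, and the only thing it assumes is the directory layout quoted from the note.

```shell
# Report whether the CSS bookkeeping directory for a host exists.
# $1 = base directory (the note uses /var/opt/oracle/scls_scr)
# $2 = host name (defaults to the output of `hostname`)
# Prints "present: <dir>" and returns 0 when <base>/<host>/root exists,
# otherwise prints a hint and returns 1.
check_scls_scr() {
    base=$1
    host=${2:-$(hostname)}
    if [ -d "$base/$host/root" ]; then
        echo "present: $base/$host/root"
        return 0
    fi
    echo "missing: $base/$host/root (run 'localconfig add' as root first)"
    return 1
}
```

On the machine from this post the call would be `check_scls_scr /var/opt/oracle/scls_scr testzone2`.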

<strong># ps -ef|grep css</strong>
    root 29137  3793   0 14:54:39 ?           0:00 /bin/sh /lib/svc/method/initcssd run
# pwd
/u03/app/oracle/product/11.1.0/db_1/bin
<strong># ./localconfig add</strong>
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Configuration for local CSS has been initialized

Cleaning up Network socket directories
Setting up Network socket directories
Adding to inittab
Startup will be queued to init within 30 seconds.
Checking the status of new Oracle init process...
Expecting the CRS daemons to be up within 600 seconds.
Cluster Synchronization Services is active on these nodes.
        testzone2
Cluster Synchronization Services is active on all the nodes.
Oracle CSS service is installed and running under init(1M)
<strong># ps -ef|grep css</strong>
  oracle 29137  3793   0 14:54:39 ?           0:00 /u03/app/oracle/product/11.1.0/db_1/bin/ocssd.bin

Now just for fun, I thought of disabling the SMF and trying the configuration again.

# ./localconfig delete
Stopping Cluster Synchronization Services.
Shutting down the Cluster Synchronization Services daemon.
Shutdown request successfully issued.
Shutdown has begun. The daemons should exit soon.
Cleaning up Network socket directories

Disable the SMF
# svcadm -v disable initcssd
svc:/system/initcssd:default disabled.

Now again add CSS service

# ./localconfig add
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Configuration for local CSS has been initialized

Cleaning up Network socket directories
Setting up Network socket directories
Adding to inittab
Startup will be queued to init within 30 seconds.
Checking the status of new Oracle init process...
Expecting the CRS daemons to be up within 600 seconds.

Giving up: Oracle CSS stack appears NOT to be running.
Oracle CSS service would not start as installed
Automatic Storage Management(ASM) cannot be used until Oracle CSS service is started

Enable the initcssd SMF service

<strong># svcadm -v enable initcssd</strong>
svc:/system/initcssd:default enabled.
# ps -ef|grep css
  oracle  2589  3793   1 15:14:31 ?           0:00 /u03/app/oracle/product/11.1.0/db_1/bin/ocssd.bin

Cool!! CSS Service has started again 🙂
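Note that `ps -ef|grep css` also matches the initcssd wrapper script and the grep itself, as the earlier listings show. A stricter check looks for ocssd.bin as the last path component of a command. The sketch below is only an illustration; the function names and the 5 second polling interval are my own choices, and the 600 second default simply mirrors the timeout that localconfig prints.

```shell
# Succeed (exit 0) if the process listing on stdin contains ocssd.bin
# as a command name, with or without a leading directory.
css_daemon_listed() {
    awk '{ for (i = 1; i <= NF; i++)
               if ($i ~ /(^|\/)ocssd\.bin$/) found = 1 }
         END { exit !found }'
}

# Poll `ps -ef` until ocssd.bin shows up or the timeout expires.
# $1 = timeout in seconds (default 600)
wait_for_css() {
    remaining=${1:-600}
    while [ "$remaining" -gt 0 ]; do
        if ps -ef | css_daemon_listed; then
            return 0
        fi
        sleep 5
        remaining=$((remaining - 5))
    done
    return 1
}
```

After `svcadm -v enable initcssd`, something like `wait_for_css 120 && echo "CSS is up"` would replace the manual ps checks.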

Note:397238.1 says that the problem occurs only after a reboot and not during installation. Maybe the document needs to be updated for Oracle 11g on Solaris 10. I would suggest that anyone installing Oracle 11g on Solaris 10 try these steps. I will try to add a remark to the note. Apart from that, the article is quite good.

Can ASM DiskGroup Be Renamed?

This was actually a question on the Oracle forum that I had replied to. The poster wanted to know if he could rename an ASM diskgroup by editing the ASM disk header. He also mentioned having heard that Oracle had done this for a few customers using kfed.

The answer is NO. It is not possible to rename the diskgroup by editing the ASM disk header. kfed is used for viewing ASM disk header contents and for patching corrupted ASM disk headers (which only Oracle Support should do). The only way to change the name is to drop and recreate the diskgroup.

If you try to create a new diskgroup with a name, say +DG1, that was used by a diskgroup which is no longer mounted (but some of its ASM disk members are still found in the ASM_DISKSTRING path), you will face the following error:

    <strong>ORA-15030</strong>: diskgroup name "string" is in use by another diskgroup
    <strong>Cause:</strong> A CREATE DISKGROUP command specified a Diskgroup name that was already assigned to another diskgroup.
    <strong>Action: </strong>Select a different name for the Diskgroup.

If you want to create the diskgroup with the same name +DG1, you first have to clear the ASM disk header of the old member disks using:

dd if=/dev/zero of=/dev/raw/raw11 bs=1024 count=100

After this you can recreate the diskgroup with the same name.
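Because this dd is destructive, it may be worth rehearsing it on a scratch file before pointing it at a raw device. The wrapper below is a minimal sketch with a made-up function name. It runs the same dd as above, plus conv=notrunc so that a regular file keeps its remaining contents.

```shell
# Overwrite the first N KiB of a target with zeros.
# $1 = target (a raw device in the post; any existing path here)
# $2 = number of 1 KiB blocks to clear (default 100, as in the post)
# conv=notrunc stops dd from truncating a regular file, which makes the
# function safe to rehearse on a scratch file.
zero_header() {
    [ -e "$1" ] || { echo "no such target: $1" >&2; return 1; }
    dd if=/dev/zero of="$1" bs=1024 count="${2:-100}" conv=notrunc 2>/dev/null
}
```

For example, `zero_header /tmp/scratch.img 100` on a throwaway file shows exactly which region would be wiped, without touching any disk.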

One poster suggested renaming at the LUN/storage level. I believed this to be a destructive idea that could corrupt the diskgroup. nvengurl replied that ASM reads the disk header to mount the diskgroup, so changing the LUN name/id/path would not solve the issue.

Update: This article is valid for 10g and 11gR1. Starting with 11gR2, Oracle introduced the renamedg utility, which can be used to rename a diskgroup. I have documented the steps here.

11.1.0.6 ASM installation on Solaris fails -I

It looks like there is no easy way for me to install or configure Oracle components: every installation or configuration leads me into one problem or another. Anyway, I was trying to install Oracle Database 11g on Solaris 10, but CSS does not seem to come up. Running localconfig add as the root user gives the following messages and fails:

-bash-3.00# ./localconfig add
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Configuration for local CSS has been initialized

Cleaning up Network socket directories
Setting up Network socket directories
Adding to inittab
Startup will be queued to init within 30 seconds.
Checking the status of new Oracle init process...
Expecting the CRS daemons to be up within 600 seconds.

Giving up: Oracle CSS stack appears NOT to be running.
Oracle CSS service would not start as installed
Automatic Storage Management(ASM) cannot be used until Oracle CSS service is started

On checking further for css log messages in $ORACLE_HOME/log/hostname/client, I found the following messages:

Oracle Database 11g CRS Release 11.1.0.6.0 - Production Copyright 1996, 2007 Oracle. All rights reserved.
2008-06-18 21:38:29.721: [ CSSCLNT][1]clsssInitNative: failed to connect to (ADDRESS=(PROTOCOL=ipc)(<strong>KEY=OCSSD_LL_test1zone2_</strong>))
, rc 9

On checking the /var/adm/messages file, I found the following errors:

Jun 18 01:16:23 test1zone2 root: [ID 702911 user.error] Oracle Cluster Synchronization Service starting by user request.
Jun 18 01:16:25 test1zone2 root: [ID 702911 user.error] Cluster Ready Services completed waiting on dependencies.
Jun 18 01:16:35 test1zone2 last message repeated 9 times

/var/tmp/.oracle does not show any files created after running localconfig add. The files expected there are socket files, which CSS uses for communication, and I suspect their absence is the issue.
I checked our Linux machine (with RAC) and found that it has some such files, one of which is named:

srwxrwxrwx  1 oracle oinstall 0 Jun 11 22:10 <strong>sOCSSD_LL_prod01_</strong>

This is quite similar to the key in the error message from the css log files, i.e. KEY=OCSSD_LL_test1zone2_
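To check quickly whether any such CSS socket entry exists, the directory can be scanned for that name pattern. This is only a sketch: the function name is hypothetical, and the sOCSSD_LL_ prefix is taken from the single file name shown above, so other releases may name things differently.

```shell
# List CSS socket entries under a socket directory.
# $1 = socket directory (default /var/tmp/.oracle, as checked in the post)
# Prints matching names one per line; returns 1 if none are found.
# Uses -e so that it also works on plain files; switch to -S to insist
# on real sockets.
list_css_sockets() {
    dir=${1:-/var/tmp/.oracle}
    rc=1
    for f in "$dir"/sOCSSD_LL_*; do
        [ -e "$f" ] || continue
        basename "$f"
        rc=0
    done
    return $rc
}
```

Running `list_css_sockets || echo "no CSS socket files found"` right after localconfig add shows whether the sockets were ever created.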

As this is a test machine, I can't raise a ticket with Oracle. For now I have posted a thread on the Oracle Forum. Let’s see if someone is able to figure something out. I will keep you posted. If anyone is interested in reading how these sockets work, they can visit this article from Frits Hoogland.

Adding new ASM disk to RAC database fails

Many times I have come across a common problem in RAC databases where adding an ASM disk fails with errors like:

ORA-15075: disk(s) are not visible cluster-wide

ORA-15020: discovered duplicate ASM disk "DISK1"

ORA-15054: disk "ORCL:DISK1" does not exist in diskgroup "DG1"

Rebalancing the diskgroup or trying to add the disk with the FORCE option does not help in this case either.

I will discuss how to get out of a situation like this, i.e. when you try to add an ASM disk in a cluster environment and ASM says the disk is already added, but when you try to drop the same disk it says the disk is not present in the diskgroup.

Let's start from the very beginning:
I decided to add an ASM disk to an existing diskgroup, DATA1, in a RAC environment.

Log in to the ASM instance with "/ as sysdba":
SQL > ALTER DISKGROUP DATA1 ADD DISK '/dev/rdsk/c1t2d3s4';

But it failed with following error:

ALTER DISKGROUP DATA1 ADD DISK '/dev/rdsk/c1t2d3s4'
*
ERROR at line 1:
<strong>ORA-15032: not all alterations performed
ORA-15075: disk(s) are not visible cluster-wide</strong>

This happens because the physical disk partition is not visible from all RAC nodes. I contacted the sysadmins to make sure that the disk is visible from all RAC nodes and accessible by Oracle. They fixed the problem, and the disk /dev/rdsk/c1t2d3s4 could then be seen from all RAC nodes. I tried to add the disk again, this time with the FORCE option:

SQL > ALTER DISKGROUP DATA1 ADD DISK '/dev/rdsk/c1t2d3s4' force;
But it failed with the following error:

ORA-15020: discovered duplicate ASM disk "/dev/rdsk/c1t2d3s4"

This says that a disk with the same name is already present in the diskgroup.

Since the disk was reported as already present in the diskgroup, I tried to drop it, and got the following error:

SQL> alter diskgroup DATA1 drop disk '/dev/rdsk/c1t2d3s4';
alter diskgroup DATA1 drop disk '/dev/rdsk/c1t2d3s4'
*
ERROR at line 1:
<strong>ORA-15032 : not all alterations performed
ORA-15054 : disk "/dev/rdsk/c1t2d3s4" does not exist in diskgroup "DATA1"</strong>

At this point I was stuck, as the disk could be neither added nor dropped. I decided to check the status of the disk in v$asm_disk on all RAC nodes with the following query:

SQL > col name format a15
SQL > col path format a20
SQL > select GROUP_NUMBER,DISK_NUMBER,MOUNT_STATUS,HEADER_STATUS,NAME,PATH from v$asm_disk;

We obtain the following result on all the nodes:

G#  D#  HEADER_STATU  MOUNT_S  STATE   NAME  PATH
--  --  ------------  -------  ------  ----  ------------------
 0   0  MEMBER        IGNORED  NORMAL        /dev/rdsk/c1t2d3s4

HEADER_STATUS=MEMBER means that the disk header marks the disk as a member of an existing diskgroup (and every RAC node reports the same).
MOUNT_STATUS=IGNORED means that the disk is present in the system, but is ignored by ASM.

GROUP_NUMBER=0 is the value shown when the disk does not belong to any mounted diskgroup.
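The combination to look for is therefore a header that claims membership while no mounted diskgroup owns the disk. As a rough sketch, that filter could be applied to spooled query output as follows. The function name is made up, and the input layout (group number, header status, mount status and path separated by whitespace, with the empty NAME column left out) is an assumption of this example, not what the query above prints by default.

```shell
# Flag disks whose header says MEMBER although no mounted diskgroup owns them.
# stdin: one disk per line as
#   <group_number> <header_status> <mount_status> <path>
# (this column order is an assumption made for the sketch)
flag_orphan_members() {
    awk '$1 == 0 && $2 == "MEMBER" { print $4 " (mount_status=" $3 ")" }'
}
```

Keep in mind that disks of a diskgroup that is simply not mounted also show up with group number 0 and a MEMBER header, so a hit only tells you which disks deserve a closer look.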

Next I dumped the start of the disk with dd and looked at it:

$ dd if=/dev/rdsk/c1t2d3s4 of=/tmp/disk.out bs=4096 count=1096

$ vi /tmp/disk.out

I found the diskgroup name and the disk number allocated to this disk in the dump, which confirms that the header already marks the disk as part of diskgroup DATA1.
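Opening a binary dump in vi works, but searching it from the command line is quicker. The sketch below turns every non-printable byte into a newline and then looks for the name as plain text. The function name is hypothetical, and the match is a simple substring search, so DATA1 would also match DATA10.

```shell
# Succeed if a binary dump contains the given name as printable text.
# $1 = dump file (the post writes /tmp/disk.out)
# $2 = name to look for, e.g. a diskgroup name
dump_contains_name() {
    # LC_ALL=C makes tr treat the input as raw bytes.
    LC_ALL=C tr -c '[:print:]' '\n' < "$1" | grep -F -- "$2" > /dev/null
}
```

For the dump above, `dump_contains_name /tmp/disk.out DATA1 && echo "header mentions DATA1"` gives the same confirmation without an editor.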

But from the header_status, mount_status and group_number values it is clear that the disk was only partially added to the RAC ASM instances. To correct this, we have to clear the disk header and add the disk again:

# dd if=/dev/zero of=/dev/rdsk/c1t2d3s4 bs=4096 count=5000

This command cleared the disk header, and after that the disk was added successfully.
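Before overwriting anything, it costs little to keep a copy of the region that is about to be wiped. The following is a minimal sketch with a made-up function name. It reuses the block size and count from the dd above and verifies the copy against the source before reporting success.

```shell
# Save the first N blocks of a device (or file) before overwriting them.
# $1 = source device or file
# $2 = backup file to create
# $3 = number of 4096-byte blocks to save (default 5000, as in the post)
backup_header() {
    dd if="$1" of="$2" bs=4096 count="${3:-5000}" 2>/dev/null || return 1
    # Re-read the same region and compare it byte for byte with the backup.
    dd if="$1" bs=4096 count="${3:-5000}" 2>/dev/null | cmp -s - "$2"
}
```

For example, `backup_header /dev/rdsk/c1t2d3s4 /tmp/c1t2d3s4.hdr` before the wipe. If the wrong disk turns out to have been cleared, and nothing else has written to that region since, writing the backup back with dd should restore the same bytes.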

Note: dd wipes the ASM header, so use it only after confirming that you have the right disk. Running it against the wrong disk can cause a diskgroup to dismount and lead to data loss.