The following error occurred while trying to start the +ASM2 instance with SRVCTL:
$srvctl start asm -n rac2
PRKS-1009 : Failed to start ASM instance "+ASM2" on node "rac2",
[CRS-0223: Resource 'ora.rac2.ASM2.asm' has placement error.]
While trying to start the same resource with crs_start:
$ crs_start -f ora.rac2.ASM2.asm
CRS-1028: Dependency analysis failed because of:
'Resource in UNKNOWN state: ora.rac2.ASM2.asm'
CRS-0223: Resource 'ora.rac2.ASM2.asm' has placement error
There are two ways to get resources out of this UNKNOWN state:
1. Start the resource from SQL*Plus.
2. Use crs_stop -f to clear the state of the database resources.
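For example, a minimal sketch of the second approach, using the resource name from the error above:
$ crs_stop -f ora.rac2.ASM2.asm
$ crs_start ora.rac2.ASM2.asm
Alternatively, start the instance manually from SQL*Plus on node rac2:
$ export ORACLE_SID=+ASM2
$ sqlplus / as sysdba
SQL> startup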
Exception in thread "main" java.lang.UnsatisfiedLinkError: /home/oracle/product/10.2/jdk/jre/lib/i386/libawt.so: libXp.so.6: cannot open shared object file: No such file or directory
at java.lang.ClassLoader$NativeLibrary.load(Native Method)
at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1586)
at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1503)
at java.lang.Runtime.loadLibrary0(Runtime.java:788)
at java.lang.System.loadLibrary(System.java:834)
at sun.security.action.LoadLibraryAction.run(LoadLibraryAction.java:50)
at java.security.AccessController.doPrivileged(Native Method)
at sun.awt.NativeLibLoader.loadLibraries(NativeLibLoader.java:38)
at sun.awt.DebugHelper.<clinit>(DebugHelper.java:29)
at java.awt.Component.<clinit>(Component.java:506)
This can be resolved by installing the xorg-x11-deprecated-libs RPM, which provides libXp.so.6 (yum install xorg-x11-deprecated-libs).
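A quick way to apply and verify the fix (run as root; package and library path as found on OEL4/RHEL4 x86):
# yum install xorg-x11-deprecated-libs
# rpm -q --whatprovides libXp.so.6
# ls -l /usr/X11R6/lib/libXp.so.6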
Many of us would have come across RDA (Remote Diagnostic Agent) while working on a ticket with Oracle Support. In case you have not heard about it, I would recommend going through Metalink Note:314422.1 – Remote Diagnostic Agent (RDA) 4 – Getting Started.
RDA captures system information such as OS and hardware details (like the number of CPUs and the amount of RAM), the OS error log, and OS monitoring tool output (like vmstat, top, etc.). This can be handy in case you do not know the command or the location of the OS logs. Similarly, you can find the database version, database patch inventory, database alert log and trace files.
This can save a lot of time, as you need not remember all the OS commands to capture the information.
Similarly, RDA also collects database performance statistics like OS statistics (CPU, memory and disk I/O stats) along with top SQL, locking and latch statistics. In the case of 10g, it generates an AWR report (60 mins) and an ADDM report based on the captured snapshots. All this information can be helpful for diagnosing a performance problem.
There is one more use of RDA which not many people are aware of, i.e. the RDA Health Check / Validation Engine (HCVE). The HCVE engine can be used to perform pre-install checks for Oracle Database and Oracle Application Server on Unix systems (at the time of writing this article, this functionality is not available on Windows).
To run this, you need to execute rda.sh -T hcve. For example, I need to validate whether I can install Oracle 10gR2 on my OEL4 (Linux x86) machine.
$ ./rda.sh -T hcve
Processing HCVE tests ...
Available Pre-Installation Rule Sets:
1. Oracle Database 10g R1 (10.1.0) PreInstall (Linux-x86)
2. Oracle Database 10g R1 (10.1.0) PreInstall (Linux AMD64)
3. Oracle Database 10g R1 (10.1.0) PreInstall (IA-64 Linux)
4. Oracle Database 10g R2 (10.2.0) PreInstall (Linux AMD64)
5. Oracle Database 10g R2 (10.2.0) PreInstall (IA-64 Linux)
6. Oracle Database 10g R2 (10.2.0) PreInstall (Linux-x86)
7. Oracle Database 11g R1 (11.1.0) PreInstall (Linux AMD64)
8. Oracle Database 11g R1 (11.1.0) PreInstall (Linux-x86)
9. Oracle Application Server 10g (9.0.4) PreInstall (Linux)
10. Oracle Application Server 10g R2 (10.1.2) PreInstall (Linux)
11. Oracle Application Server 10g R3 (10.1.3) PreInstall (Linux AMD64)
12. Oracle Application Server 10g R3 (10.1.3) PreInstall (IA-64 Linux)
13. Oracle Application Server 10g R3 (10.1.3) PreInstall (Linux-x86)
14. Oracle Portal PreInstall (Generic)
Available Post-Installation Rule Sets:
15. Oracle Portal PostInstall (generic)
16. RAC 10G DB and OS Best Practices (Linux)
17. Data Guard PostInstall (Generic)
Enter the HCVE rule set number
Hit 'Return' to accept the default (1)
> 6
Enter value for < Planned ORACLE_HOME location or $ORACLE_HOME if set >
Hit 'Return' to accept the default ($ORACLE_HOME)
> /u01/app/oracle
Test "Oracle Database 10g R2 (10.2.0) PreInstall (Linux-x86)" executed at Wed Aug 27 15:12:18 2008
Test Results
~~~~~~~~~~~~
ID NAME RESULT VALUE
===== ==================== ====== ========================================
10 OS Certified? PASSED Adequate
20 User in /etc/passwd? PASSED userOK
30 Group in /etc/group? PASSED GroupOK
40 Input ORACLE_HOME RECORD /u01/app/oracle
50 ORACLE_HOME Valid? PASSED OHexists
60 O_H Permissions OK? PASSED CorrectPerms
70 Umask Set to 022? PASSED UmaskOK
80 LDLIBRARYPATH Unset? FAILED IsSet
100 Other O_Hs in PATH? FAILED OratabEntryInPath
110 oraInventory Permiss PASSED oraInventoryOK
120 /tmp Adequate? PASSED TempSpaceOK
130 Swap (in MB) RECORD 1051
140 RAM (in MB) FAILED 1001
150 Swap OK? FAILED InsufficientSwap
160 Disk Space OK? PASSED DiskSpaceOK
170 Kernel Parameters OK PASSED KernelOK
180 Got ld,nm,ar,make? PASSED ld_nm_ar_make_found
190 ulimits OK? FAILED StackTooSmall MaxLockMemTooSmall
200 EL4 RPMs OK? PASSED EL4rpmsOK
204 RHEL3 RPMs OK? PASSED NotRedHat
205 RHEL4 RPMs OK? PASSED NotRedHat
209 SUSE SLES9 RPMs OK? PASSED NotSuSE
212 Patch 3006854 Instal PASSED NotRHEL3
214 ip_local_port_range PASSED ip_local_port_rangeOK
220 Tainted Kernel? PASSED NotVerifiable
230 Other OUI Up? PASSED NoOtherOUI
Result file: /home/oracle/rda/output/RDA_HCVE_A201DB10R2_lnx_res.htm
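A few of the FAILED items above can be fixed before launching the installer; a rough sketch (the values are illustrative, cross-check them against the 10gR2 install guide for your platform):
# check 80: LD_LIBRARY_PATH should not be set for the installing user
$ unset LD_LIBRARY_PATH
# check 190: raise the stack and max locked memory limits for the oracle user,
# e.g. in /etc/security/limits.conf (as root; values are examples only)
#   oracle  soft  stack    10240
#   oracle  hard  memlock  3145728
# checks 140/150: add swap to cover the shortfall, e.g. a 1 GB swap file (as root)
# dd if=/dev/zero of=/swapfile bs=1M count=1024 && mkswap /swapfile && swapon /swapfile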
I also tried out the option “RAC 10G DB and OS Best Practices (Linux)”, which is part of the Post-Installation rule sets, but for some reason some of the checks failed.
Enter the HCVE rule set number
Hit 'Return' to accept the default (1)
> 16
Enter the password for 'SYSTEM':
Please re-enter it to confirm:
Test "RAC 10G DB and OS Best Practices (Linux)" executed at Wed Aug 27 17:26:33 2008
Test Results
~~~~~~~~~~~~
ID NAME RESULT VALUE
===== ==================== ====== ========================================
10 ORA_CRS_HOME RECORD /u01/app/crs
100 Database Name RECORD orcl
102 Database Version RECORD 10.2.0.4.0
104 Interconnect Network RECORD
106 DB Block Size RECORD 8192
108 DB File Multiblock R RECORD 16
120 Max Commit Propagati PASSED 0
130 SYS.AUDSES$ Cache Si PASSED 10000
132 SYS.IDGEN1$ Cache Si FAILED 20
 140 Parallel Execution M FAILED 2148
150 Min Parallel Servers RECORD 1
152 Min Parallel Servers FAILED 0
200 $ORA_CRS_HOME Define PASSED Found
210 Remote Access PASSED All loaded
 220 _USR_ORA_DEBUG / CRS FAILED blrraclnx1:? blrraclnx2:?
 230 _USR_ORA_DEBUG / ORA FAILED blrraclnx1:? blrraclnx2:?
240 rmem_max PASSED OK
250 UDP Buffer Size PASSED OK
260 wmem_max PASSED OK
270 rmem_default PASSED OK
280 wmem_default PASSED OK
290 Sysrq Magic Keys PASSED OK
300 Oracle Executable Li PASSED linked
 310 hangcheck-timer FAILED blrraclnx1:Unknown blrraclnx2:Unknown
 320 aio-max-size Setting FAILED blrraclnx1:Unknown blrraclnx2:Unknown
330 Memory (32-bit) PASSED OK
 340 Swap (32-bit) FAILED [blrraclnx1:]Swap<2RAM [blrraclnx2:]S..>
350 Swap (64-bit) PASSED OK
360 Patch List PASSED Complete
Result file: /home/oracle/rda/output/RDA_HCVE_P400RAC_lnx_res.htm
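For the failed hangcheck-timer check (ID 310), the module can be loaded and made persistent; a sketch using the commonly documented 10g RAC parameter values (verify them against the install guide for your kernel):
# as root: load the module with the usual 10g RAC parameters
# modprobe hangcheck-timer hangcheck_tick=30 hangcheck_margin=180
# make it persistent by adding to /etc/modprobe.conf:
#   options hangcheck-timer hangcheck_tick=30 hangcheck_margin=180
# verify it is loaded
# lsmod | grep hangcheck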
Continuing my experiments with our 2-node 10g RAC test system, I carried out an upgrade of Oracle Clusterware and the Oracle RAC database from 10.2.0.1 to 10.2.0.4. I have tried to document the steps for the Oracle Clusterware upgrade (rolling upgrade) and the RAC database upgrade in this post. In case you observe any mistakes, please let me know.
The first step is to download the 10.2.0.4 patch set from Metalink. In our case, we downloaded Patch 6810189 (10g Release 2 (10.2.0.4) Patch Set 3 for Linux x86). You can follow the patch readme for detailed steps.
We will be doing a rolling upgrade of Oracle Clusterware, i.e. we will bring only one node down for patching while the other node remains available and accepting database connections. Before you start the process, take a backup of the following so that it can be restored in case of a failed upgrade:
a) Full OS backup (as some binaries are present in /etc, etc.)
This will open the OUI screen. Select the Database Home for patching.
6) On the Summary screen, click Install. When prompted, run the $ORACLE_HOME/root.sh script as the root user on both the nodes. On completion of this, we need to perform the post-installation steps.
7) Start the listener and ASM instance on both the nodes.
8) For an Oracle RAC installation, we need to set CLUSTER_DATABASE=FALSE before upgrading:
[oracle@blrraclnx1 ~]$ sqlplus "/ as sysdba"
SQL> startup nomount
SQL> alter system set cluster_database=false scope=spfile;
System altered.
SQL> shutdown immediate;
SQL> startup upgrade
SQL> spool 10204patch.log
SQL> @?/rdbms/admin/catupgrd.sql
SQL> spool off
The log file needs to be reviewed for any errors. catupgrd.sql took 42 minutes on my system. In case the CLUSTER_DATABASE parameter is not set to FALSE, you will get the following error while starting the database in upgrade mode:
ORA-39701: database must be mounted EXCLUSIVE for UPGRADE or DOWNGRADE
We now need to restart the database and run utlrp.sql to recompile invalid objects.
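A minimal sketch of those post-upgrade steps, assuming catupgrd.sql completed cleanly (remember to set CLUSTER_DATABASE back to TRUE for RAC):
SQL> alter system set cluster_database=true scope=spfile;
SQL> shutdown immediate;
SQL> startup
SQL> @?/rdbms/admin/utlrp.sql
SQL> select count(*) from dba_objects where status = 'INVALID';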
While going through the routine checks from Grid Control, I found a critical alert stating “clusterware integrity check failed”; clicking on the message showed that there was a problem with some metric collections in the RAC environment.
To check the node reachability status, the following check was run:
$ $CRS_HOME/bin/cluvfy comp nodecon -n all
This checks the internode connectivity for all nodes in the cluster. It came back with the following message:
$ $CRS_HOME/bin/cluvfy comp nodecon -n all
Verifying node connectivity
Verification of node connectivity was unsuccessful on all the nodes.
Even the CRS component check was unsuccessful:
$ $CRS_HOME/bin/cluvfy comp crs -n all
It came out with the following message:
$ $CRS_HOME/bin/cluvfy comp crs -n all
Verifying CRS integrity
Verification of CRS integrity was unsuccessful on all the nodes.
After this it was quite obvious to check the CRS status:
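On 10g, a healthy clusterware stack reports something like this (the output shown here is illustrative):
$ $CRS_HOME/bin/crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy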
This confirmed that the CRS install was valid, but the question now was why the cluster verification utility (CVU) was failing.
To find the reason, I enabled tracing of CVU:
$ export SRVM_TRACE=true
This sets the environment variable SRVM_TRACE to true; CVU tracing then generates a trace file under $CRS_HOME/cv/log with a name like “cvutrace.log.X”.
After setting this and running $CRS_HOME/bin/cluvfy comp crs -n all again, a trace file named cvutrace.log.0 was generated.
Last week we were trying to set up a 2-node 10g RAC system on Linux with Openfiler used for shared storage. We were using the article written by Jeffery Hunter. This was not the first time I was doing it, but by mistake we chose the machine with 500 MB of memory for one of the RAC nodes and used a 1 GB machine for Openfiler. We carried on with the installation even though cluvfy and runInstaller gave us warnings about this.
But once the installation completed, I found the database was shutting down frequently with “PMON failed to acquire latch”. I tried to debug it, but was not able to figure out anything from the systemstate dump which was generated.
Anyway, we decided to rebuild the system. So we decided to clean up Machine 2 and meanwhile re-installed Openfiler. I did not have the OEL CDs that day, so I could not rebuild Machine 1. So I went ahead with cleaning Machine 2 for re-installing the software. I saw an opportunity to set up a single-node RAC and then add another node later. I followed the steps below for cleaning up the RAC installation.
1) Stop the clusterware. Note that you can stop the clusterware directly (while cleaning up), as this will automatically stop the dependent resources.
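A sketch of this step (CRS home path as installed on my system; run as root):
# /u01/app/crs/bin/crsctl stop crs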
2) Remove the installation files and other related files
I had installed CRS in /u01/app/crs and the database home was located in /u01/app/oracle, so I removed both directories.
Note that if you have multiple Oracle database installations, ensure that you do not remove the oraInventory directory or any other ORACLE_HOME. In my case this was the only installation. Next, remove the following files related to the clusterware.
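These are roughly the files Note:239998.1 lists for Linux (exact paths can vary by release; run as root):
# rm -rf /etc/oracle /var/tmp/.oracle /tmp/.oracle
# rm -f /etc/init.d/init.crs /etc/init.d/init.crsd /etc/init.d/init.cssd /etc/init.d/init.evmd
# rm -f /etc/rc.d/rc*.d/*init.crs*
# also remove the init.evmd/init.cssd/init.crsd respawn entries from /etc/inittab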
Also remove the OCR and Voting Disk files. In my case they were stored on the OCFS2 filesystem /u02/oradata/orcl. In case they are on raw devices, you can wipe them using the dd command. Also remove the ocr.loc file present in /etc/oracle.
You can also refer to Note:239998.1 – 10g RAC: How to Clean Up After a Failed CRS Install
In our case we were re-installing after a successful installation, so we also had to clean up the ASM disks. They can be cleaned up by overwriting their headers with the dd command.
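An illustrative dd invocation for clearing the ASM disk headers (the device names below are examples only; this is destructive, so double-check them before running as root):
# dd if=/dev/zero of=/dev/sdb1 bs=1M count=10
# dd if=/dev/zero of=/dev/sdc1 bs=1M count=10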
As we were removing the other node and would have to reconfigure SSH anyway, I removed the /home/oracle/.ssh directory. I did not reconfigure SSH, thinking that it would not be required for a single-node install. I then restarted the clusterware install and encountered the following error:
“The Specified nodes are not clusterable”
In another window, one more error was reported, which actually made it clear where the problem was:
“Failed to check remote command execution setup for node <nodename> shells /usr/bin/ssh and /usr/bin/rsh”
Screenshot for the error can be seen below
The above error clearly states that the failure was due to the unavailability of ssh or rsh. After this I set up SSH for the single node and tested it as well, to avoid any further errors.
$ ssh blrraclnx2 date
Sun Aug 10 14:32:29 EDT 2008
Anyway, all these errors could have been avoided had I used the cluvfy utility as below.
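A typical invocation of the pre-clusterware-install check, run from the installation media (the node name is taken from my setup):
$ ./runcluvfy.sh stage -pre crsinst -n blrraclnx2 -verbose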
I was browsing through the OraNA.info posts and found an interesting post from Steve Karam which referred to Metalink Note:578455.1 – Announcement of De-Support of RAW devices in Release 12G.
If you go through the note, it mentions the desupport of raw devices from Oracle Database 12G. The article also lists ASM and OCFS as some of the alternative storage options for keeping the OCR and Voting Disks (used in Oracle Real Application Clusters (RAC)).
OCFS2 already supports the storage of the OCR and Voting Disk, but the note also says that ASM will support these files. Hmmm… If this is true, a lot of changes will be required in the architecture. Currently the ASM instance starts after the CSS service (and the other clusterware services in RAC) has been started, but this change would mean that ASM has to start before these processes. Currently, if you try to start the ASM instance with the CSS service down, you get the following error:
[oracle@blrraclnx2 ~]$ export ORACLE_SID=+ASM1
[oracle@blrraclnx2 ~]$ sqlplus
SQL*Plus: Release 10.2.0.1.0 - Production on Sun Aug 10 08:16:09 2008
Copyright (c) 1982, 2005, Oracle. All rights reserved.
Enter user-name: / as sysdba
Connected to an idle instance.
SQL> startup
ORA-29701: unable to connect to Cluster Manager
SQL> exit
Disconnected
[oracle@blrraclnx2 ~]$ oerr ora 29701
29701, 00000, "unable to connect to Cluster Manager"
// *Cause: Connect to CM failed or timed out.
// *Action: Verify that the CM was started. If the CM was not started,
// start it and then retry the database startup. If the CM died
// or is not responding, check the Oracle and CM trace files for
// errors.
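In the RAC case above, the error simply means the clusterware stack (and therefore CSS) is down; a minimal sketch of the usual recovery before retrying the ASM startup (CRS home from my environment, run as root):
# /u01/app/crs/bin/crsctl start crs
# /u01/app/crs/bin/crsctl check crs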
Apart from changing the architecture, it will also involve a lot of effort from Oracle DBAs to unlearn and learn new concepts 🙂 At the same time, it will let you start the ASM instance even though CSS is not up!! (I know many people will breathe a sigh of relief after reading that last line.) Or is there something else in store for us? There are also a lot of RAC and ASM features expected in 11gR2. So let’s wait and watch..