Amit Bansal

CRS-4640 Error on Starting 11gR2 clusterware

I was working on an issue where Clusterware would not come up because the private interface was down. The following error in ocssd.log pointed to the private interface:

2011-08-31 15:03:38.051: [ CSSD][1090451776]clssnmvDHBValidateNCopy: node 2, testrac2, has a disk HB, but no network HB, DHB has rcfg 205815745, wrtcnt, 4418998, LATS 4634324, lastSeqNo 4418997, uniqueness 1314797539, timestamp 1314803017/4632384

Checking the CRS status showed that the OHASD process was up and running, but the CRS, CSSD and EVMD processes were not.

[root@testrac1 cssd]# /oragrid/product/11.2/bin/crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager

After fixing the interface issue, we tried starting CRS with the 'crsctl start crs' command, and it failed with the following errors:

[root@testrac1 cssd]# /oragrid/product/11.2/bin/crsctl start crs
CRS-4640: Oracle High Availability Services is already active
CRS-4000: Command Start failed, or completed with errors.

CRS-4640 is reported since OHASD is already running. In 11.2 OHASD is supposed to start the other dependent processes.

The crsctl stop crs command failed as well:

[root@testrac1 cssd]# /oragrid/product/11.2/bin/crsctl stop crs
CRS-2796: The command may not proceed when Cluster Ready Services is not running
CRS-4687: Shutdown command has completed with errors.
CRS-4000: Command Stop failed, or completed with errors.

Since ohasd was already running, I tried crsctl start cluster (this command requires ohasd to be up). It brought the remaining daemons up, even though the output still ends with CRS-4000:

[root@testrac1 cssd]# /oragrid/product/11.2/bin/crsctl start cluster
CRS-2672: Attempting to start 'ora.cssd' on 'testrac1'
CRS-2676: Start of 'ora.cssd' on 'testrac1' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'testrac1'
CRS-2672: Attempting to start 'ora.ctssd' on 'testrac1'
CRS-2676: Start of 'ora.ctssd' on 'testrac1' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'testrac1'
CRS-2672: Attempting to start 'ora.evmd' on 'testrac1'
CRS-2676: Start of 'ora.crsd' on 'testrac1' succeeded
CRS-5702: Resource 'ora.crsd' is already running on 'testrac1'
CRS-2676: Start of 'ora.evmd' on 'testrac1' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'testrac1' succeeded
CRS-5702: Resource 'ora.cluster_interconnect.haip' is already running on 'testrac1'
CRS-4000: Command Start failed, or completed with errors.

[root@testrac1 ~]# /oragrid/product/11.2/bin/crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online

Ideally, crsctl start crs should be used to start the Clusterware components. But if they fail to come up because of some issue (e.g. an inaccessible voting disk or an interface problem) and you end up with only ohasd running, you can use crsctl start cluster to start the remaining Clusterware processes after fixing the underlying issue. I believe crsctl stop crs -f can also be used, though I didn't try it for this issue.
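
The state described above (ohasd online, the rest of the stack down) can be recognised from the crsctl check crs output before deciding which start command to use. The following is only a sketch: the function name ohasd_up_stack_down is made up for illustration, and it relies on nothing more than the message codes that appear in the outputs in this post.

```shell
# Hypothetical helper, not part of Clusterware: reads `crsctl check crs`
# output on stdin and succeeds only when OHASD is online while at least one
# of CRS, CSS or EVM is reported as unreachable.
ohasd_up_stack_down() {
    check_out=$(cat)
    # CRS-4638: Oracle High Availability Services is online
    printf '%s\n' "$check_out" | grep -q 'CRS-4638' || return 1
    # CRS-4535 / CRS-4530 / CRS-4534: CRS / CSS / EVM cannot be contacted
    printf '%s\n' "$check_out" | grep -Eq 'CRS-(4535|4530|4534)'
}
```

As root, and only after the underlying problem has been fixed, it could be used as `crsctl check crs | ohasd_up_stack_down && crsctl start cluster`.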

11gR2: Enable and Disable Oracle Features with Chopt

Oracle introduced a utility called chopt in 11gR2 to enable or disable a few database features after installation. To use it, shut down the database and run the utility found under $ORACLE_HOME/bin. The allowed options are listed below:

Value          Description
dm             Oracle Data Mining Database Files
dv             Oracle Database Vault
lbac           Oracle Label Security
olap           Oracle OLAP
partitioning   Oracle Partitioning
rat            Oracle Real Application Testing
ode_net        Oracle Database Extensions for .NET 1.x
ode_net_2      Oracle Database Extensions for .NET 2.0

For example, to enable Database Vault, issue the following command:

$ chopt enable dv

As of now, chopt has no option to enable or disable RAC. That is still done with make, e.g. make -f ins_rdbms.mk rac_off ioracle to disable it (rac_on to enable it).

The chopt utility is described in the Oracle Database 11gR2 installation documentation.
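
Because chopt only accepts the option values listed in this post, a small wrapper can catch a typo before the database is shut down. This is a sketch under assumptions: chopt_command is a made-up name, and it only prints the command line instead of running chopt.

```shell
# Hypothetical wrapper: validates the action and the option value, then
# prints the chopt command line. It does not run chopt or touch the database.
chopt_command() {
    action=${1:-}
    option=${2:-}
    case "$action" in
        enable|disable) ;;
        *) echo "action must be enable or disable" >&2; return 2 ;;
    esac
    case "$option" in
        # Values taken from the option list in this post
        dm|dv|lbac|olap|partitioning|rat|ode_net|ode_net_2) ;;
        *) echo "unknown chopt option: $option" >&2; return 1 ;;
    esac
    echo "chopt $action $option"
}
```

For instance, `chopt_command enable dv` prints the command used in the example, while `chopt_command enable rac` is rejected, since RAC is not a chopt option.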

10gR2 Silent Install with 11gR2 CRS fails

I was trying to perform a 10.2 silent install on a cluster running 11gR2 CRS. During the pre-checks, the installer failed with the following error:

Check complete: Failed <<<<
Problem: The 'active' version of Oracle Clusterware is not 10g Release 2 (10.2).
Recommendation: You must upgrade all nodes of the cluster to Oracle Clusterware 10g Release 2.  If you have upgraded some but not all of the nodes to use the 10g Release 2 version of Oracle Clusterware, then the 'active' version is still 10g Release 1 (10.1)  You must upgrade all nodes in the cluster to Oracle Clusterware 10g Release 2 before installing Oracle 10g Release 2 Real Application Clusters.

I tried the “ignoreSysPrereqs” option with runInstaller, but that did not help either. I checked My Oracle Support (formerly Metalink; I still call it Metalink anyway) and searched for known issues, but couldn't find any document. I found some reports of the issue on OTN, but no solution. Finally, I searched the Oracle software staging location for the file that reports this error.

$ grep -r "version of Oracle Clusterware is not 10g Release 2" *
stage/prereq/db/db_prereq.xml:

This was part of the following code (I have replaced angle brackets with square brackets, as WordPress confuses them with HTML tags):

[PREREQUISITE NAME="Detect10.2CRS"
                EXTERNALNAME="Checking Oracle Clusterware version ..."
                EXTERNALNAMEID="[email protected]"
                SEVERITY="Error"]
        [DESCRIPTION TEXT="This is a prerequisite condition to test if all nodes in the cluster have had the Clusterware upgraded to 10g Release 2 (10.2)."
                TEXTID="S_CHECK_10.2_CRS_DESCRIPTION@oracle.install.prereqs.resources.PrereqRes"/]
        [RULESETREF NAME="CRS102Checks" RULE="CheckFor102CRS" FILE="db/refhost.xml"
                RESULTS_FILE="install_rule_results.xml"/]
        [PROBLEM TEXT="The 'active' version of Oracle Clusterware is not 10g Release 2 (10.2)."
                TEXTID="S_CHECK_10.2_CRS_ERROR@oracle.install.prereqs.resources.PrereqRes"]
        [/PROBLEM]

Searching My Oracle Support for “Detect10.2CRS” gave an exact hit:

Silent Install 10.2.0.1 Database Fails When Cluster Is 11.1.0.6 [ID 755345.1]

As per the note, we need to change the following lines in the (software location)\stage\prereq\db\db_prereq.xml file (again with angle brackets replaced by square brackets):

[PREREQUISITESET NAME="clusterTests"]
[PREREQUISITEREF NAME="Detect10.2CRS" SEVERITY="Error"/]
[/PREREQUISITESET]

to:

[PREREQUISITESET NAME="clusterTests"]
[/PREREQUISITESET]
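
The same edit can be scripted instead of done by hand. This is a minimal sketch, not the procedure from the note: remove_crs_check is a hypothetical helper, and it assumes the PREREQUISITEREF element sits on a line of its own, as in the excerpt above. The real file uses angle brackets, so the pattern matches on the tag name and attribute rather than on the brackets.

```shell
# Hypothetical helper: drops the Detect10.2CRS reference from a prereq XML
# file and keeps the untouched original next to it as <file>.orig.
remove_crs_check() {
    prereq_file=$1
    cp "$prereq_file" "$prereq_file.orig" || return 1
    # Delete only the line that references the prerequisite; the
    # PREREQUISITE definition itself (no "REF") is left in place.
    sed '/PREREQUISITEREF NAME="Detect10\.2CRS"/d' "$prereq_file.orig" > "$prereq_file"
}
```

From the staging location it would be called as `remove_crs_check stage/prereq/db/db_prereq.xml`; copying the .orig file back restores the check.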

You need to make the same change in the corresponding file of any 10g patchset you install on top of it. For the 10.2.0.4 patch, I found it under (software_location)/stage/prereq/patch_prereqs.xml.
Searching My Oracle Support for the error message did not return the above document, so I am documenting it here so that search engines can find it faster. Note that to use 10g DB software with 11gR2 CRS, you will have to pin the nodes:

$GRID_HOME/bin/crsctl pin css -n node1 node2

olsnodes -t reports the current status of the nodes, i.e. whether they are pinned or not.
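
To see at a glance which nodes still need pinning, the olsnodes -t output can be filtered. The sketch below assumes each output line ends with the word Pinned or Unpinned after the node name; that format is not shown in this post, so verify it on your own cluster first. The function name unpinned_nodes is hypothetical.

```shell
# Hypothetical helper: reads `olsnodes -t` output on stdin and prints the
# names of nodes whose last column is "Unpinned".
# Assumed line format: "<node name> ... <Pinned|Unpinned>".
unpinned_nodes() {
    awk '$NF == "Unpinned" { print $1 }'
}
```

Running `olsnodes -t | unpinned_nodes` would then list the candidates for crsctl pin css.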

10.2 CRS startup issue

Today I faced a strange issue with CRS after a host reboot. CRS was not coming up, and we could see the following messages in $ORA_CRS_HOME/log/<hostname>/client/clsc*.log:

cat clsc26.log
Oracle Database 10g CRS Release 10.2.0.4.0 Production Copyright 1996, 2008 Oracle.  All rights reserved.
2011-07-01 21:00:14.345: [ COMMCRS][2541577376]clsc_connect: (0x6945e0) no listener at (ADDRESS=(PROTOCOL=IPC)(KEY=CRSD_UI_SOCKET))

2011-07-01 21:00:14.345: [ COMMCRS][2541577376]clsc_connect: (0x695020) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=SYSTEM.evm.acceptor.auth))

It looked like an issue with the socket files, so I removed the files under /var/tmp/.oracle (this is a RHEL4 box). I tried starting CRS with 'crsctl start crs', and still no socket files were written. /tmp/crsctl*log files were being generated, but they were empty. I spent close to an hour rebooting the host and trying various things. Then I decided to manually run the daemons listed in /etc/inittab, i.e.

/etc/init.d/init.evmd run
/etc/init.d/init.cssd fatal
/etc/init.d/init.crsd run

When I ran init.evmd, I got the following errors:

# /etc/init.d/init.evmd run
Startup will be queued to init within 30 seconds.
/home/oracle/.bash_profile: line 6: ulimit: open files: cannot modify limit: Operation not permitted
*** glibc detected *** double free or corruption (fasttop): 0x0000000000688960 ***
-bash: line 1: 17389 Aborted                 /apps/oracle/product/102crs/bin/crsctl check boot >/tmp/crsctl.17085

It pointed to an issue with .bash_profile, so I renamed it to .old and retried the operation. This time it succeeded, and CRS also came up fine.

There was an entry for ulimit -n 2048 in .bash_profile that was causing it. I don't know yet why ulimit causes this issue; I will try to find out and post the details.
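
The post does not establish why the ulimit line broke the CRS startup, so the following is only a sketch of one way to keep the setting in .bash_profile without producing the "cannot modify limit" message. It assumes the message comes from asking for a soft limit above the hard limit; set_nofile_quietly is a made-up name, and whether this alone avoids the crsctl abort is untested.

```shell
# Hypothetical replacement for a bare `ulimit -n 2048` in a profile script.
# $1 is the wanted soft open-files limit (a number). The limit is changed
# only if the hard limit allows it, and nothing is written to stderr.
set_nofile_quietly() {
    want=$1
    hard=$(ulimit -Hn)
    if [ "$hard" != "unlimited" ] && [ "$want" -gt "$hard" ]; then
        return 1    # request exceeds the hard limit; leave the limit alone
    fi
    ulimit -Sn "$want" 2>/dev/null
}
```

In .bash_profile the call would be `set_nofile_quietly 2048`, ignoring its return status.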

ORA-00132 while starting 11gR2 database

A short post on an error that can take a lot of time to debug. You have upgraded a database, and after restoring all the files (init, tnsnames), you try to start the 11g database and it fails with the following error:

SQL> startup nomount
ORA-00119: invalid specification for system parameter REMOTE_LISTENER
ORA-00132: syntax error or unresolved network name 'scan-clu:1521'

11gR2 requires remote_listener to be set to <scan_name:port>. That value is in EZCONNECT (host:port) form, so it can only be resolved if EZCONNECT is an enabled naming method. The problem is that the sqlnet.ora file has been modified or replaced (from a backup of the earlier version) and lists only TNSNAMES, without EZCONNECT. Ensure that your sqlnet.ora file contains EZCONNECT:

NAMES.DIRECTORY_PATH= (TNSNAMES, EZCONNECT)
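
A quick check of a restored sqlnet.ora can catch this before the startup attempt. The function below is a sketch with a hypothetical name; it assumes NAMES.DIRECTORY_PATH is written on a single, uncommented line, and it does not try to model what Oracle does when the parameter is missing altogether.

```shell
# Hypothetical check: succeeds when the NAMES.DIRECTORY_PATH line of the
# given sqlnet.ora mentions EZCONNECT (case-insensitive).
# A file without the parameter is reported as failure.
has_ezconnect() {
    grep -i '^[[:space:]]*NAMES\.DIRECTORY_PATH' "$1" | grep -qi 'EZCONNECT'
}
```

Typical usage would be `has_ezconnect "$ORACLE_HOME/network/admin/sqlnet.ora" || echo "EZCONNECT missing"`, adjusting the path if TNS_ADMIN points elsewhere.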

Retrieving Database SID, Port information from Grid Control repository

This is a short post on SQL that can be used to get the hostname, SID and port of the databases registered in the Grid Control repository. This information can be used to create TNS entries, which we can then use to run SQL on all these databases.

set pages 999 lines 200
col host for a50
col port for a10
col sid for a10

select
distinct mgmt$target.host_name||'|'||sid.PROPERTY_VALUE||'|'||port.PROPERTY_VALUE
from
mgmt_target_properties machine,
mgmt_target_properties port,
mgmt_target_properties sid,
mgmt_target_properties domain,
mgmt$target
where
machine.target_guid=sid.target_guid
AND sid.target_guid=port.target_guid
AND port.target_guid=domain.target_guid
AND machine.PROPERTY_NAME='MachineName'
AND port.PROPERTY_NAME='Port'
AND sid.PROPERTY_NAME='SID'
AND sid.PROPERTY_VALUE not like '%ASM%'
AND machine.TARGET_GUID in (select TARGET_GUID from mgmt_current_availability where EM_SEVERITY.get_avail_string(current_status)='UP')
AND machine.TARGET_GUID=mgmt$target.target_guid
order by 1;

Here is a small shell script that can be used to create tnsnames.ora:

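# db_list.txt holds one host|sid|port line per database (the query output)
grep -v "^$" db_list.txt | while read -r each_line
do
        # Split the line on "|" into its three fields
        HOST_NAME=$(echo "$each_line" | cut -d"|" -f1)
        ORACLE_SID=$(echo "$each_line" | cut -d"|" -f2)
        PORT=$(echo "$each_line" | cut -d"|" -f3)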

echo "${ORACLE_SID}.world ="                    >> tnsnames.ora
echo "  (DESCRIPTION ="                         >> tnsnames.ora
echo "    (ADDRESS = (PROTOCOL = TCP)"          >> tnsnames.ora
echo "     (HOST = ${HOST_NAME})(PORT = ${PORT}))" >> tnsnames.ora
echo "    (CONNECT_DATA = "                     >> tnsnames.ora
echo "     (SID = ${ORACLE_SID})"               >> tnsnames.ora
echo "    )"                                    >> tnsnames.ora
echo "  )"                                      >> tnsnames.ora
echo " "                                        >> tnsnames.ora

done