10.2 CRS startup issue

Today I faced a strange issue with CRS after a host reboot. CRS was not coming up, and we could see the following messages in $ORA_CRS_HOME/log/<hostname>/client/clsc*.log:

cat clsc26.log
Oracle Database 10g CRS Release 10.2.0.4.0 Production Copyright 1996, 2008 Oracle.  All rights reserved.
2011-07-01 21:00:14.345: [ COMMCRS][2541577376]clsc_connect: (0x6945e0) no listener at (ADDRESS=(PROTOCOL=IPC)(KEY=CRSD_UI_SOCKET))

2011-07-01 21:00:14.345: [ COMMCRS][2541577376]clsc_connect: (0x695020) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=SYSTEM.evm.acceptor.auth))
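When there are many clsc*.log files, it can help to pull out just the endpoints that had no listener. The following is a minimal sketch, not part of the original troubleshooting; the function name list_missing_listeners is made up for illustration, and it assumes the log lines have the same shape as the two shown above.

```shell
#!/bin/sh
# list_missing_listeners FILE  (hypothetical helper)
# Print each distinct KEY for which clsc_connect reported "no listener".
list_missing_listeners() {
    grep 'no listener at' "$1" |
        sed -n 's/.*(KEY=\([^)]*\)).*/\1/p' |
        sort -u
}

# Example:
#   list_missing_listeners clsc26.log
```

For the log above this would list the CRSD and EVM endpoints, which is what suggested a problem with the IPC socket files.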

It looked like an issue with the socket files, so I removed the files under /var/tmp/.oracle (this is a RHEL4 box). I tried starting CRS with 'crsctl start crs', but still no socket files were written. /tmp/crsctl*log files were being generated, but they were empty. I spent close to an hour rebooting the host and trying various things. Then I decided to manually run the daemons listed in /etc/inittab, i.e.:

/etc/init.d/init.evmd run
/etc/init.d/init.cssd fatal
/etc/init.d/init.crsd run

When I ran init.evmd, I got the following errors:

# /etc/init.d/init.evmd run
Startup will be queued to init within 30 seconds.
/home/oracle/.bash_profile: line 6: ulimit: open files: cannot modify limit: Operation not permitted
*** glibc detected *** double free or corruption (fasttop): 0x0000000000688960 ***
-bash: line 1: 17389 Aborted                 /apps/oracle/product/102crs/bin/crsctl check boot >/tmp/crsctl.17085

This pointed to an issue with .bash_profile, so I renamed it to .old and retried the operation. This time it succeeded, and CRS also came up fine.

There was a 'ulimit -n 2048' entry in .bash_profile which was causing it. I am not aware why ulimit causes this issue; I will try to find out and post the details.
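The root cause was not established here, so the following is only a sketch of a more defensive profile entry. It assumes the trouble starts when the ulimit call is refused and prints an error. The function name raise_nofile_limit is hypothetical. It raises only the soft limit, and only when the hard limit allows it, so it never writes anything to the terminal.

```shell
#!/bin/sh
# Hypothetical replacement for a bare "ulimit -n 2048" line in ~/.bash_profile.
# raise_nofile_limit N: raise the soft open-files limit to N if it is lower,
# but only when the hard limit permits it; stay silent if the call is refused.
raise_nofile_limit() {
    want=$1
    soft=$(ulimit -Sn)
    hard=$(ulimit -Hn)
    # Nothing to do if the soft limit is already high enough.
    if [ "$soft" = "unlimited" ] || [ "$soft" -ge "$want" ]; then
        return 0
    fi
    # Only attempt the change when it cannot exceed the hard limit.
    if [ "$hard" = "unlimited" ] || [ "$hard" -ge "$want" ]; then
        ulimit -Sn "$want" 2>/dev/null || true
    fi
    return 0
}

raise_nofile_limit 2048
```

Whether this avoids the crsctl abort seen above is untested; it only removes the "cannot modify limit" message from the picture.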

ORA-00132 while starting 11gR2 database

A short post on an error which can take a lot of time to debug. You have upgraded a database and, after restoring all the files (init, tnsnames), you try to start the 11g database and it fails with the following error:

SQL> startup nomount
ORA-00119: invalid specification for system parameter REMOTE_LISTENER
ORA-00132: syntax error or unresolved network name 'scan-clu:1521'

11gR2 requires remote_listener to be set to <scan_name:port>. The problem is that the sqlnet.ora file has been modified or replaced (from a backup of the earlier version) and lists only TNSNAMES, not EZCONNECT. Ensure that your sqlnet.ora file contains EZCONNECT:

NAMES.DIRECTORY_PATH= (TNSNAMES, EZCONNECT)
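A quick way to check a restored sqlnet.ora before starting the instance could look like the sketch below. The function name has_ezconnect is hypothetical, and it assumes the parameter sits on a single, uncommented line; a value wrapped over several lines would need a smarter check.

```shell
#!/bin/sh
# has_ezconnect FILE  (hypothetical helper)
# Succeed if the NAMES.DIRECTORY_PATH line in FILE mentions EZCONNECT.
# Assumes the parameter is written on one uncommented line.
has_ezconnect() {
    grep -i '^[[:space:]]*NAMES\.DIRECTORY_PATH' "$1" | grep -qi 'EZCONNECT'
}

# Example (default location; TNS_ADMIN may point elsewhere):
#   has_ezconnect "$ORACLE_HOME/network/admin/sqlnet.ora" || echo "EZCONNECT missing"
```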

Adding a reinstalled/reimaged node back to 11gR2 Cluster

There could be a scenario where a node crashes due to OS/hardware issues and is then reinstalled/reimaged. In such cases, a normal node addition will not help, since the OCR still contains references to the original node. We need to remove them first and then perform the node addition.

I have tried to document one such use case.

Assumptions :
Cluster Hostnames : node1 , node2
VIP : node1-v , node2-v

– Voting disk and OCR are on ASM( ASMLIB is being used to manage the shared disks )
– After the OS reinstall, user equivalence has been set and all required packages have been installed along with setup of ASMLIB
– The crashed node was node2

STEPS
-----
1. Clear the OCR entries for the reimaged host.

# crsctl delete node -n node2

To verify the success of the above step, execute "olsnodes" on the surviving node; the reimaged host should not show up in the list.
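The olsnodes check can also be scripted. This is a small sketch with a hypothetical function name, node_is_listed, and it assumes olsnodes is run without options so that it prints one node name per line.

```shell
#!/bin/sh
# node_is_listed NODE  (hypothetical helper)
# Read a node list on stdin (one name per line, as printed by "olsnodes"
# without options) and succeed if NODE appears as a whole line.
node_is_listed() {
    grep -qxF -- "$1"
}

# Example on the surviving node:
#   if olsnodes | node_is_listed node2; then
#       echo "node2 is still registered in the cluster"
#   fi
```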

2. Remove the VIP information of the reimaged host from the OCR.

Execute the following on the existing node:
/u01/grid/11.2/bin/srvctl remove vip -i node2-v -f

3. Clear the inventory for the reimaged host for the GI and DB homes.

From the surviving node, execute:

/u01/grid/11.2/oui/bin/runInstaller -updateNodeList ORACLE_HOME=/u01/grid/11.2 "CLUSTER_NODES=node1" CRS=TRUE -silent -local

Do the same for the Database Home:

/u01/oracle/product/11.2/oui/bin/runInstaller -updateNodeList ORACLE_HOME=/u01/oracle/product/11.2 CLUSTER_NODES=node1 -silent -local

4. Now the actual node addition starts. Run the Cluster Verification Utility:

./cluvfy  stage -pre nodeadd -n node2 -verbose

If possible, redirect the output of the above command to a file so that it can be reviewed and any reported issues can be rectified.

In this case, since the OCR and voting disk reside on ASM and ASMLIB is in use, the most significant errors were:

ERROR:
PRVF-5449 : Check of Voting Disk location “ORCL:DISK6(ORCL:DISK6)” failed on the following nodes:
node2:No such file or directory

PRVF-5431 : Oracle Cluster Voting Disk configuration check failed

The impact of this error is explained in the subsequent steps.
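Once the cluvfy output is saved to a file, the error codes can be summarised quickly. The sketch below assumes the report contains PRVF-nnnn codes as in the excerpt above; the function name list_prvf_codes and the log path in the example are hypothetical.

```shell
#!/bin/sh
# list_prvf_codes FILE  (hypothetical helper)
# Print the distinct PRVF-nnnn codes found in a saved cluvfy report.
# Only the last code on each line is reported.
list_prvf_codes() {
    sed -n 's/.*\(PRVF-[0-9][0-9]*\).*/\1/p' "$1" | sort -u
}

# Example:
#   ./cluvfy stage -pre nodeadd -n node2 -verbose > /tmp/cluvfy_nodeadd.log 2>&1
#   list_prvf_codes /tmp/cluvfy_nodeadd.log
```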

5. Run "addNode.sh" from the existing node.

[oracle@node1] /u01/grid/11.2/oui/bin% ./addNode.sh -silent "CLUSTER_NEW_NODES={node2}" "CLUSTER_NEW_VIRTUAL_HOSTNAMES={node2-v}"
[oracle@node1] /u01/grid/11.2/oui/bin%

In my case, the above command returned without printing any messages. In fact, addNode.sh did not run at all.

Cause: Since ASMLIB is in use, we had hit the issue discussed in MOS Note 1267569.1.
The error seen in step 4 helped in finding this.

Solution :

Set the following parameters and run addNode.sh again.

IGNORE_PREADDNODE_CHECKS=Y
export IGNORE_PREADDNODE_CHECKS

[oracle@node1] /u01/grid/11.2/oui/bin% ./addNode.sh -silent "CLUSTER_NEW_NODES={node2}" "CLUSTER_NEW_VIRTUAL_HOSTNAMES={node2-v}"
Starting Oracle Universal Installer...

Checking swap space: must be greater than 500 MB.   Actual 12143 MB

Performing tests to see whether nodes node2 are available
............................................................... 100% Done.

Cluster Node Addition Summary
Global Settings
   Source: /u01/grid/11.2
   New Nodes
Space Requirements
   New Nodes
      node2

Instantiating scripts for add node (Tuesday, December 21, 2010 3:35:16 AM PST)
.                                                                 1% Done.
Instantiation of add node scripts complete

Copying to remote nodes (Tuesday, December 21, 2010 3:35:18 AM PST)
...............................................................................................                                 96% Done.
Home copied to new nodes

Saving inventory on nodes (Tuesday, December 21, 2010 3:37:57 AM PST)
.                                                               100% Done.
Save inventory complete
WARNING:
The following configuration scripts need to be executed as the "root" user in each cluster node.
/u01/grid/11.2/root.sh # On nodes node2
To execute the configuration scripts:
    1. Open a terminal window
    2. Log in as "root"
    3. Run the scripts in each cluster node

The Cluster Node Addition of /u01/grid/11.2 was successful.
Please check '/tmp/silentInstall.log' for more details.

6. Run root.sh on the reimaged node to start up the CRS stack.

This completes the Grid Infrastructure setup on the node.

7. Proceed to run addNode.sh for the DB Home (on the existing node).

/u01/oracle/product/11.2/oui/bin/addNode.sh -silent "CLUSTER_NEW_NODES={node2}"

8. Once the DB Home addition is complete, use srvctl to check the status of the registered databases and instances, and add them if required.
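The status check in step 8 can be looped over every registered database. This is a sketch only; the function name check_all_databases is hypothetical, and it assumes that "srvctl config database" without -d prints one database name per line.

```shell
#!/bin/sh
# check_all_databases  (hypothetical helper)
# Print the status of every database registered in the OCR.
# Assumes "srvctl config database" (no -d) lists one database name per line.
check_all_databases() {
    srvctl config database | while read -r db; do
        [ -n "$db" ] || continue
        echo "== $db =="
        srvctl status database -d "$db"
    done
}

# If an instance for node2 is missing, it can be registered with something
# like the following (database and instance names are placeholders):
#   srvctl add instance -d <db_name> -i <instance_name> -n node2
```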

Performance Management Guide on AIX

While trying to find the amount of physical memory used by Oracle processes on AIX, I found a reference to this document on Metalink:

Performance Management Guide

It explains which process is using how much memory and how to interpret the output of AIX commands such as vmstat, svmon and ps.

Also, for more information on AIX parameters such as MAXPERM and MINPERM, click here.

I have not explored the complete guide yet, but I found it a very good starting point.

Retrieving Database SID,Port information from Grid Control repository

This is a short post on SQL which can be used to get the hostname, SID and port information for databases registered in the Grid Control repository. This information can be used to create tnsnames.ora entries, which we can then use to run SQL on all these databases.

set pages 999 lines 200
col db_entry for a80

-- One host|sid|port row per target that is currently UP (ASM instances excluded)
select
distinct mgmt$target.host_name||'|'||sid.PROPERTY_VALUE||'|'||port.PROPERTY_VALUE db_entry
from
mgmt_target_properties machine,
mgmt_target_properties port,
mgmt_target_properties sid,
mgmt$target
where
machine.target_guid=sid.target_guid
AND sid.target_guid=port.target_guid
AND machine.PROPERTY_NAME='MachineName'
AND port.PROPERTY_NAME='Port'
AND sid.PROPERTY_NAME='SID'
AND sid.PROPERTY_VALUE not like '%ASM%'
AND machine.TARGET_GUID in (select TARGET_GUID from mgmt_current_availability where EM_SEVERITY.get_avail_string(current_status)='UP')
AND machine.TARGET_GUID=mgmt$target.target_guid
order by 1;

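Below is a small shell script which can be used to create tnsnames.ora from this output, assuming it has been saved as db_list.txt:

# db_list.txt holds one host|sid|port entry per line; blank lines are skipped
grep -v "^$" db_list.txt | while read -r each_line
do
        HOST_NAME=$(echo "$each_line" | cut -d"|" -f1)
        ORACLE_SID=$(echo "$each_line" | cut -d"|" -f2)
        PORT=$(echo "$each_line" | cut -d"|" -f3)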

echo "${ORACLE_SID}.world ="                    >> tnsnames.ora
echo "  (DESCRIPTION ="                         >> tnsnames.ora
echo "    (ADDRESS = (PROTOCOL = TCP)"          >> tnsnames.ora
echo "     (HOST = ${HOST_NAME})(PORT = ${PORT}))" >> tnsnames.ora
echo "    (CONNECT_DATA = "                     >> tnsnames.ora
echo "     (SID = ${ORACLE_SID})"               >> tnsnames.ora
echo "    )"                                    >> tnsnames.ora
echo "  )"                                      >> tnsnames.ora
echo " "                                        >> tnsnames.ora

done
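Since the script above trusts every line of db_list.txt, a quick sanity check beforehand can save some debugging of a broken tnsnames.ora. This is a sketch with a hypothetical function name, check_db_list; it treats a line as valid only if it has exactly three non-empty "|" separated fields and a numeric port.

```shell
#!/bin/sh
# check_db_list FILE  (hypothetical helper)
# Report lines that are not exactly host|sid|port with a numeric port.
# Exit status is 0 when every non-blank line is valid, 1 otherwise.
check_db_list() {
    awk -F'|' '
        NF == 0 { next }    # skip blank lines
        NF != 3 || $1 == "" || $2 == "" || $3 !~ /^[0-9]+$/ {
            printf "line %d: %s\n", NR, $0
            bad = 1
        }
        END { exit bad }
    ' "$1"
}

# Example:
#   check_db_list db_list.txt || echo "fix the lines above before generating tnsnames.ora"
```

Lines with stray spaces around the port (for example from SQL*Plus padding) are also reported, so trimming the spool output first may be needed.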

ORA-01722 with Full Table Scan

My application developers approached me with an issue that was new to me. They were complaining about a query which was failing with ORA-01722 "invalid number" after an upgrade from 10.2.0.4 to 11.1.0.7. The query looks like this:

select max(a) from t1 where c1<>'abc' and c2=12345 and c3='Y' and c4='xyz';

This query worked fine in 10.2.0.4 and was also working fine in another upgraded 11.1.0.7 database.

All the columns, i.e. C1, C2, C3 and C4, are VARCHAR2(20).

I ran this query with single quotes around the C2 value:

select max(a) from t1 where c1<>'abc' and c2='12345' and c3='Y' and c4='xyz';

and it worked fine, but without the single quotes it failed again with the same error.

I checked the explain plan of the query and it was doing a full table scan on table T1. Then I opened the other 11.1.0.7 database, where the same query works fine, and found that there is an index on columns C1, C2, C3 and C4 and that table T1 was being accessed by an index range scan.

Now, coming back to the failing 11.1.0.7 database: the index on column C4 was missing. After creating an index on column C4, the query started to work fine on the failing instance.

I am not sure how the absence of an index can cause this issue. Why can a VARCHAR2 column not be compared with an unquoted value when doing a full table scan?
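One plausible reading, offered as an assumption rather than a confirmed diagnosis: when a VARCHAR2 column is compared with a number literal, Oracle converts the column value to a number, not the literal to a string. A full table scan may then evaluate that conversion on rows holding non-numeric C2 values, while an index-driven plan may filter those rows out before the conversion is ever attempted. The sketch below writes a tiny SQL*Plus script to try this on a scratch schema. The function name write_ora01722_repro, the table name t1_repro and its rows are all hypothetical.

```shell
#!/bin/sh
# write_ora01722_repro FILE  (hypothetical helper)
# Write a small SQL*Plus script contrasting a numeric literal with a quoted
# literal against a VARCHAR2 column. Table t1_repro and its rows are made up.
write_ora01722_repro() {
    cat > "$1" <<'EOF'
CREATE TABLE t1_repro (a NUMBER, c2 VARCHAR2(20));
INSERT INTO t1_repro VALUES (1, '12345');
INSERT INTO t1_repro VALUES (2, 'abc');
COMMIT;

-- Number literal: the comparison is effectively TO_NUMBER(c2) = 12345, so
-- every row the plan evaluates must hold a numeric string.
SELECT MAX(a) FROM t1_repro WHERE c2 = 12345;

-- String literal: plain character comparison, no conversion involved.
SELECT MAX(a) FROM t1_repro WHERE c2 = '12345';

DROP TABLE t1_repro;
EOF
}

# Example:
#   write_ora01722_repro /tmp/ora01722_repro.sql
#   sqlplus <user>/<password> @/tmp/ora01722_repro.sql
```

If this reading is right, the first query would be expected to fail on the 'abc' row with a full scan, and quoting the literal (or storing the value in a NUMBER column) is the more robust fix than relying on an index.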

Your comments are always welcome. Please let us know your views on this.