Amit Bansal

opmnctl start fails when ORA_NLS10 parameter is set

I got into this issue , where in while starting Grid Control components, opmnctl command failed with following errors

./opmnctl start
globalInitNLS: NLS boot file not found or invalid
 -- default linked-in boot block used
XML parser init: error 201.
globalInitNLS: NLS boot file not found or invalid
 -- default linked-in boot block used
XML parser init: error 201.
globalInitNLS: NLS boot file not found or invalid
 -- default linked-in boot block used
XML parser init: error 201.
opmnctl: opmn start failed

On checking environment variables, I found ORA_NLS10 env variable was set. Unsetting it resolved the issue.

Cluvfy Reports CRS not Installed on Nodes

Before I start my post, I would like to wish all our readers a “Happy New Year”.

While reviewing a 10gR2 RAC configuration I faced following errors on invoking cluvfy utility

$ ./cluvfy stage -post crsinst -n prod01,prod02 -verbose

Performing post-checks for cluster services setup 

Checking node reachability...

Check: Node reachability from node "prod01-fe"
  Destination Node                      Reachable?
  ------------------------------------  ------------------------
  prod01                              yes
  prod02                              yes
Result: Node reachability check passed from node "prod01-fe".

Checking user equivalence...

Check: User equivalence for user "oracle"
  Node Name                             Comment
  ------------------------------------  ------------------------
  prod02                              passed
  prod01                              passed
Result: User equivalence check passed for user "oracle".

ERROR:
CRS is not installed on any of the nodes.
Verification cannot proceed.

Post-check for cluster services setup was unsuccessful on all the nodes.

Cluvfy has reported that Clusterware has not been installed on the server. But this was strange and unexpected as one RAC Database was already running on these nodes and crs_stat reported the status of all CRS resources. ocrcheck also did not report any problems with the OCR. To dig further I checked Cluvfy logs located under $ORA_CRS_HOME/cv/log. It had recorded following errors

[main] [15:52:41:800] [OUIData.readInventoryData:393]  ==== CRS home added: Oracle home properties:
Name     : OraCRS
Type     : CRS-HOME
Location : /app/oracle/product/10.2/crs
Node list: [prod01-fe, prod02-fe]
; Thu Dec 24 15:52:41 GMT+08:00 2009
[main] [15:52:41:800] [OUIData.readInventoryData:401]  ==== ORACLE home added: Oracle home properties:
Name     : OraHome
Type     : ORACLE-HOME
Location : /app/oracle/product/10.2/db_1
Node list: [prod01, prod02]
; Thu Dec 24 15:52:41 GMT+08:00 2009
[main] [15:52:41:800] [VerificationUtil.isCRSInstalled:1262]  CRS wasn't found installed  on node: prod02; Thu Dec 24 15:52:41
 GMT+08:00 2009
[main] [15:52:41:800] [VerificationUtil.isCRSInstalled:1262]  CRS wasn't found installed  on node: prod01; Thu Dec 24 15:52:41
 GMT+08:00 2009

You can notice that CRS_HOME has recorded Node list as [prod01-fe, prod02-fe] and for DB_HOME it is [prod01, prod02]. Actual hostname for the nodes are prod01 and prod02. prod01-fe and prod02-fe were aliases for the two nodes.

Cluvfy uses Central Oracle Inventory to obtain ORACLE_HOME information. Checking the $ORACLE_BASE/oraInventory/ContentsXML/inventory.xml confirmed that the nodes corresponding to CRS_HOME were stored as prod01-fe and prod02-fe ( I have changed the Angled brackets to Round brackets as html considers them as tags. Don’t worry your inventory file is not corrupted 🙂 )

(INVENTORY)
(VERSION_INFO)
   (SAVED_WITH)10.2.0.1.0(/SAVED_WITH)
   (MINIMUM_VER)2.1.0.6.0(/MINIMUM_VER)
(/VERSION_INFO)
(HOME_LIST)
(HOME NAME="OraCRS" LOC="/app/oracle/product/10.2/crs" TYPE="O" IDX="1" CRS="true")
   (NODE_LIST)
      (NODE NAME="prod01-fe"/)
      (NODE NAME="prod02-fe"/)
   (/NODE_LIST)
(/HOME)
(HOME NAME="OraHome" LOC="/app/oracle/product/10.2/db_1" TYPE="O" IDX="2")
   (NODE_LIST)
      (NODE NAME="prod01"/)
      (NODE NAME="prod02"/)
   (/NODE_LIST)
(/HOME)

There are few reported issues on metalink but they relate to CRS=”true” not being present for CRS_HOME entry. There is one more reported issue listed here by Surachart which was caused by incorrect permissions on /etc/oraInst.loc file.

To correct this problem, we had to run runInstaller -updateNodelist command to update the correct nodes in file. Even though some metalink notes recommend to change this file manually,  I would recommend using runInstaller command for updating inventory.
Before executing the command, I ran olsnodes command to confirm that OCR stored prod01,prod02 in its repository.

runInstaller -updateNodeList -silent "CLUSTER_NODES={prod01,prod02}" ORACLE_HOME="/app/oracle/product/10.2/crs" ORACLE_HOME_NAME="OraCRS" LOCAL_NODE="prod01" CRS=true
Starting Oracle Universal Installer...

No pre-requisite checks found in oraparam.ini, no system pre-requisite checks will be executed.
The inventory pointer is located at /var/opt/oracle/oraInst.loc
The inventory is located at /app/oracle/oraInventory
'UpdateNodeList' was successful.

Now the cluvfy worked fine. I do not have any reasoning for why the incorrect nodes were recorded in Oracle Inventory, but above solution should take care of this.

11gR2: Steps to Rename ASM Diskgroup with DB files

11gR2 introduced renamedg utility to rename diskgroups. I would discuss renaming a diskgroup on a Standalone Grid Infrastructure installation with Oracle Restart. Before you use the command, ensure that ASM diskgroup is dismounted.
Below is syntax for the command. You can refer to
11gR2 docs for details

renamedg
[phase={ one|two |both } ] dgname=diskgroup
newdgname=newdiskgroup [config=configfile]
[ asm_diskstring=discoverystring, discoverystring ... ]
[ clean={true|false} ] [ check={true|false} ]
[ confirm={true|false}] [ verbose={ true|false} ]
[ keep_voting_files={true|false}]

Basically we have two phases of using renamedg command, phase 1 generates a configuration file to be used by phase 2. In phase 2 actual renaming of diskgroup is done.

Note : – I have tried these steps in a Test environment and tried to cover the most components. But would recommend you  to cross verify the steps in a test environment before implementing in production environment

Lets get Started!!

1) Gather all the information for Oracle restart Configuration and DB parameters

Database Configuration

[oracle@db11g bin]$ srvctl config database -d test11r2
Database unique name: test11r2
Database name: test11r2
Oracle home: /u02/app/oracle/product/11.2.0/dbhome_1
Oracle user: oracle
Spfile: +DATA/test11r2/spfiletest11r2.ora
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Disk Groups: DATA
Services:

ASM Information

[oracle@db11g bin]$ srvctl config asm
ASM home: /u02/app/oracle/product/11.2.0/grid
ASM listener: LISTENER
Spfile: +DATA/asm/asmparameterfile/registry.253.696489255
ASM diskgroup discovery string: /dev/oracleasm/disks/VOL*

[oracle@db11g bin]$ srvctl config listener
Name: LISTENER
Home: /u02/app/oracle/product/11.2.0/grid
End points: TCP:1521

Create Database parameter file

sql> create pfile='/tmp/param.txt' from spfile;

Oracle Restart configuration

Disable Oracle Restart so that it does not restart any component. ‘crsctl status resource’ can be used to check status of resources. As you can see I have stopped database but  ASM Diskgroup DATA is still mounted.

[oracle@db11g bin]$ crsctl disable has
[oracle@db11g bin]$ crsctl status resource
NAME=ora.DATA.dg
TYPE=ora.diskgroup.type
TARGET=ONLINE
STATE=ONLINE on db11g

NAME=ora.FLASH_ARC.dg
TYPE=ora.diskgroup.type
TARGET=ONLINE
STATE=ONLINE on db11g

NAME=ora.LISTENER.lsnr
TYPE=ora.listener.type
TARGET=ONLINE
STATE=ONLINE on db11g

NAME=ora.asm
TYPE=ora.asm.type
TARGET=ONLINE
STATE=ONLINE on db11g

NAME=ora.cssd
TYPE=ora.cssd.type
TARGET=ONLINE
STATE=ONLINE on db11g

NAME=ora.diskmon
TYPE=ora.diskmon.type
TARGET=ONLINE
STATE=ONLINE on db11g

NAME=ora.test11r2.db
TYPE=ora.database.type
TARGET=OFFLINE
STATE=OFFLINE

2) We are renaming DATA diskgroup to DG.

[oracle@db11g bin]$ renamedg dgname=data newdgname=dg asm_diskstring='/dev/oracleasm/disks/VOL*' verbose=true
NOTE: No asm libraries found in the system

Parsing parameters..

Parameters in effect:

         Old DG name       : DATA
         New DG name          : DG
         Phases               :
                 Phase 1
                 Phase 2
         Discovery str        : /dev/oracleasm/disks/VOL*
         Clean              : TRUE
         Raw only           : TRUE
renamedg operation: dgname=data newdgname=dg asm_diskstring=/dev/oracleasm/disks/VOL* verbose=true
Executing phase 1
Discovering the group
Performing discovery with string:/dev/oracleasm/disks/VOL*
Identified disk UFS:/dev/oracleasm/disks/VOL11 with disk number:2 and timestamp (32924744 398242816)
Identified disk UFS:/dev/oracleasm/disks/VOL12 with disk number:1 and timestamp (32924741 928525312)
Identified disk UFS:/dev/oracleasm/disks/VOL21 with disk number:0 and timestamp (32924744 362846208)
Identified disk UFS:/dev/oracleasm/disks/VOL22 with disk number:3 and timestamp (32924741 928525312)
Checking for hearbeat...
Re-discovering the group
Performing discovery with string:/dev/oracleasm/disks/VOL*
Identified disk UFS:/dev/oracleasm/disks/VOL11 with disk number:2 and timestamp (32924744 398242816)
Identified disk UFS:/dev/oracleasm/disks/VOL12 with disk number:1 and timestamp (32924741 928525312)
Identified disk UFS:/dev/oracleasm/disks/VOL21 with disk number:0 and timestamp (32924744 362846208)
Identified disk UFS:/dev/oracleasm/disks/VOL22 with disk number:3 and timestamp (32924741 928525312)
Checking if the diskgroup is mounted
Checking disk number:2
Checking disk number:1
Checking disk number:0
Checking disk number:3
Checking if diskgroup is used by CSS
Generating configuration file..
Completed phase 1
Executing phase 2
Looking for /dev/oracleasm/disks/VOL11
Modifying the header
Looking for /dev/oracleasm/disks/VOL12
Modifying the header
Looking for /dev/oracleasm/disks/VOL21
Modifying the header
Looking for /dev/oracleasm/disks/VOL22
Modifying the header
Completed phase 2
Terminating kgfd context 0xb7e7a050

3) Modify ASM configuration to correct the spfile location

$srvctl modify asm -p +DG/asm/asmparameterfile/registry.253.696489255
$srvctl start asm

Connect to ASM instance and mount +DG diskgroup.

sqlplus "/ as sysasm"
sql> alter diskgroup DG mount;

At this moment, you will see additional ASM diskgroup resource ora.DG.dg in crsctl configuration. Remove the old ora.DATA.dg resource

oracle@db11g ~]$ crsctl delete resource ora.DATA.dg
CRS-2730: Resource 'ora.test11r2.db' depends on resource 'ora.DATA.dg'
CRS-4000: Command Delete failed, or completed with errors.

We get above error as the test11r2 database configuration states ora.DATA.dg as dependent resource. Use -f option to drop it by force. Or you can first modify the database configuration and drop the resource later.

[oracle@db11g ~]$ crsctl delete resource ora.DATA.dg -f

4) Startup Database.

To do that we need to modify parameter file as control_files parameter and other *_file_dest needs to be changed. Change all occurence of ‘+DATA’ to ‘+DG’. Open database in mount mode as we would have to rename database files and redo logs before opening databases

SQL> startup nomount pfile='/tmp/param.txt'
ORACLE instance started.

Total System Global Area  343154688 bytes
Fixed Size                  1336428 bytes
Variable Size             218106772 bytes
Database Buffers          117440512 bytes
Redo Buffers                6270976 bytes

SQL>alter database mount;

Use below command to get sql for renaming diskgroup information for datafiles/redo logs

 select 'alter database rename file '''||name||''' to ''<strong>+DG</strong>'||substr(name,instr(name,'/',1,1))||''';' from V$DATAFILE;

select 'alter database rename file '''||member||''' to ''<strong>+DG</strong>'||substr(member,instr(member,'/',1,1))||''';' from V$logfile;

V$RECOVER_FILE view can be used to check for any issues with datafiles not being available.
If everything is fine, you can open the database. You would have noticed that I have not done anything for Temp files. Checking alert log, we see that it is renamed automatically

 Re-creating tempfile +DATA/test11r2/tempfile/temp.268.696494395 as +DG/test11r2/tempfile/temp.272.704195475

Now we need to re-create spfile on ASM Diskgroup. Use following steps

$cd $ORACLE_HOME/dbs
$cat inittest11r2.ora
SPFILE='+DATA/test11r2/spfiletest11r2.ora'
$mv inittest11r2.ora inittest11r2.ora.new
$ cp /tmp/param.txt $ORACLE_HOME/dbs/inittest11r2.ora
$sqlplus "/ as sysdba"
SQL>create spfile='+DG/test11r2/spfiletest11r2.ora' from pfile;
exit
$mv inittest11r2.ora initest11r2.old

--- Modify inittest11r2.ora.new and change diskgroup to +DG---
$cat inittest11r2.ora.new
SPFILE='+DG/test11r2/spfiletest11r2.ora'
$mv inittest11r2.ora.new inittest11r2.ora

Now we need to modify Database configuration for Oracle Restart

[oracle@db11g ~]$ srvctl config database -d test11r2
Database unique name: test11r2
Database name: test11r2
Oracle home: /u02/app/oracle/product/11.2.0/dbhome_1
Oracle user: oracle
Spfile: +DATA/test11r2/spfiletest11r2.ora
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
PRCD-1012 : Failed to retrieve disk group list for database test11r2.
PRCR-1035 : Failed to look up CRS resource ora.DATA.dg for test11r2
PRCR-1001 : Resource ora.DATA.dg does not exist

As there is no DATA diskgroup available, it gives a error. If you notice initial output (before renaming diskgroup), there was entry for DATA Diskgroup. We also need to modify the entry for spfile.

[oracle@db11g ~]$ srvctl modify database -d test11r2 -p +DG/test11r2/spfiletest11r2.ora
[oracle@db11g ~]$ srvctl modify database -d test11r2 -a DG
[oracle@db11g dbs]$ srvctl config database -d test11r2
Database unique name: test11r2
Database name: test11r2
Oracle home: /u02/app/oracle/product/11.2.0/dbhome_1
Oracle user: oracle
<strong>Spfile: +DG/test11r2/spfiletest11r2.ora</strong>
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
<strong>Disk Groups: DG</strong>
Services:

You can now shutdown database and use srvctl to start If you have not dropped ora.DATA.dg resource earlier (due to errors), you can drop it now without errors

 [oracle@db11g ~]$ crsctl delete resource ora.DATA.dg

You can now enable Oracle Restart

[oracle@db11g dbs]$ crsctl enable has
CRS-4622: Oracle High Availability Services autostart is enabled.

Phew!! This was quite a posting after long time!!

11gR2 – SysAsm vs SysDba

SYSASM role was introduced in 11gR1 and was designed to administer ASM instances. In 11gR1 , if you connected with SYSDBA role , you used to get a warning which was only recorded in alert log (Refer to my earlier post here). But things have changed in 11gR2. While trying to dismount a Diskgroup, I found following errors

SQL> alter diskgroup flash_arc mount;
alter diskgroup flash_arc mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15260: permission denied on ASM disk group

Above error indicates that I do not have permission on the ASM Diskgroup.  As per 11gR2 documentation, SYSASM privilege is used for carrying out administration tasks on ASM Diskgroups. SYSDBA privilege can be used only for creating/deleting aliases and querying ASM dictionary views.  Frankly speaking, you should unlearn the habit of connecting as “/ as sysdba” to ASM instance and learn connecting as “/ as sysasm”

ORA-27146 Errors while starting up Database

Short post on a issue faced by me, while starting up a cloned database

SQL> startup nomount
ORA-24323: value not allowed
ORA-27146: post/wait initialization failed

This database environment was cloned and was being started for first time after cloning. Running hcve scripts on system indicated that OS kernel parameters were adequately sized.

SHMMNI set to 1000 is adequate
SHMSEG set to 120 is adequate
SEMMNI set to 700 is adequate
SEMMNS set to 2000 is adequate
MAX_THREAD_PROC set to 256 is adequate
NFLOCKS set to 1450 is adequate

My Oracle Support articles suggested this issue is related to semaphores setting. As settings were correct, I attempted to start instance by commenting processes parameter in init.ora (setting was 1000). This time database started.

I modified the processes parameter back to 1000 and attempted to restart the Database. It started  again!! I am clueless on why it initially failed and then started with same setting. Does anyone have any reasoning for this behavior?

Don’t blindly implement Blog suggestions on your Production Servers

Don’t blindly implement Blog suggestions on your Production Servers. This post is prompted by a recent issue where in DBA had modified racgvip script based on suggestion on one of blog (won’t name it).  Any such suggestions or similar to these which involve modification of oracle provided scripts should always be verified with Oracle Support.  Anyways I have put comment on the blog entry to correct or put a disclaimer 🙂