10g

Upgrading Oracle RAC Database -10g

Continuing my experiments with our 2-node 10g RAC test system, I upgraded Oracle Clusterware and the Oracle RAC database from 10.2.0.1 to 10.2.0.4. This post documents the steps for the Oracle Clusterware rolling upgrade and the RAC database upgrade. If you spot any mistakes, please let me know.

The first step is to download the 10.2.0.4 patch set from Metalink. In our case, we downloaded Patch 6810189 (10g Release 2 (10.2.0.4) Patch Set 3 for Linux x86). Follow the patch README for detailed steps.

We will do a rolling upgrade of Oracle Clusterware, i.e. we bring down only one node at a time for patching while the other node stays available and keeps accepting database connections. Before you start, take the following backups so that you can restore the system if the upgrade fails:

a) Full OS backup (some Clusterware files live outside the Oracle homes, e.g. under /etc)

b) Full Database Backup (Cold or hot backup)

c) Backup of OCR and voting disk

Let’s begin.

1) Shut down DBConsole and iSQL*Plus

<span style="font-size: small; font-family: arial,helvetica,sans-serif;">$ emctl stop dbconsole
$ isqlplusctl stop
</span>

2) Shutdown the associated service on the node

<span style="font-size: small; font-family: arial,helvetica,sans-serif;">[oracle@blrraclnx1 ~]$ srvctl stop service -d orcl -s orcl_taf -i orcl1</span>

3) Shutdown Database Instance and ASM instance on node (if present)

<span style="font-size: small; font-family: arial,helvetica,sans-serif;">[oracle@blrraclnx1 ~]$  srvctl stop instance -d orcl -i orcl1
</span>

To stop ASM, use following command

<span style="font-size: small; font-family: arial,helvetica,sans-serif;">[oracle@blrraclnx1 ~]$ srvctl stop asm -n blrraclnx1
</span>

4) Stop the nodeapps services on the node

<span style="font-size: small; font-family: arial,helvetica,sans-serif;">[oracle@blrraclnx1 ~]$ srvctl stop nodeapps -n blrraclnx1</span>

Before installing the Oracle Clusterware patch, let’s confirm that the services on this node have been stopped:

HA Resource                                   Target     State
-----------                                   ------     -----
<strong>ora.blrraclnx1.ASM1.asm                       OFFLINE    OFFLINE
ora.blrraclnx1.LISTENER1_BLRRACLNX1.lsnr      OFFLINE    OFFLINE
ora.blrraclnx1.gsd                            OFFLINE    OFFLINE
ora.blrraclnx1.ons                            OFFLINE    OFFLINE
ora.blrraclnx1.vip                            OFFLINE    OFFLINE</strong>
ora.blrraclnx2.ASM2.asm                       ONLINE     ONLINE on blrraclnx2
ora.blrraclnx2.LISTENER1_BLRRACLNX2.lsnr      ONLINE     ONLINE on blrraclnx2
ora.blrraclnx2.gsd                            ONLINE     ONLINE on blrraclnx2
ora.blrraclnx2.ons                            ONLINE     ONLINE on blrraclnx2
ora.blrraclnx2.vip                            ONLINE     ONLINE on blrraclnx2
ora.orcl.db                                   ONLINE     ONLINE on blrraclnx2
<strong>ora.orcl.orcl1.inst                           OFFLINE    OFFLINE</strong>
ora.orcl.orcl2.inst                           ONLINE     ONLINE on blrraclnx2
ora.orcl.orcl_taf.cs                          ONLINE     ONLINE on blrraclnx2
<strong>ora.orcl.orcl_taf.orcl1.srv                   OFFLINE    OFFLINE</strong>
ora.orcl.orcl_taf.orcl2.srv                   ONLINE     ONLINE on blrraclnx2
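
The same check can be scripted. The following is a minimal sketch, assuming the status listing is available as plain text with the three columns shown above (resource name, target, state). The function name `not_offline` is hypothetical, and it only looks at resources named `ora.<node>.*`, so the instance and service resources still need to be checked by eye.

```shell
#!/bin/sh
# Hypothetical helper: print resources that belong to the given node
# and whose State column is not OFFLINE.
# Input: status listing on stdin, columns: name, target, state [on host].
not_offline() {
    node="$1"
    awk -v node="$node" '
        # node-level resources are named ora.<node>.<something>
        index($1, "ora." node ".") == 1 && $3 != "OFFLINE" { print $1 }
    '
}
```

An empty result for the node being patched means all of its node-level resources are down.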

5) Set the DISPLAY variable and execute runInstaller from the patch directory

<span style="font-size: small; font-family: arial,helvetica,sans-serif;">[oracle@blrraclnx1 Disk1]$ ./runInstaller
</span>

This opens the OUI screen. Select the Oracle Clusterware home for patching, as shown in the screenshot below.

crs10204patch


OUI automatically selects all the nodes in the cluster and propagates the patch binaries to the other node.

10204patch2



6) On the Summary screen, click Install. OUI will then prompt you to run the following two commands as root to upgrade Oracle Clusterware:

<span style="font-size: small; font-family: arial,helvetica,sans-serif;"># $ORA_CRS_HOME/bin/crsctl stop crs
# $ORA_CRS_HOME/install/root102.sh
</span>

Now repeat steps 1-4 and step 6 on node 2. Step 5 is not required because the binaries have already been copied over to node 2.
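
Since steps 1-4 are repeated on every node, they can be collected in one place. This is only a sketch: the function name `stop_node_stack` and the `RUN` switch are my own, the commands are the ones used above, and it assumes you run it as the oracle user on the node being patched. By default it only prints the commands so you can review them first.

```shell
#!/bin/sh
# Sketch: print (or run) the per-node shutdown sequence from steps 1-4.
# Arguments: node name, database name, instance name, service name.
# By default the commands are only printed; set RUN=1 to execute them.
stop_node_stack() {
    node="$1"; db="$2"; inst="$3"; svc="$4"
    for cmd in \
        "emctl stop dbconsole" \
        "isqlplusctl stop" \
        "srvctl stop service -d $db -s $svc -i $inst" \
        "srvctl stop instance -d $db -i $inst" \
        "srvctl stop asm -n $node" \
        "srvctl stop nodeapps -n $node"
    do
        if [ "${RUN:-0}" = "1" ]; then
            $cmd || return 1      # stop at the first failing command
        else
            echo "$cmd"
        fi
    done
}

# Dry run for the second node (names follow the pattern used in this post)
stop_node_stack blrraclnx2 orcl orcl2 orcl_taf
```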

RAC database patching cannot be done in a rolling fashion and requires the database to be shut down.

1) Shut down DBConsole and iSQL*Plus

<span style="font-size: small; font-family: arial,helvetica,sans-serif;">$ emctl stop dbconsole
$ isqlplusctl stop

</span>

2) Shut down the associated services for the database

<span style="font-size: small; font-family: arial,helvetica,sans-serif;">[oracle@blrraclnx1 ~]$ srvctl stop service -d orcl </span>

3) Shut down the database and the ASM instances (if present)

<span style="font-size: small; font-family: arial,helvetica,sans-serif;">[oracle@blrraclnx1 ~]$  srvctl stop database -d orcl
</span>

To stop ASM, use the following commands, one for each node

<span style="font-size: small; font-family: arial,helvetica,sans-serif;">[oracle@blrraclnx1 ~]$ srvctl stop asm -n blrraclnx1
[oracle@blrraclnx1 ~]$ srvctl stop asm -n blrraclnx2</span>

4) Stop the listener on both nodes

<span style="font-size: small; font-family: arial,helvetica,sans-serif;">[oracle@blrraclnx1 ~]$ srvctl stop listener -n blrraclnx1 -l LISTENER1_BLRRACLNX1
[oracle@blrraclnx1 ~]$ srvctl stop listener -n blrraclnx2 -l LISTENER1_BLRRACLNX2
</span>

5) Set the DISPLAY variable and execute runInstaller from the patch directory

<span style="font-size: small; font-family: arial,helvetica,sans-serif;">[oracle@blrraclnx1 Disk1]$ ./runInstaller
</span>

This opens the OUI screen. Select the database home for patching.


6) On the Summary screen, click Install. When prompted, run the $ORACLE_HOME/root.sh script as the root user on both nodes. Once this completes, we need to perform the post-installation steps.

7) Start the listener and the ASM instance on both nodes

<span style="font-size: small; font-family: arial,helvetica,sans-serif;">[oracle@blrraclnx1 ~]$ srvctl start listener -n blrraclnx1 -l LISTENER1_BLRRACLNX1
[oracle@blrraclnx1 ~]$ srvctl start listener -n blrraclnx2 -l LISTENER1_BLRRACLNX2
[oracle@blrraclnx1 ~]$ srvctl start asm -n blrraclnx1
[oracle@blrraclnx1 ~]$ srvctl start asm -n blrraclnx2</span>

8) For an Oracle RAC installation, set CLUSTER_DATABASE=FALSE before upgrading

<span style="font-size: small; font-family: arial,helvetica,sans-serif;">[oracle@blrraclnx1 ~]$ sqlplus "/ as sysdba"
SQL&gt;startup nomount
SQL&gt; alter system set cluster_database=false scope=spfile;

System altered.
SQL&gt;shutdown immediate;
SQL&gt;startup upgrade
SQL&gt;spool 10204patch.log
SQL&gt;@?/rdbms/admin/catupgrd.sql
SQL&gt;spool off</span>

Review the log file for errors. catupgrd.sql took 42 minutes on my system. If the CLUSTER_DATABASE parameter is not set to FALSE, you will get the following error when starting the database in upgrade mode:

ORA-39701: database must be mounted EXCLUSIVE for UPGRADE or DOWNGRADE
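
Reading a long spool file by eye is error-prone, so a first pass can be automated. The sketch below assumes the spool file name used above (10204patch.log); the function name `scan_upgrade_log` is hypothetical. It simply lists lines carrying ORA-, PLS- or SP2- codes. Not every match is necessarily a problem, so compare the hits against the patch README.

```shell
#!/bin/sh
# Sketch: list lines in a spool log that carry Oracle error codes.
# Prints matching lines with their line numbers.
# Returns 1 if at least one such line was found, 0 otherwise.
scan_upgrade_log() {
    log="$1"
    if grep -n -E '(ORA|PLS|SP2)-[0-9]+' "$log"; then
        return 1
    fi
    return 0
}
```

Typical use would be `scan_upgrade_log 10204patch.log` right after `spool off`.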

Now restart the database and run utlrp.sql to recompile invalid objects.

<span style="font-size: small; font-family: arial,helvetica,sans-serif;">SQL&gt; SHUTDOWN IMMEDIATE
SQL&gt; STARTUP
SQL&gt; @?/rdbms/admin/utlrp.sql</span>

Confirm that the database has been upgraded successfully by querying DBA_REGISTRY:

select comp_name,version,status from dba_registry;

Now set the CLUSTER_DATABASE parameter back to TRUE and start the database

<span style="font-size: small; font-family: arial,helvetica,sans-serif;">SQL&gt; alter system set cluster_database=true scope=spfile;
SQL&gt; shutdown immediate;
SQL&gt; exit
[oracle@blrraclnx1 ~]$ srvctl start database -d orcl
[oracle@blrraclnx1 ~]$ srvctl start service -d orcl</span>

To upgrade DBConsole, run the following command

<span style="font-size: small; font-family: arial,helvetica,sans-serif;">emca -upgrade db -cluster
</span>

This completes the upgrade process.

Verification of CRS Integrity Was Unsuccessful

While going through the routine checks in Grid Control, I found a critical alert stating “clusterware integrity check failed”. Clicking on the message showed that there was a problem with some metric collections in the RAC environment.

To check node reachability, I ran the following command:

$ $CRS_HOME/bin/cluvfy comp nodecon -n all

This checks the inter-node connectivity for all nodes in the cluster. It returned the following message:

$ $CRS_HOME/bin/cluvfy comp nodecon -n all
Verifying node connectivity
Verification of node connectivity was unsuccessful on all the nodes.

Even the CRS component check was unsuccessful:

$ $CRS_HOME/bin/cluvfy comp crs -n all

It came out with the following message:

$ $CRS_HOME/bin/cluvfy comp crs -n all
Verifying CRS integrity
Verification of CRS integrity was unsuccessful on all the nodes.

The obvious next step was to check the CRS status:

$ crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy
$ crs_stat -t

Name           Type           Target    State     Host
------------------------------------------------------------
ora.orcl.db    application    ONLINE    ONLINE    rac1
ora....11.inst application    ONLINE    ONLINE    rac1
ora....12.inst application    ONLINE    ONLINE    rac2
ora....vice.cs application    ONLINE    ONLINE    rac2
ora....l1.srv  application    ONLINE    ONLINE    rac1
ora....l1.srv  application    ONLINE    ONLINE    rac2
ora....SM1.asm application    ONLINE    ONLINE    rac1
ora....DC.lsnr application    ONLINE    ONLINE    rac1
ora....idc.gsd application    ONLINE    ONLINE    rac1
ora....idc.ons application    ONLINE    ONLINE    rac1
ora....idc.vip application    ONLINE    ONLINE    rac1
ora....SM2.asm application    ONLINE    ONLINE    rac2
ora....C2.lsnr application    ONLINE    ONLINE    rac2
ora....dc2.gsd application    ONLINE    ONLINE    rac2
ora....dc2.ons application    ONLINE    ONLINE    rac2
ora....dc2.vip application    ONLINE    ONLINE    rac2
$ $CRS_HOME/bin/olsnodes
rac1
rac2

This confirmed that the CRS installation was valid, so why was the Cluster Verification Utility (CVU) failing?

To find the reason, I enabled CVU tracing:

$ export SRVM_TRACE=true

This sets the environment variable SRVM_TRACE to true, and CVU then writes a trace file under $CRS_HOME/cv/log with a name like “cvutrace.log.X”.

After setting this and running $CRS_HOME/bin/cluvfy comp crs -n all again, a trace file named cvutrace.log.0 was generated.

It contained a message like:

<strong>"ksh: CVU_10.2.0.2_dba/exectask.sh: cannot execute"</strong>

Now it was clear that Oracle was not able to execute exectask.sh, so I checked the permissions and ownership of exectask.sh:

$ cd $CRS_HOME/cv/remenv
$ ls -ltr
-rw-r--r--  1 oracle dba    184 Jan  9  2008 exectask.sh
-rw-r--r--  1 oracle dba 268386 Jan  9  2008 exectask

The permissions of these two files had been changed. After changing the permissions back to 755, CVU showed correct results.

$ chmod 755 exectask*

It is still not known how the permissions of these files got changed.
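
Since the cause was never found, it may be worth checking these files from time to time. A minimal sketch, assuming the directory from above ($CRS_HOME/cv/remenv) is passed in as an argument; the function name `find_non_executable` is hypothetical.

```shell
#!/bin/sh
# Sketch: report exectask* files in a directory that are not executable.
# The directory is passed in, e.g. "$CRS_HOME/cv/remenv".
find_non_executable() {
    dir="$1"
    for f in "$dir"/exectask*; do
        [ -e "$f" ] || continue      # glob matched nothing
        [ -x "$f" ] || echo "$f"
    done
}
```

An empty result from `find_non_executable "$CRS_HOME/cv/remenv"` means both files are executable again.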

10g RAC – Single Node Install Error

Last week we were trying to set up a 2-node 10g RAC system on Linux with Openfiler as shared storage, following the article written by Jeffrey Hunter. This was not the first time I was doing it, but by mistake we chose a machine with 500 MB of memory as one of the RAC nodes and used a machine with 1 GB of memory for Openfiler. We carried on with the installation even though cluvfy and runInstaller warned us about this.

But once the installation completed, I found the database was shutting down frequently with “PMON failed to acquire latch”. I tried to debug it, but could not figure out anything from the systemstate dump that was generated.

Anyway, we decided to rebuild the system: clean up Machine 2 and meanwhile re-install Openfiler. I did not have the OEL CDs that day, so I couldn’t rebuild Machine 1. So I went ahead with cleaning Machine 2 for re-installing the software. I saw an opportunity to set up a single-node RAC and then add another node later. I followed the steps below to clean up the RAC installation.

1) Stop the nodeapps and Clusterware

<span style="font-size: small; font-family: arial,helvetica,sans-serif;">srvctl stop nodeapps -n</span>

This will shut down the database, the ASM instance and the nodeapps. After this, you can stop the Clusterware.

<span style="font-size: small; font-family: arial,helvetica,sans-serif;"># crsctl stop crs</span>

Please note that, when cleaning up, you can also stop the Clusterware directly, as this automatically stops the dependent resources.

2) Remove the installation files and other related files
I had installed CRS in /u01/app/crs and Database home was located in /u01/app/oracle. So I removed both the directories.

<span style="font-size: small; font-family: arial,helvetica,sans-serif;">rm -rf /u01/app/crs
rm -rf /u01/app/oracle</span>

Note that if you have multiple Oracle database installations, ensure that you do not remove the oraInventory directory or any other ORACLE_HOME. In my case this was the only installation. Remove the following files related to Clusterware:

<span style="font-size: small; font-family: arial,helvetica,sans-serif;">rm /etc/oracle/*
rm -f /etc/init.d/init.cssd
rm -f /etc/init.d/init.crs
rm -f /etc/init.d/init.crsd
rm -f /etc/init.d/init.evmd
rm -f /etc/rc2.d/K96init.crs
rm -f /etc/rc2.d/S96init.crs
rm -f /etc/rc3.d/K96init.crs
rm -f /etc/rc3.d/S96init.crs
rm -f /etc/rc5.d/K96init.crs
rm -f /etc/rc5.d/S96init.crs
rm -Rf /etc/oracle/scls_scr
rm -f /etc/inittab.crs
cp /etc/inittab.orig /etc/inittab
rm -Rf /var/tmp/.oracle</span>

Also remove the OCR and voting disk files. In my case they were stored in the OCFS2 filesystem /u02/oradata/orcl. If they are on raw devices, you can wipe them with the dd command. Also remove the ocr.loc file in /etc/oracle.

You can also refer to Note:239998.1 – 10g RAC: How to Clean Up After a Failed CRS Install

In our case we were re-installing after a successful installation, so we also had to clean the ASM disks. They can be cleaned by overwriting the disk header with the dd command.

<span style="font-size: small; font-family: arial,helvetica,sans-serif;">dd if=/dev/zero of=/dev/sdb bs=1024 count=100</span>

As we were removing the other node and had to reconfigure SSH, I removed the /home/oracle/.ssh directory. I didn’t reconfigure SSH, thinking that it would not be required for a single-node install. I restarted the Clusterware install and encountered the following error:

“The Specified nodes are not clusterable”

In another window, one more error was reported, which made it clear where the problem was:

“Failed to check remote command execution setup for node <nodename> shells /usr/bin/ssh and /usr/bin/rsh”

Screenshot for the error can be seen below

The above error clearly states that the failure was due to ssh or rsh being unavailable. I then set up ssh for the single node and tested it to avoid any further errors.

<span style="font-size: small; font-family: arial,helvetica,sans-serif;">$ ssh blrraclnx2 date
Sun Aug 10 14:32:29 EDT 2008</span>

Anyway, all these errors could have been avoided had I used the cluvfy utility as below:

<span style="font-size: small; font-family: arial,helvetica,sans-serif;">$ ./runcluvfy.sh stage -pre crsinst -n blrraclnx2 -verbose</span>

Learning: Always use the cluvfy utility to ensure all prerequisites are met before installing RAC components.

Enabling Flashback On RAC Database

Enabling Flashback/archive log mode on a single-instance database is quite straightforward. In the case of RAC, you need to follow additional steps.

The requirements for enabling Flashback Database are:

  • Your database must be running in ARCHIVELOG mode, because archived logs are used in the Flashback Database operation.
  • You must have a flash recovery area enabled, because flashback logs can only be stored in the flash recovery area.
  • For Real Application Clusters databases, the flash recovery area must be stored in a clustered file system or in ASM.

First, configure the flash recovery area by setting db_recovery_file_dest_size and db_recovery_file_dest:

<span style="font-family: arial,helvetica,sans-serif; font-size: small;">ALTER SYSTEM SET DB_RECOVERY_FILE_DEST_SIZE = 20G SCOPE=BOTH;
ALTER SYSTEM SET DB_RECOVERY_FILE_DEST = '+DG1' SCOPE=BOTH;</span>

We are using an ASM diskgroup here, which is shared and available to both nodes. The next step is to enable archivelog mode and then turn on flashback. To do this, the database needs to be in mount mode.

We can use srvctl to stop any associated database service and then stop the database

<span style="font-family: arial,helvetica,sans-serif; font-size: small;">[/home/oracle&gt;srvctl stop service -d TESTDB
[/home/oracle&gt;srvctl stop database -d TESTDB</span>

Now set CLUSTER_DATABASE=FALSE to enable archivelog mode. This is an additional step which is required for a RAC database; for a single instance, it is not required.

<span style="font-family: arial,helvetica,sans-serif; font-size: small;">/home/oracle&gt;sqlplus "/ as sysdba"

Connected to an idle instance.

SQL&gt; startup nomount
ORACLE instance started.

Total System Global Area 1073741824 bytes
Fixed Size                  1271564 bytes
Variable Size             314575092 bytes
Database Buffers          750780416 bytes
Redo Buffers                7114752 bytes
SQL&gt; alter system set cluster_database=false scope=spfile;

System altered.

SQL&gt; shutdown immediate
ORA-01507: database not mounted

ORACLE instance shut down.
SQL&gt; startup mount
ORACLE instance started.

Total System Global Area 1073741824 bytes
Fixed Size                  1271564 bytes
Variable Size             314575092 bytes
Database Buffers          750780416 bytes
Redo Buffers                7114752 bytes
Database mounted.
SQL&gt;alter database archivelog;
SQL&gt; alter database flashback on;

Database altered.
</span>

Set the Cluster_database parameter again to true.

<span style="font-family: arial,helvetica,sans-serif; font-size: small;">SQL&gt;  alter system set cluster_database=true scope=spfile;
System altered.
SQL&gt;shutdown immediate</span>

We will again use srvctl to start the database and associated service

<span style="font-family: arial,helvetica,sans-serif; font-size: small;">[/home/oracle&gt;srvctl start database -d TESTDB
[/home/oracle&gt;srvctl start service -d TESTDB
</span>

We can confirm that archivelog mode and Flashback are enabled by querying V$DATABASE

<span style="font-family: arial,helvetica,sans-serif; font-size: small;">SQL&gt; SELECT LOG_MODE,FLASHBACK_ON FROM V$DATABASE;
LOG_MODE     FLASHBACK_ON
------------ ------------------
ARCHIVELOG   YES
</span>

10.2.0.4 Windows Patchset Overwrites sqlnet.ora

Not sure how many of you are aware of this alert: Oracle has published an alert document, NOTE:726418.1 - ALERT: The 10.2.0.4 Windows Patchset Overwrites %ORACLE_HOME%\network\admin\sqlnet.ora

According to it, Patch 6810189 – 10.2.0.4 RDBMS patchset on Microsoft Windows (32-bit) and Microsoft Windows (AMD64 and EM64T) overwrites the %ORACLE_HOME%\network\admin\sqlnet.ora file.

If you downloaded (and installed) the 10.2.0.4 patchset for Windows before 10 July 2008, please download the software again. If you have not customized the sqlnet.ora file, no action is needed.

Simplified Approach to Resolve ORA-4031

After writing a few case studies and other related articles, I will share my approach for resolving the ORA-4031 error. First, let us see what ORA-4031 actually means.

04031, 00000, "unable to allocate %s bytes of shared memory (\"%s\",\"%s\",\"%s\",\"%s\")"
// *Cause:  More shared memory is needed than was allocated in the shared pool.
// *Action: If the shared pool is out of memory, either use the
//          dbms_shared_pool package to pin large packages,
//          reduce your use of shared memory, or increase the amount of
//          available shared memory by increasing the value of the
//          INIT.ORA parameters "shared_pool_reserved_size" and
//          "shared_pool_size".
//          If the large pool is out of memory, increase the INIT.ORA
//          parameter "large_pool_size".

The ORA-4031 error is raised when there is not sufficient memory available in the shared pool/large pool to service a memory request. In practice, ORA-4031 can be encountered in any of these areas:

1) Shared pool
2) Large Pool
3) Java Pool
4) Streams pool (new to 10g)

This brings us to the first step in our pursuit of the cause of ORA-4031.

Step1: Identify the Pool associated with error

Like any other Oracle error, we first need to check the database alert log and any trace files generated around that time in user_dump_dest and background_dump_dest. There are cases, though, when the ORA-4031 error is not recorded in alert.log. Starting from 9.2.0.5, you should be able to see trace files generated in the udump/bdump location (depending on whether a background process or a user process encountered the error).

ORA-4031 has basically three arguments:

1) Size requested
2) Area
3) Comment

ORA-4031: unable to allocate <size requested> bytes of shared memory ("area","comment")

e.g. ORA-4031: unable to allocate 2196 bytes of shared memory
("shared pool","JOB$","KGLS heap","KGLS MEM BLOCK")

So we see from the above that the error occurred in the shared pool. This is a very important step, because for the other pools ORA-4031 errors are resolved by increasing JAVA_POOL_SIZE and STREAMS_POOL_SIZE.
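
When there are many occurrences in the alert log or trace files, pulling out the size and pool by script saves time. The following is a rough sketch, assuming the message is in the standard form with each argument in double quotes; the function name `parse_ora4031` is hypothetical.

```shell
#!/bin/sh
# Sketch: pull the requested size and the pool name out of an ORA-4031 message.
# Reads the message text on stdin; wrapped lines are joined first.
parse_ora4031() {
    tr '\n' ' ' | sed -n -E \
      's/.*unable to allocate ([0-9]+) bytes of shared memory *\("([^"]+)".*/size=\1 pool=\2/p'
}
```

The first quoted argument is the pool, which is the value that decides which of the following steps apply.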

In this article I will mostly discuss errors encountered in the shared pool, with a small section on the large pool.

Step2: What is value of SHARED_POOL_SIZE?

The current settings for shared pool related parameters can be found using the query below:

SQL>col name for a50
SQL>col value for a10
SQL> select nam.ksppinm NAME, val.KSPPSTVL VALUE from x$ksppi nam, x$ksppsv val
where nam.indx = val.indx and nam.ksppinm like '%shared_pool%' order by 1;

NAME                                               VALUE
-------------------------------------------------- ----------
__shared_pool_size                                 654311424
_dm_max_shared_pool_pct                            1
_enable_shared_pool_durations                      TRUE
_io_shared_pool_size                               4194304
_shared_pool_max_size                              0
_shared_pool_minsize_on                            FALSE
_shared_pool_reserved_min_alloc                    4400
_shared_pool_reserved_pct                          5
shared_pool_reserved_size                          19293798
shared_pool_size                                   0

You can use the following note to check the minimum shared pool size:

Note 105813.1 – SCRIPT TO SUGGEST MINIMUM SHARED POOL SIZE

In 10g, you can use the SGA_TARGET parameter to manage the sizes of the shared pool, large pool, streams pool, java pool and buffer cache (DB_CACHE_SIZE). The following note can be used for 10g:

Note 270935.1 – Shared pool sizing in 10g

When SGA_TARGET is used, it is recommended to set SHARED_POOL_SIZE as a lower limit for the shared pool.

You can also use the V$LIBRARYCACHE view (the AWR/Statspack report also has this section) and check whether a lot of reloads are happening for the SQL AREA and TABLE/PROCEDURE namespaces. This indicates that the shared pool is not appropriately sized. If you see a high value for invalidations, this could be due to executing DDL against the objects, gathering stats (DBMS_STATS), or granting/revoking privileges.

A high value for hard parses in the AWR/Statspack report can also be caused by shared pool sizing issues, but it cannot be used as the sole criterion, because high hard parses can also be caused by the use of literals and by high version counts/child cursors. This is discussed in the sections Using Literals Instead of Bind Variables and Multiple Child Cursors/High Version Count.

Some more key points related to Shared pool Sizing

- Shared pool memory consumption varies from release to release

- 10g might fail with a shared pool of 300 MB even though 8i was working fine

- Some part of the memory is allocated to fixed structures. Parameters like db_files, open_cursors and processes contribute to this overhead. When you use the “show sga” command, you will see that “Variable Size” is more than the sum of shared pool + large pool + java pool. This is attributed to the value of these parameters.

Please note that if you specify a low value for SGA_MAX_SIZE, you will see Oracle bumping it to a higher value so as to accommodate the overhead memory.

Starting from 10g, the overhead memory is accommodated in shared_pool_size.

e.g. If you specify SHARED_POOL_SIZE as 200 MB and your internal overhead is 100 MB, then the shared pool actually available to the instance is only 100 MB.
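
The arithmetic is trivial, but writing it down makes the point when sizing. A tiny sketch, where both figures are in MB and the overhead is whatever you have measured for your own instance; the function name `usable_shared_pool_mb` is hypothetical.

```shell
#!/bin/sh
# Sketch: memory left for the shared pool proper once overhead is taken out.
# Arguments: configured SHARED_POOL_SIZE and internal overhead, both in MB.
usable_shared_pool_mb() {
    echo $(( $1 - $2 ))
}

usable_shared_pool_mb 200 100   # the example from the text
```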

You can read Note:351018.1 – Minimum for SHARED_POOL_SIZE Parameter in 10.2 Version for more information.

Shared Pool Fragmentation

Shared pool fragmentation can also cause ORA-4031. It arises when your queries are not being shared and you see a lot of reloads and hard parses in the Statspack report. In this case, check the request failure size:

ORA-4031: unable to allocate 16400 bytes of shared memory

We see that the failure size is 16K. In this case, check whether you are using the SHARED_POOL_RESERVED_SIZE parameter to define the shared pool reserved area. The allocation algorithm first tries to get memory from the shared pool and then, if the requested size is greater than _shared_pool_reserved_min_alloc, gets the memory from the shared pool reserved area. By default this value is 4400 bytes. If the failure size is, say, 4200 bytes, you can try reducing the value of this parameter to reduce the occurrences, though this is not a complete solution. Read Tweaking _Shared_pool_reserved_min_alloc and ORA-4031 for more details.

You can also identify shared pool fragmentation by querying X$KSMSP:

select  'sga heap('||KSMCHIDX||',0)'sga_heap,ksmchcom ChunkComment,
decode(round(ksmchsiz/1000),0,'0-1K', 1,'1-2K', 2,'2-3K',
3,'3-4K',4,'4-5K',5,'5-6k',6,'6-7k',7,'7-8k',8,'8-9k', 9,'9-10k','> 10K') "Size",
count(*), ksmchcls "Status", sum(ksmchsiz) "Bytes" from x$ksmsp
where KSMCHCOM = 'free memory' group by 'sga heap('||KSMCHIDX||',0)',
ksmchcom, ksmchcls, decode(round(ksmchsiz/1000),0,'0-1K', 1,'1-2K', 2,'2-3K',
 3,'3-4K',4,'4-5K',5,'5-6k',6,'6-7k',7,'7-8k',8,'8-9k', 9,'9-10k','> 10K');

SGA_HEAP       CHUNKCOMMENT     Size    COUNT(*) Status          Bytes
-------------- ---------------- ----- ---------- ---------- ----------
sga heap(1,0)  free memory      > 10K        393 free         11296600
sga heap(1,0)  free memory      3-4K         256 free           781928
sga heap(1,0)  free memory      8-9k          63 free           510656
sga heap(1,0)  free memory      6-7k          60 free           367076
sga heap(1,0)  free memory      2-3K         555 free          1071448
sga heap(1,0)  free memory      1-2K        1818 free          1397244
sga heap(1,0)  free memory      0-1K        3418 free           348344
sga heap(1,0)  free memory      9-10k         30 free           269820
sga heap(1,0)  free memory      4-5K         154 free           640332
sga heap(1,0)  free memory      5-6k          75 free           381920
sga heap(1,0)  free memory      > 10K         39 R-free        8302632
sga heap(1,0)  free memory      7-8k          22 free           152328

If you see a lot of memory chunks in the 1-4K buckets and very few in the buckets above 5K, it indicates shared pool fragmentation. In this case you also need to look at hard parses (Statspack/AWR report). This is discussed in the sections Using Literals Instead of Bind Variables and Multiple Child Cursors/High Version Count.
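
To put a number on "a lot" versus "very few", the listing can be summarised with awk. This is only a sketch: it assumes the column layout shown above, counts only rows with status "free", and treats the buckets from 0-1K to 3-4K as small, which is my own cut-off. The function name `free_chunk_summary` is hypothetical.

```shell
#!/bin/sh
# Sketch: summarise free-chunk counts from the X$KSMSP listing above.
# Reads the listing on stdin. Only rows with status "free" are counted;
# buckets 0-1K .. 3-4K are treated as "small" (an assumption).
free_chunk_summary() {
    awk '
        NF >= 8 && $(NF-1) == "free" && $NF ~ /^[0-9]+$/ {
            count  = $(NF-2)            # COUNT(*) column
            bucket = toupper($(NF-3))   # size bucket, e.g. 3-4K
            total += count
            if (bucket ~ /^[0-3]-[1-4]K$/) small += count
        }
        END { printf "small=%d total=%d\n", small, total }
    '
}
```

For the listing above, 6047 of the 6844 free chunks fall into the small buckets.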

Note: It is not recommended to run queries on X$KSMSP frequently, as they can lead to latching issues. (I have seen people schedule them as part of hourly Oracle jobs; this should be avoided.)

Step3: Is it MTS? If Yes, then are you using LARGE_POOL_SIZE?

LARGE_POOL_SIZE is recommended for many Oracle features that are designed to use large shared memory chunks, such as:

– Recovery Manager (RMAN)

– parallel processing/IO slave processing. e.g px msg pool consuming more memory

– Shared Server Configuration

If the large pool is not configured, the UGA is allocated from the shared pool, which can cause issues in shared server mode (MTS). Ensure that you are using the LARGE_POOL_SIZE parameter or SGA_TARGET.

Step4: Are you having Multiple Subpools?

The subpool concept was introduced in 9i R2. Instead of one big shared pool, the memory is divided into several subpools. To determine the number of subpools, you can use the query below:

SQL> select nam.ksppinm NAME, val.KSPPSTVL VALUE from x$ksppi nam, x$ksppsv val
where nam.indx = val.indx and nam.ksppinm like '%kghdsidx%' order by 1;

NAME                           VALUE
------------------------------ --------------------
_kghdsidx_count                4

Above query indicates that there are 4 subpools

If you get ORA-4031 and a trace file is generated, the trace file can also be used to find the number of subpools configured. To do this, search for “Memory Utilization of Subpool”, e.g.

Memory Utilization of Subpool 1
========================
free memory 10485760
Memory Utilization of Subpool 2
========================
free memory 20971520

This means that there are two subpools configured for your database.

Oracle suggests having 500M as the minimum subpool size. I would say that unless you are facing serious shared pool latch contention, 2 subpools should be sufficient (though I believe most contention issues can be solved by tuning the application). To change the number of subpools, set the parameter _kghdsidx_count in the pfile or spfile and restart the database.

In case of Spfile

alter system set "_kghdsidx_count"=1 scope=spfile;

A restart of the database is required as it is a static parameter. Please note that the large pool has the same number of subpools as the shared pool, so you might need to change the number of subpools if you are observing ORA-4031 in the large pool.

You can read more about Shared Subpools in my earlier post

Step5: Is Sqlarea consuming lot of Memory?

This can also be categorized as “bad application design”, as most cases are caused by the way applications have been designed. A high value for sqlarea in V$SGASTAT (or the AWR/Statspack report) can be attributed to the following causes:

Using Literals Instead of Bind Variables

This is the most common cause of ORA-4031. Tom Kyte explains the consequences of not using bind variables in one of his posts:

If you do not use bind variables and you flood the server with
hundreds/thousands of unique queries you will
-run dog slow
-consume a ton of RAM (and maybe run out)
-not scale beyond a handful of users, if that
among other really bad side effects.

The above statement is true, and you can find a lot of cases where not using bind variables caused excessive parsing (leading to CPU contention) and ORA-4031 issues. One way to locate such statements is to run the following query:

SELECT substr(sql_text,1,90) "SQL",count(*) "SQL Copies",
   sum(executions) "TotExecs", sum(sharable_mem) "TotMemory"
FROM v$sqlarea
WHERE executions &lt; 5
GROUP BY substr(sql_text,1,90) HAVING count(*) > 30
ORDER BY 2;

I personally use the following script from the Asktom website to find these statements. You can find more information on the Asktom website.

create table t1 as select sql_text from v$sqlarea;
alter table t1 add sql_text_wo_constants varchar2(1000);
create or replace function
remove_constants( p_query in varchar2 ) return varchar2
as
    l_query long;
    l_char  varchar2(1);
    l_in_quotes boolean default FALSE;
begin
    -- Pass 1: replace every quoted string literal with '#
    for i in 1 .. length( p_query )
    loop
        l_char := substr(p_query,i,1);
        if ( l_char = '''' and l_in_quotes )
        then
            l_in_quotes := FALSE;
        elsif ( l_char = '''' and NOT l_in_quotes )
        then
            l_in_quotes := TRUE;
            l_query := l_query || '''#';
        end if;
        if ( NOT l_in_quotes ) then
            l_query := l_query || l_char;
        end if;
    end loop;
    -- Pass 2: turn every digit into @
    l_query := translate( l_query, '0123456789', '@@@@@@@@@@' );
    -- Pass 3: collapse runs of @ and runs of spaces into a single character
    for i in 0 .. 8 loop
        l_query := replace( l_query, lpad('@',10-i,'@'), '@' );
        l_query := replace( l_query, lpad(' ',10-i,' '), ' ' );
    end loop;
    return upper(l_query);
end;
/
update t1 set sql_text_wo_constants = remove_constants(sql_text);

select sql_text_wo_constants, count(*)
  from t1
 group by sql_text_wo_constants
having count(*) > 100
 order by 2
/

The above query lists statements that use literals and should be modified to use bind variables. Sometimes it is not possible to modify the application; in that case you can set CURSOR_SHARING=SIMILAR/FORCE so that Oracle replaces the literals with system-generated bind variables. Please note that this can cause issues (especially CURSOR_SHARING=SIMILAR), so test the application in a test environment before implementing it in production. Applications like Oracle Apps do not certify the use of this parameter, so also check with your application vendor whether it can be used.
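To see the difference between literals and bind variables for yourself, here is a small sketch you can run on a test instance. It uses only DUAL, so no application tables are assumed. The loop count of 1000 and the marker comments demo_lit and demo_bind are arbitrary choices of mine, not part of any Oracle script.

```sql
declare
    l_cnt number;
begin
    -- Literal version: every iteration builds a different SQL text,
    -- so each one is hard parsed and gets its own cursor in the shared pool.
    for i in 1 .. 1000 loop
        execute immediate
            'select /* demo_lit */ count(*) from dual where 1 = ' || i
            into l_cnt;
    end loop;

    -- Bind version: the SQL text is identical on every iteration,
    -- so a single shared cursor is reused.
    for i in 1 .. 1000 loop
        execute immediate
            'select /* demo_bind */ count(*) from dual where 1 = :x'
            into l_cnt using i;
    end loop;
end;
/

-- Compare how many cursors each variant left in the shared pool.
select count(*) from v$sqlarea where sql_text like 'select /* demo_lit */%';
select count(*) from v$sqlarea where sql_text like 'select /* demo_bind */%';
```

On an otherwise idle instance you should see many entries for the literal variant and very few for the bind variant. The exact counts depend on how quickly cursors age out of the shared pool.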

You can refer to following articles where I have discussed similar issue

ORA-4031 – A Case Study

Application Design and ORA-4031

Multiple Child Cursors/High Version Count

This is also one of the causes of high memory usage in the SQLAREA region. Child cursors are generated in the shared pool when the SQL text is the same but Oracle cannot share the cursor, for example because the underlying objects or the optimizer settings are different. To learn more about child cursors, refer to the following Metalink note

Note 296377.1 – Handling and resolving unshared cursors/large version_counts

In case of Oracle 10g, you can use Statspack/AWR report for finding the child cursors under category “SQL ordered by Version Counts”. Following statements can also be run to identify if child cursors are being generated in your database


For 10g

select sa.sql_text, sa.version_count, ss.* from v$sqlarea sa, v$sql_shared_cursor ss
where sa.address=ss.address and sa.version_count > 50 order by sa.version_count;

For 8i/9i

select sa.sql_text, sa.version_count, ss.* from v$sqlarea sa, v$sql_shared_cursor ss
where sa.address=ss.KGLHDPAR and sa.version_count > 50 order by sa.version_count;

The above query reports SQL statements that are not being shared for some reason. Look for the columns with value Y to find the cause. Most of these issues are encountered while using CURSOR_SHARING=SIMILAR. In case you are using this parameter with columns having histograms, this is expected behavior. Read more about cursor issues related to histograms in Note:261020.1 – High Version Count with CURSOR_SHARING = SIMILAR or FORCE
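Scanning dozens of Y/N columns by eye is tedious, so for a single statement it can help to count the Y values per reason. The following is a minimal sketch for 10g, where V$SQL_SHARED_CURSOR exposes SQL_ID. It checks only a handful of the mismatch columns, chosen here as examples; extend the list with the other columns of the view as needed.

```sql
-- 10g: summarize why the children of one statement were not shared.
-- &sql_id is a SQL*Plus substitution variable; supply the SQL_ID
-- of a statement reported with a high version count.
select count(*)                                     child_cursors,
       sum(decode(optimizer_mismatch,   'Y', 1, 0)) optimizer_mismatch,
       sum(decode(bind_mismatch,        'Y', 1, 0)) bind_mismatch,
       sum(decode(auth_check_mismatch,  'Y', 1, 0)) auth_check_mismatch,
       sum(decode(language_mismatch,    'Y', 1, 0)) language_mismatch,
       sum(decode(translation_mismatch, 'Y', 1, 0)) translation_mismatch
  from v$sql_shared_cursor
 where sql_id = '&sql_id';
```

A column whose sum is close to child_cursors is the most likely reason the cursors are not being shared.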

There are cases where none of the columns returns Y. In most of these cases you need to work with Oracle Support to find the cause, as this could be a bug.

Child cursors are problematic because they increase shared pool memory consumption and parsing. As the number of child cursors grows, Oracle takes more time to scan all the child cursors to check whether it can reuse one; if it cannot, it spawns a new child cursor. This results in high parse time and CPU contention.

High Sharable Memory per SQL

One more cause of a high value for SQLAREA in V$SGASTAT is high memory consumption by individual SQL statements. This can be due to poorly written SQL statements or to Oracle bugs.

In case of Oracle 10g, you can use the Statspack/AWR report to find the statements with a high value of sharable memory. You can also use the SHARABLE_MEM column in V$SQLAREA to find these queries.
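As a rough sketch of that check, the first query below shows the largest shared pool components from V$SGASTAT, and the second lists the statements holding the most sharable memory. The limit of 10 rows and the 90-character cut of the SQL text are arbitrary choices for illustration.

```sql
-- Largest components of the shared pool; check where 'sql area' ranks.
select *
  from (select name, bytes
          from v$sgastat
         where pool = 'shared pool'
         order by bytes desc)
 where rownum <= 10;

-- Statements holding the most sharable memory.
select *
  from (select sharable_mem, version_count, executions,
               substr(sql_text, 1, 90) sql_text
          from v$sqlarea
         order by sharable_mem desc)
 where rownum <= 10;
```

Since SHARABLE_MEM in V$SQLAREA is summed over all child cursors, a statement with a high value and a high VERSION_COUNT points back to the child cursor problem rather than to one expensive cursor.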

Step6:What Next?

You have followed all the above steps and found that everything is OK. Now what do we check next?

We can look for any trace file that was generated at the time of the error and see which component was taking more memory. You can try searching Metalink for that component. Otherwise, you can take a heapdump at the time of the error and upload the file to Oracle Support.

Heapdump event
The Heapdump event is used to dump memory from different subheaps. ORA-4030 errors are associated with problems in the PGA, UGA or CGA heaps, while ORA-4031 errors relate only to problems with the shared pool/large pool/Java pool/Streams pool.

command -> alter system set events '4031 trace name heapdump level 2';
init.ora -> event="4031 trace name heapdump, level 2"
SQL>oradebug setmypid
SQL>oradebug dump heapdump 2
SQL>oradebug tracefile_name

Starting from 9.2.0.5, level 536870914 can be used for generating the heapdump; it gathers more diagnostic information for Support to diagnose the cause.

Also, it is not recommended to set the heapdump event in init.ora or the spfile, since it will force multiple dumps at the time of shared pool memory issues. Oracle requires the shared pool latch for dumping the heap, so this can worsen the latching situation. Instead, you can set the errorstack event to generate a trace file at the time of the ORA-4031 error

alter system set events '4031 trace name errorstack level 3';

Use immediate trace option or Oradebug command at time of error

SQL> connect / as sysdba
SQL> alter session set events 'immediate trace name heapdump level 536870914';

OR

sqlplus "/ as sysdba"
oradebug setmypid
oradebug unlimit
oradebug dump heapdump 536870914
oradebug tracefile_name
exit

Upload the tracefile to Oracle support.

Using the above approach will help you to resolve ORA-4031 in Shared Pool.

Large Pool

While working on ORA-4031 in the large pool, follow the approach below

1)Check the size of LARGE_POOL_SIZE. If possible, increase it.

2)Check the number of subpools. Ensure that you have sufficient memory in each subpool. _kghdsidx_count also controls the number of subpools in the large pool, so you would have to either increase the memory available in each subpool or decrease the count.

3)In case of MTS, check if any session is consuming a lot of memory. It’s a case where instead of getting ORA-4030, you get ORA-4031 in the large pool (in MTS, the UGA is part of the large pool).

4)If all the above suggestions have been tried, then capture a heapdump and upload the file to Oracle Support. You can use level 32 or 536870944, i.e.

SQL> connect / as sysdba
SQL> alter session set events 'immediate trace name heapdump level 32';
or
SQL> alter session set events 'immediate trace name heapdump level 536870944';
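For points 1 to 3 above, the following queries are a minimal sketch of how to look at large pool usage and at the sessions with the biggest UGA. The limit of 10 rows and the alias uga_bytes are my own choices. In an MTS (shared server) setup, look at the sessions whose SERVER column is not DEDICATED.

```sql
-- Breakdown of the large pool, including free memory.
select name, bytes
  from v$sgastat
 where pool = 'large pool'
 order by bytes desc;

-- Sessions ranked by current UGA size (relevant under MTS/shared server,
-- where the UGA is allocated from the large pool).
select *
  from (select s.sid, s.username, s.server, st.value uga_bytes
          from v$session s, v$sesstat st, v$statname n
         where st.sid = s.sid
           and st.statistic# = n.statistic#
           and n.name = 'session uga memory'
         order by st.value desc)
 where rownum <= 10;
```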

I hope this article helps in following a methodology for resolving ORA-4031. At present it is not an exhaustive article on this error, and it will be more useful as an approach once you have gone through the Metalink notes below.

Note:62143.1 – Understanding and Tuning the Shared Pool

Note:396940.1 – Troubleshooting and Diagnosing ORA-4031 Error

Note:146599.1 – Diagnosing and Resolving Error ORA-04031