oracle

10g RAC – Single Node Install Error

Last week we were trying to setup a 2 Node 10g RAC System on Linux with openfiler used for shared storage. We were using the article written by Jeffery Hunter. This was not the first time I was doing it, but by mistake we chose the machine with 500Mb memory to be used for one of the RAC system and used a 1Gb memory machine for Openfiler. We carried on with the installation though cluvfy and runInstaller gave us warnings regarding the same.

But once the installation completed, I found the database was shutting down frequently with “PMON Failed to acquire Latch”. I tried to debug it, but was not able to figure out anything from Systemstate dump which was generated.

Anyways we decided to rebuild the system.So we decided to cleanup the Machine 2 and meanwhile re-installed the openfiler. I was not having the OEL Cd’s on that day,so couldn’t build the Machine 1. So I went ahead with cleaning Machine 2 for re-installing software. I saw an opportunity of setting up a single node RAC and then adding another node. I followed below steps for cleaning the RAC installation

1)Stop the Nodeapps and Clusterware

<span style="font-size: small; font-family: arial,helvetica,sans-serif;">srvctl stop nodeapss -n</span>

This will shutdown the Database,ASM instance and also the nodeapps. After this , you can stop the clusterware.

<span style="font-size: small; font-family: arial,helvetica,sans-serif;">#crsctl stop crs</span>

Please note that you can directly stop the clusterware (while cleaning up) as this will automatically stop the dependent resources.

2) Remove the installation files and other related files
I had installed CRS in /u01/app/crs and Database home was located in /u01/app/oracle. So I removed both the directories.

<span style="font-size: small; font-family: arial,helvetica,sans-serif;">rm -rf /u01/app/crs
rm -rf /u01/app/oracle</span>

Note that if you are having multiple oracle database installation, then ensure that you do not remove orainventory directoy or any other ORACLE_HOME. In my case this was the only installation. Remove following files related to clusterware

<span style="font-size: small; font-family: arial,helvetica,sans-serif;">        rm /etc/oracle/*
	rm -f /etc/init.d/init.cssd
	rm -f /etc/init.d/init.crs
	rm -f /etc/init.d/init.crsd
	rm -f /etc/init.d/init.evmd
	rm -f /etc/rc2.d/K96init.crs
	rm -f /etc/rc2.d/S96init.crs
	rm -f /etc/rc3.d/K96init.crs
	rm -f /etc/rc3.d/S96init.crs
	rm -f /etc/rc5.d/K96init.crs
	rm -f /etc/rc5.d/S96init.crs
        rm -Rf /etc/oracle/scls_scr
	rm -f /etc/inittab.crs
	cp /etc/inittab.orig /etc/inittab
        rm -f /var/tmp/.oracle</span>

Also remove the OCR and Voting disk files. In my case it was stored in OCFS2 filesystem /u02/oradata/orcl. In case it is on raw devices , you can remove it using dd command. Remove ocr.loc file present in /etc/oracle

You can also refer to Note:239998.1 – 10g RAC: How to Clean Up After a Failed CRS Install

In our case as we were re-installing after successful installation. So we even had to clean the ASM disks. They can be again cleaned up by formatting the header with dd command.

<span style="font-size: small; font-family: arial,helvetica,sans-serif;">dd if=/dev/zero of=/dev/sdb bs=1024 count=100</span>

As we were removing the other node and had to reconfigure SSH, I removed /home/oracle/.ssh directory. I didn’t reconfigure SSH again thinking that as it will not be required for single node install. I restarted the Clusterware install and encountered following error

“The Specified nodes are not clusterable”

In another window, one more error was reported, which actually made it clear where the problem was

“Failed to check remote command execution setup for node <nodename> shells /us/bin/ssh and /usr/bin/rsh”

Screenshot for the error can be seen below

Above error clearly states that error was due to unavailability of ssh or rsh. After this I did setup for ssh for single node and tested this too to avoid any further errors.

<span style="font-size: small; font-family: arial,helvetica,sans-serif;">$ ssh blrraclnx2 date
Sun Aug 10 14:32:29 EDT 2008</span>

Anyways all these errors could have been avoided, had I used cluvfy utility as below

<span style="font-size: small; font-family: arial,helvetica,sans-serif;">$./runcluvfy.sh stage -pre crsinst -n blrraclnx2 -verbose</span>

Learning:- Always use Cluvfy utility to ensure all pre-requisites are met before installing RAC components

Raw Devices to be deprecated in 12G

I was browsing through the OraNA.info posts and found an interesting post from Steve Karam which referred to a metalink Note:578455.1 – Announcement of De-Support of RAW devices in Release 12G

If you go through the note, it mentions desupport of Raw devices from Oracle Database 12G. Article also lists out possibility of using ASM,OCFS as few of the alternative storage mediums for keeping OCR and Voting Disks (Used in Oracle Real Application Clusters(RAC)) .

OCFS2 already supports the storage of OCR and Voting Disk. But note also talks about ASM will be supporting the files. Hmmm… If this has to be true, lot of changes will be required in the architecture. Currently ASM instance starts after CSS (and other clusterware services in RAC) service has been started . But this change will mean that ASM has to start before these processes. Currently if you try to start the ASM instance with CSS service down, you get following error

[oracle@blrraclnx2 ~]$ export ORACLE_SID=+ASM1
[oracle@blrraclnx2 ~]$ sqlplus 

SQL*Plus: Release 10.2.0.1.0 - Production on Sun Aug 10 08:16:09 2008

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

Enter user-name: / as sysdba
Connected to an idle instance.

SQL> startup
ORA-29701: unable to connect to Cluster Manager
SQL> exit
Disconnected
[oracle@blrraclnx2 ~]$ oerr ora 29701
29701, 00000, "unable to connect to Cluster Manager"
// *Cause: Connect to CM failed or timed out.
// *Action: Verify that the CM was started.  If the CM was not started,
//          start it and then retry the database startup. If the CM died
//          or is not responding, check the Oracle and CM trace files for
//          errors.

Apart from changing the architecture, it will also involve lot of effort from Oracle DBA’s to unlearn and learn new concepts 🙂 At the same time, it will help you start the ASM Instance even though CSS is not up!! (I know many people will be having a sigh of relief after reading the last line) Or is there something else in store for us! There are also lot of RAC and ASM features expected in 11gR2. So let’s wait and watch..

Enabling Flashback On RAC Database

Enabling Flashback /Archive Log mode on a Single Instance Database is quite straight forward. In case of RAC, you need to follow additional steps.

The requirements for enabling Flashback Database are:

  • Your database must be running in ARCHIVELOG mode, because archived logs are used in the Flashback Database operation.
  • You must have a flash recovery area enabled, because flashback logs can only be stored in the flash recovery area.
  • For Real Application Clusters databases, the flash recovery area must be stored in a clustered file system or in ASM.

First of all configure flash recovery area by setting db_recovery_file_dest_size and db_recovery_file_dest

<span style="font-family: arial,helvetica,sans-serif; font-size: small;">ALTER SYSTEM SET DB_RECOVERY_FILE_DEST_SIZE = 20G SCOPE=BOTH;
ALTER SYSTEM SET DB_RECOVERY_FILE_DEST = '+DG1' SCOPE=BOTH;</span>

We are using ASM Diskgroup here which is sharable and available to both the nodes.Next step is to enable archivelog mode and then to turn on flashback. To perform this, database needs to be in mount mode.

We can use srvctl to disable any associated Database service and then stop the Database

<span style="font-family: arial,helvetica,sans-serif; font-size: small;">[/home/oracle&gt;srvctl stop service -d </span><span><span style="font-family: arial,helvetica,sans-serif; font-size: small;">TESTDB</span></span><span style="font-family: arial,helvetica,sans-serif; font-size: small;">
[/home/oracle&gt;srvctl stop database -d TESTDB</span>

Now set Cluster_database=false for enabling the Archivelog mode. This is a additional step which is required in RAC Database. For Single Instance, we do not require it.

<span style="font-family: arial,helvetica,sans-serif; font-size: small;">/home/oracle&gt;sqlplus "/ as sysdba"

Connected to an idle instance.

SQL&gt; startup nomount
ORACLE instance started.

Total System Global Area 1073741824 bytes
Fixed Size                  1271564 bytes
Variable Size             314575092 bytes
Database Buffers          750780416 bytes
Redo Buffers                7114752 bytes
SQL&gt; alter system set cluster_database=false scope=spfile;

System altered.

SQL&gt; shutdown immediate
ORA-01507: database not mounted

ORACLE instance shut down.
SQL&gt; exit
SQL&gt; startup mount
ORACLE instance started.

Total System Global Area 1073741824 bytes
Fixed Size                  1271564 bytes
Variable Size             314575092 bytes
Database Buffers          750780416 bytes
Redo Buffers                7114752 bytes
Database mounted.
SQL&gt;alter database archivelog;
SQL&gt; alter database flashback on;

Database altered.
</span>

Set the Cluster_database parameter again to true.

<span style="font-family: arial,helvetica,sans-serif; font-size: small;">SQL&gt;  alter system set cluster_database=true scope=spfile;
System altered.
SQL&gt;shutdown immediate</span>

We will again use srvctl to start the database and associated service

<span style="font-family: arial,helvetica,sans-serif; font-size: small;">[/home/oracle&gt;srvctl start database -d TESTDB
[/home/oracle&gt;srvctl start service -d </span><span><span style="font-family: arial,helvetica,sans-serif; font-size: small;">TESTDB</span></span><span style="font-family: arial,helvetica,sans-serif; font-size: small;">
</span>

We can confirm whether Archivelog mode and Flashback is enabled by querying V$DATABASE

<span style="font-family: arial,helvetica,sans-serif; font-size: small;">SQL&gt; SELECT LOG_MODE,FLASHBACK_ON FROM V$DATABASE;
LOG_MODE     FLASHBACK_ON
------------ ------------------
ARCHIVELOG   YES
</span>

11g New Feature: Database Replay

Database Replay:

In critical production systems, before making changes to the environment( Hardware or software related), lots of testing is required in test systems to know the effect of those changes and this usually takes lots of time to test for the stability and effects of the change.

In Oracle 11g this thing is simplified and a new feature “database replay” is added which can be used to simulate the production workload on a test system.

Following are the three stages of DATABASE REPLAY:
1. Workload Capture
2. Workload Preprocessing
3. Workload Replay

1. Workload Capture:

Before starting the capture process you need to take care of following things:

Login as sys user

It is always recommened (not mandatory) to shutdown the database and then start the database in restricted mode. By doing so, you will be making sure that no transactions are running on the database before capture process is actually started. It will guarantee the complete workload capture and you will be running full transactions during database replay on test system.
If you do not follow this step then you have to be prepared for some divergance of results during database replay on test system.

SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down
SQL> startup restrict
ORACLE instance started.

Total System Global Area  313860096 bytes
Fixed Size                  1299624 bytes
Variable Size             184552280 bytes
Database Buffers          121634816 bytes
Redo Buffers                6373376 bytes
Database mounted.
Database opened.
SQL > select LOGINS from v$instance;
LOGINS
----------
RESTRICTED

Important thing to note here is that, after starting the capture process the database will automatically be switched into UNRESTRICTED mode and the users will be able to connect normally.

After capturing the initial database information, the capture process will internally fire a statement to disable restricted sessions, this can be verified by setting 10046 trace before starting the capture process as:

SQL> alter session set events '10046 trace name context forever, level 4';

Now start the capture process.

SQL> alter session set events '10046 trace name context off';

You will see following information:

PARSING IN CURSOR #21 len=39 dep=1 uid=0 oct=49 lid=0 tim=121695902847 hv=14656808 ad='0' sqlid='7shphjbg23h8'
alter system disable restricted session
END OF STMT
PARSE #21:c=0,e=6,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=0,tim=121695384843

Now check the information from database:

SQL > select LOGINS from v$instance;
LOGINS
----------
ALLOWED

– Create a directory to hold the workload information.

SQL > create directory DIR as '/tmp/ss';

Check the permission of directory ss at OS level and make sure that oracle is able to write to this directory.
Also make sure that the directory is empty, before starting the capture process.

– Define Workload Filters
By adding filters we can restrict the Workload Capture process to be specific to certain user sessions. By default all the sessions will be captured.

There are two types of filters i.e INCLUDE and EXCLUDE.

If INCLUDE is used then the filter will be active and workload for the objects mentioned in the filter will not be captured rather all other user’s workload Will be captured i.e the objects defined in the filter will be FILTERED OUT from capture process.

If EXCLUDE is used, only the workload done by objects defined in the filter will be captured, no other client request to database will be captured.

EXAMPLE:
========

SQL> exec dbms_workload_capture.ADD_FILTER( fname  IN VARCHAR2, fattribute  IN VARCHAR2,fvalue IN VARCHAR2);

fname=Name of the filter.
fattribute=Attribute on which the filter will be applied i.e USER, PROGRAM, INSTANCE_NUMBER etc.
fvalue= value for the corresponding attribute.

SQL > exec dbms_workload_capture.ADD_FILTER( fname =>'FILTER_SCOTT',fattribute => 'USER',fvalue => 'SCOTT');

– Starting workload capture:

To start the workload capture START_CAPTURE procedure is used as:

SQL > exec dbms_workload_capture.start_capture(NAME => 'PEAK_LOAD_TEST',DIR => 'DIR', DURATION => 600);

Name,Dir are mandatory parameters while duration (in seconds) is optional, here the capture will run for 600 seconds, if the duration is not specified then you have to finish the capture manually as:

SQL > dbms_workload_capture.finish_capture();

– While the capture process is runnig it will generate two files : wcr_scapture.wmd and wcr_cap_000xx.start

– After using the finish_capture procedure you will get folloing files also: wcr_cr.html and wcr_cr.text

wcr_cr.html is similar to the AWR report generated by awrrpt.sql

During capture process database initiate a AWR report for the duration of capture period.

SQL>  select id,AWR_BEGIN_SNAP,AWR_END_SNAP from dba_workload_captures;

<span style="font-family: arial,helvetica,sans-serif; font-size: small;">ID AWR_BEGIN_SNAP AWR_END_SNAP
---------- -------------- ------------
1             20           21
2             29           30

– Exporting AWR data for the workload capture:

It will enable detailed analysis of the workload:

SQL> exec DBMS_WORKLOAD_CAPTURE.EXPORT_AWR (capture_id => 2);

After this you can see two more files generated under the specified directory:

wcr_ca.dmp and wcr_ca.log

2. Workload Preprocessing:

Once the workload is captured it can be replayed on the test system. For this you need to copy the capture files to the test system. Preprocessing a workload will convert the captured files into REPLAY files and create the necessary metadata for this. Preprocessing will put load on the system and is resource intensive and should be performed only on test systems and on the same version of oracle database.

– Create a new directory for the capture files placed on the test system.
Let us suppose that the capture files are placed under /tmp/ss on test system, now create a directory as:

SQL > create directory REPLAY as '/tmp/ss';

SQL>EXEC DBMS_WORKLOAD_REPLAY.PROCESS_CAPTURE(capture_dir => 'REPLAY');

After preprocessing followings files will be generated:

wcr_process.wmd, wcr_login.pp, wcr_seq_data.extb, wcr_scn_order.extb , wcr_conn_data.extb

3. Workload Replay:

You have to use an instance which is logically identical to the instance whose workload is captured and need to setup Replay Client (WRC).

– wrc executable has to be run in calibrate mode to estimate the number of replay clients.

$  wrc mode=calibrate replaydir=/tmp/sood

Workload Replay Client: Release 11.1.0.6.0 - Production on Sat Jul 26 18:49:29 2008

Copyright (c) 1982, 2007, Oracle.  All rights reserved

Report for Workload in: /tmp/sood
-----------------------

Recommendation:
Consider using at least 1 clients divided among 1 CPU(s).

Workload Characteristics:
- max concurrency: 1 sessions
- total number of sessions: 5

Assumptions:
- 1 client process per 50 concurrent sessions
- 4 client process per CPU
- think time scale = 100
- connect time scale = 100
- synchronization = TRUE

– Initialize Replay Data:


Initializing replay loads the metadata into tables required for database replay.

SQL>; exec DBMS_WORKLOAD_REPLAY.INITIALIZE_REPLAY (replay_name => 'REPLAY_1', REPLAY_DIR=> 'REPLAY');

– Put the database in “PREPARE REPLAY” mode as:

SQL > exec DBMS_WORKLOAD_REPLAY.PREPARE_REPLAY()

To check the status of replay use:

SQL > select name,status from  dba_workload_replays;
NAME                 STATUS
-------------------- ----------------------------------------
REPLAY_1             PREPARE

— Start the replay client:

Workload Replay Client: Release 11.1.0.6.0 - Production on Sat Jul 26 22:12:35 2008

Copyright (c) 1982, 2007, Oracle.  All rights reserved.

Wait for the replay to start (22:12:35)

– Start the replay as:

SQL> exec DBMS_WORKLOAD_REPLAY.START_REPLAY ();

Check the status of the replay as:

SQL> select name,status from  dba_workload_replays;

Initially it will show :
NAME                 STATUS
-------------------- ----------------------------------------
REPLAY_1             IN PROGRESS

And after that it will be shown as completed.

– Once the status is completed finish the replay as:

SQL> exec DBMS_WORKLOAD_REPLAY.CANCEL_REPLAY ();

To generate the report of the replay in HTML format you need to run :

DECLARE
My_Report  CLOB;
BEGIN
My_Report := DBMS_WORKLOAD_REPLAY.REPORT(replay_id => 1, format => DBMS_WORKLOAD_REPLAY.TYPE_HTML);
END;

10.2.0.4 Window’s Patchset overwrites Sqlnet.ora

Not sure, how many of you are aware of this alert. Oracle has published an Alert document NOTE:726418.1 -ALERT: The 10.2.0.4 Windows Patchset Overwrites %ORACLE_HOME%\network\admin\sqlnet.ora

According to it, Patch 6810189 – 10.2.0.4 RDBMS patchset on Microsoft Windows (32-bit) and Microsoft Windows (AMD64 and EM64T) overwrites the %ORACLE_HOME%\network\admin\sqlnet.ora file.

In case you have Downloaded (and installed) the 10.2.0.4 patchset for Windows before 10 July, 2008 , then please download the software again. In case you have not customized the sqlnet.ora file, then no action is needed.

Simplified Approach to Resolve ORA-4031

After writing few Case studies and other related articles, I will be sharing my approach for Resolving ORA -4031 error. First we will see what ORA-4031 actually means.

04031, 00000, "unable to allocate %s bytes of shared memory (\"%s\",\"%s\",\"%s\,\"%s\")"
// *Cause:  More shared memory is needed than was allocated in the shared pool.
// *Action: If the shared pool is out of memory, either use the
//          dbms_shared_pool package to pin large packages,
//          reduce your use of shared memory, or increase the amount of
//          available shared memory by increasing the value of the
//          INIT.ORA parameters "shared_pool_reserved_size" and
//          "shared_pool_size".
//          If the large pool is out of memory, increase the INIT.ORA
//          parameter "large_pool_size".

ORA-4031 error is encountered when we do not have sufficient memory available in shared pool/large pool to service a memory request. But in actual ORA – 4031 can be encountered in any of these areas

1) Shared pool
2) Large Pool
3) Java Pool
4)Streams pool (new to 10g)

This brings us to the first step in our pursuit for finding the cause for ORA -4031.

Step1: Identify the Pool associated with error

Like any other Oracle error, we first need to check Database Alert Log file and also any trace files which gets generated during that time in user_dump_dest,background_dump_dest. Though there are cases when ORA-4031 error is not recorded in alert.log. Starting from 9.2.0.5, you should be able to see trace files which gets generated in udump/bdump location (Depending on whether background process or user process encountered the error).

ORA – 4031 has basically three arguments

1) Size requested
2) Area
3) Comment

ORA-4031: unable to allocate <size requested> bytes of shared memory (“area “,”comment”)

e.g ORA-4031: unable to allocate 2196 bytes of shared memory
(shared pool,JOB$","KGLS heap","KGLS MEM BLOCK))

So we see from above that the error has occurred in Shared Pool. This is very important step as in case of other pools, ORA-4031 errors are resolved by increasing Java_pool_size and Streams_pool _size.

In this article I will be discussing mostly about errors encountered in Shared pool with small section on Large Pool.

Step2: What is value of SHARED_POOL_SIZE?

Current settings for shared pool related parameters can be found using below query

SQL>col name for a50
SQL>col value for a10
SQL> select nam.ksppinm NAME, val.KSPPSTVL VALUE from x$ksppi nam, x$ksppsv val
where nam.indx = val.indx and nam.ksppinm like '%shared_pool%' order by 1;

NAME                                               VALUE
-------------------------------------------------- ----------
__shared_pool_size                                 654311424
_dm_max_shared_pool_pct                            1
_enable_shared_pool_durations                      TRUE
_io_shared_pool_size                               4194304
_shared_pool_max_size                              0
_shared_pool_minsize_on                            FALSE
_shared_pool_reserved_min_alloc                    4400
_shared_pool_reserved_pct                          5
shared_pool_reserved_size                          19293798
shared_pool_size                                   0

You can use following notes for checking the minimum shared pool size

Note 105813.1 – SCRIPT TO SUGGEST MINIMUM SHARED POOL SIZE

In case of 10g, you can use SGA_TARGET parameter for managing values of Shared Pool,Large pool, Streams Pool,Java Pool, Buffer Cache (DB_CACHE_SIZE). Following note can be used for 10g

Note 270935.1 – Shared pool sizing in 10g

It is recommended to set a lower limit for SHARED_POOL_SIZE parameter.

You can also use V$LIBRARYCACHE view (AWR/Statspack report also has this section) and check if there were lot of Reloads happening for SQL AREA and TABLE/PROCEDURE Namespace. This gives indication that Shared Pool is not appropriately sized. In case you see high value for Invalidations, then this could be due to executing DDL against the objects, gathering stats (DBMS_STATS), or granting/revoking privileges.

High Value for Hard parses in AWR/Statspack report can also be caused by shared pool sizing issues but it cannot be used as a sole criteria as High hard parses can be caused by use of literals and presence of version counts/Child Cursors. This is discussed in section Using Literals Instead of Bind Variables and Multiple Child Cursors/High Version Count.

Some more key points related to Shared pool Sizing

-Shared pool memory consumption varies from release to release

-10g might fail with shared pool of 300 Mb though 8i was working fine

-Some part of memory allocated to fixed structures. Parameters like db_files, open_cursors and processes contribute to Overhead. When you use “Show SGA” command, you will see that “Variable Size” will be more then sum of “Shared Pool + Large Pool + Java Pool”. This is attributed to the value of these parameters.

Please note that in case you specify a low value for SGA_MAX_SIZE, you will see Oracle bumping the value to higher value so as to accomodate high value of Overhead memory.

Staring from 10g, Overhead memory is accomodated in shared_pool_size.

e.g If you specify SHARED_POOL_SIZE as 200 MB and your internal overhead is 100 Mb, then your actual shared pool value available to instance is only 100Mb.

You can read Note:351018.1 – Minimum for SHARED_POOL_SIZE Parameter in 10.2 Version for more information.

Shared Pool Fragmentation

Shared Pool fragmentation also can cause ORA-4031. This is caused when your queries are not being shared and you are seeing lot of reloads and Hard parses in the Statspack Report. In this case check the request failure size

ORA-4031: unable to allocate 16400 bytes of shared memory

We see that failure size is 16K. In this case you can see if you are using Shared_pool_reserved_size parameter for defining shared pool reserved area. Algorithm for memory allocation is such that it will first try to get memory from the shared pool and then if the requested memory size is greater then _Shared_pool_reserved_min_alloc , then it will get the memory from Shared Pool Reserved area. By default this value is set to 4400 bytes. In case the failure value is say 4200, you can try reducing the value of this parameter to reduce the occurences. Though this is not the complete solution. Read Tweaking _Shared_pool_reserved_min_alloc and ORA-4031 for more details.

You can also identify shared pool fragmentation by querying X$KSMSP

select  'sga heap('||KSMCHIDX||',0)'sga_heap,ksmchcom ChunkComment,
decode(round(ksmchsiz/1000),0,'0-1K', 1,'1-2K', 2,'2-3K',
3,'3-4K',4,'4-5K',5,'5-6k',6,'6-7k',7,'7-8k',8,'8-9k', 9,'9-10k','> 10K') "Size",
count(*), ksmchcls "Status", sum(ksmchsiz) "Bytes" from x$ksmsp
where KSMCHCOM = 'free memory' group by 'sga heap('||KSMCHIDX||',0)',
ksmchcom, ksmchcls, decode(round(ksmchsiz/1000),0,'0-1K', 1,'1-2K', 2,'2-3K',
 3,'3-4K',4,'4-5K',5,'5-6k',6,'6-7k',7,'7-8k',8,'8-9k', 9,'9-10k','> 10K') 

SGA_HEAP       CHUNKCOMMENT     Size    COUNT(*) Status          Bytes
-------------- ---------------- ----- ---------- ---------- ----------
sga heap(1,0)  free memory      > 10K        393 free         11296600
sga heap(1,0)  free memory      3-4K         256 free           781928
sga heap(1,0)  free memory      8-9k          63 free           510656
sga heap(1,0)  free memory      6-7k          60 free           367076
sga heap(1,0)  free memory      2-3K         555 free          1071448
sga heap(1,0)  free memory      1-2K        1818 free          1397244
sga heap(1,0)  free memory      0-1K        3418 free           348344
sga heap(1,0)  free memory      9-10k         30 free           269820
sga heap(1,0)  free memory      4-5K         154 free           640332
sga heap(1,0)  free memory      5-6k          75 free           381920
sga heap(1,0)  free memory      > 10K         39 R-free        8302632
sga heap(1,0)  free memory      7-8k          22 free           152328

If you see lot of memory chunks in 1-4k and very few in buckets >5K then it indicates Shared Pool Fragmentation. In this case you need to also look at Hard Parses (Statspack/AWR Report). This is discussed in section Using Literals Instead of Bind Variables and Multiple Child Cursors/High Version Count.

Note: – It is not recommended to run queries on X$KSMSP as it can lead to Latching issues. Do not run them frequently (I have seen people scheduling them as part of Oracle Hourly jobs. This should be avoided)

Step3: Is it MTS? If Yes, then are you using LARGE_POOL_SIZE?

LARGE_POOL_SIZE recommended for many features of Oracle which are designed to utilize large shared memory chunks like

– Recovery Manager (RMAN)

– parallel processing/IO slave processing. e.g px msg pool consuming more memory

– Shared Server Configuration

UGA will be allocated from shared pool in case large pool is not configured. So this can cause issues while using Shared Server Mode (MTS). Ensure that you are using LARGE_POOL_SIZE parameter or SGA_TARGET.

Step4: Are you having Multiple Subpools?

Subpool concept introduced from 9i R2. Instead of one big shared pool, memory will be divided into many sub pools.To determine number of subpools, you can use below query

SQL> select nam.ksppinm NAME, val.KSPPSTVL VALUE from x$ksppi nam, x$ksppsv val
where nam.indx = val.indx and nam.ksppinm like ‘%kghdsidx%’ order by 1 ;

NAME VALUE
—————————— ——————–
_kghdsidx_count 4

Above query indicates that there are 4 subpools

In case you get ORA-4031 and trace file gets generated, then the trace file can also be used to know the number of subpools configured. To do this search on “Memory Utilization of Subpool”
e.g
Memory Utilization of Subpool 1
========================
free memory 10485760
Memory Utilization of Subpool 2
========================

free memory 20971520

This means that there are two subpools configured for your database.

Oracle suggest having 500M as minimum subpool size. I will say that in case you are not facing serious Shared pool Latch contention, 2 subpools should be sufficient (though I believe most of contention issues can be solved by tuning the application). To change the number of subpools, we need to set parameter _kghdsidx_count in pfile or spfile and restart the database

In case of Spfile

alter system set “_kghdsidx_count”=1 scope=spfile;

Restart of database is required as it is a Static parameter. Please note that Large pool has same number of subpools as shared pool so you might be required to change number of subpools in case you are observing ORA-4031 in large pool.

You can read more about Shared Subpools in my earlier post

Step5: Is Sqlarea consuming lot of Memory?

Actually this can also be categorized into “Bad Application Design” as most of the cases are caused by way applications have been designed. High value for sqlarea in V$SGASTAT (or AWR/Statspack report) can be attributed to following causes

Using Literals Instead of Bind Variables

This is the most common cause for ORA-4031. Tom Kyte explains this on one of his post consequences of not using bind variables

If you do not use bind variables and you flood the server with
hundreds/thousands of unique queries you will
-run dog slow
-consume a ton of RAM (and maybe run out)
-not scale beyond a handful of users, if
thatamong other really bad side effects.
The above statement is true and you can find lot of cases where not using Bind variables caused excessive Parsing issues (leading to CPU contention) and ORA-4031 issues. One of the way to locate such statements is by running following query.
SELECT substr(sql_text,1,90) "SQL",count(*) "SQL Copies",
   sum(executions) "TotExecs", sum(sharable_mem) "TotMemory"
FROM v$sqlarea
WHERE executions &lt; 5
GROUP BY substr(sql_text,1,90) HAVING count(*) > 30
ORDER BY 2;

I personally try to use script from Asktom website to find these statements. You can find ,more information by clicking here

create table t1 as select sql_text from v$sqlarea;
alter table t1 add sql_text_wo_constants varchar2(1000);
create or replace function
remove_constants( p_query in varchar2 ) return varchar2
as
    l_query long;
    l_char  varchar2(1);
    l_in_quotes boolean default FALSE;
begin
    for i in 1 .. length( p_query )
    loop
        l_char := substr(p_query,i,1);
        if ( l_char = '''' and l_in_quotes )
        then
            l_in_quotes := FALSE;
        elsif ( l_char = '''' and NOT l_in_quotes )
        then
            l_in_quotes := TRUE;
            l_query := l_query  '''#';
        end if;
        if ( NOT l_in_quotes ) then
            l_query := l_query  l_char;
        end if;
    end loop;
l_query := translate( l_query, '0123456789', '@@@@@@@@@@' );
    for i in 0 .. 8 loop
        l_query := replace( l_query, lpad('@',10-i,'@'), '@' );
        l_query := replace( l_query, lpad(' ',10-i,' '), ' ' );
    end loop;
    return upper(l_query);
end;
/
update t1 set sql_text_wo_constants = remove_constants(sql_text);

select sql_text_wo_constants, count(*)
  from t1
 group by sql_text_wo_constants
having count(*) > 100
 order by 2
/

Above query will give you queries which are using literals and should be modified to use bind variables. Sometimes it is not possible to modify the application, in that case you can use CURSOR_SHARING=SIMILAR/FORCE to force the application to use bind variables. Please note that this can cause issues (especially CURSOR_SHARING=SIMILAR), so it is recommended to test the application in Test environment before implementing in Production. Applications like Oracle Apps do not certify use of this parameter so also check with your application vendor if this can be used.

You can refer to following articles where I have discussed similar issue

ORA-4031 – A Case Study

Application Design and ORA-4031

Multiple Child Cursors/High Version Count

This is also one of the cause for high usage of memory in SQLAREA region. Child cursors are generated in Shared pool when the SQL text is same but Oracle cannot share it because the underlying objects are different or different optimizer settings, etc. To know about child cursors, refer to following Metalink note

Note 296377.1 – Handling and resolving unshared cursors/large version_counts

In case of Oracle 10g, you can use Statspack/AWR report for finding the child cursors under category “SQL ordered by Version Counts”. Following statements can also be run to identify if child cursors are being generated in your database


For 10g

SQL> select sa.sql_text,sa.version_count ,ss.*from v$sqlarea sa,v$sql_shared_cursor ss
where sa.address=ss.address and sa.version_count > 50 order by sa.version_count ;

For 8i/9i

select sa.sql_text,sa.version_count ,ss.*from v$sqlarea sa,v$sql_shared_cursor ss
where sa.address=ss.KGLHDPAR and sa.version_count > 50 order by sa.version_count ;

Results returned by above query reports SQL which are not being shared due to some reason. You should find column with Value Y to find the cause. Most of these issues are encountered while using CURSOR_SHARING=SIMILAR. In case you are using this parameter with columns having Histograms, then it is expected behavior.Read more about Cursor issues related to Histograms in Note:261020.1 – High Version Count with CURSOR_SHARING = SIMILAR or FORCE

There are cases where none of the column value returns Y value. Most of these cases, you need to work with Oracle support to find the cause as this could be a bug.

Child Cursors are problematic as they increase shared pool memory consumption, High parsing and also as the number of child cursors increase, Oracle will take more time to span all the child cursors to match if it can reuse them, if not then it spawns a new child cursor. This results in High Parsing time and CPU contention.

High Sharable Memory per SQL

One more cause for high value of SQLAREA in V$SGASTAT is high memory consumption for SQL statement. This can be due to poorly written SQL statement or due to Oracle Bugs.

In case of Oracle 10g, you can use Statspack/AWR report for finding the statements with high value of Sharable Memory. You can also use Sharable_mem column in V$SQLAREA to find these queries.

Step6:What Next?

You have followed all the above steps and find everything is ok. Now what do we check next?

We can look for any trace file which got generated during the time of error and see which component was taking more memory. You can try searching in metalink with that component. Else you can take a heapdump at time of error and upload the file to support.

Heapdump event
The Heapdump event is used to dump memory from different subheaps. Errors ora-4030 are associated with problems in the pga, uga or cga heaps, and error ora-4031 is related only to problems with the shared pool/large pool/Java Pool/Streams Pool.

command - >  alter system set events ‘4031 trace name heapdump level 2’;
init.ora - >events=’4031 trace name heapdump, level 2’
SQL>oradebug setmypid
SQL>oradebug dump heapdump 2
SQL>oradebug tracefile_name

Staring from 9.2.0.5, level 536870914 can be used for generating heapdump which will gather more diagnostic information for support to diagnose the cause.

Also it is not recommended to set Heapdump event in init.ora or spfile since it will force multiple dumps at time of Shared Pool memory issues. Oracle requires Shared pool Latch for dumping heapdump, so this can worsen the Latching situation. You can set Errorstack event to generate trace file at time of ORA-4031 error

alter system set events '4031 trace name errorstack level 3';

Use immediate trace option or Oradebug command at time of error

SQL> connect / as sysdba
SQL> alter session set events 'immediate trace name heapdump level 536870914';

OR

sqlplus "/ as sysdba"
oradebug setmypid
oradebug unlimit
oradebug dump heapdump 536870914
oradebug tracefile_name
exit

Upload the tracefile to Oracle support.

Using the above approach will help you to resolve ORA-4031 in Shared Pool.

Large Pool

While working on ORA-4031 in large pool, you need to follow below approach

1)Check size for LARGE_POOL_SIZE. If possible increase it.

2)Check number of subpools. Ensure that you have sufficient memory in each subpool. _kghdsidx_count is used to control the number of subpools in large pool also. So you would have to either increase memory available in each subpool or decrease the count.

3)In case of MTS, check if any session is consuming lot of memory. It’s a case where instead of getting ORA-4030, you get ORA-4031 in large pool (In MTS, UGA is part of large pool).

4)If all above suggestions have been tried, then capture heapdump and upload the file to Oracle Support. You can use level 32 or 536870944 i.e

SQL> connect / as sysdba
SQL> alter session set events 'immediate trace name heapdump level 32';
or
SQL> alter session set events 'immediate trace name heapdump level 536870944';

I hope this article helps in following a methodology for resolving ORA-4031. At present this article is not exhaustive article on this error and it will be more useful if it can be used as a approach after you have gone through below metalink notes.

Note:62143.1 – Understanding and Tuning the Shared Pool

Note:396940.1 – Troubleshooting and Diagnosing ORA-4031 Error

Note:146599.1 – Diagnosing and Resolving Error ORA-04031