database

ORA-01426 After Upgrade to 10g

Oracle seems to have a tendency to make small changes to database code which impacts application in a big way. Sad part is, these changes are not always documented properly. One of these cases is what I encountered. A simple PL/SQL code errors out as below:

Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options

SQL> declare
2 v_code NUMBER(20);
3 BEGIN
4 v_code := 990 * 10000000;
5 END;
6 /
declare
*
ERROR at line 1:
ORA-01426: numeric overflow
ORA-06512: at line 4

Same block works fine in any previous versions like 9i and 8i. As per Oracle this is “Expected Behavior”. Oracle used to perform number arithmetic. In 10g integer arithmetic is being used. Any application using such code would require to undergo change as follows:

SQL> ed
Wrote file afiedt.buf

1 declare
2 v_v1 number :=990;
3 v_v2 number :=10000000;
4 v_code NUMBER(20);
5 BEGIN
6 v_code := v_v1 * v_v2 ;
7* END;
SQL> /

PL/SQL procedure successfully completed.

SQL> ed
Wrote file afiedt.buf

1 declare
2 v_v1 number :=990;
3 v_code NUMBER(20);
4 BEGIN
5 v_code := v_v1 * 10000000 ;
6* END;
SQL> /

PL/SQL procedure successfully completed.

SQL> declare
2 v_code NUMBER(20);
3 BEGIN
4 v_code := 990 * 10000000.0;
5 END;
6 /

PL/SQL procedure successfully completed.

Using any of the above methods, error is not observed. It is although not clear why oracle made this change. Another hurdle in moving to 10g!!. Hope Oracle provides Database capture feature on 9i and Replay on 10g.

ORA- 4031 – A Case Study

Today I will be taking up one ORA-4031 issue we faced on one of our client database. Please note that SQL statements mentioned below have been changed and does not reveal any confidential information.

Our client was facing ORA-4031 on a 9.2 database.This was a new Database which had gone Live recently. Following error messages were recorded in alert log.

ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable to allocate 4200 bytes of shared memory ("shared pool","select    owner#,name,namespace...","sga heap(1,0)","library cache")

Initial shared pool at time of failure was 160 Mb. After the failure the shared_pool_size was increased to 1 Gb. We did not have any other information and were asked to perform RCA for the same. Luckily we had Statspack configured on the server, so we took a report for the duration when the error occurred.

              Snap Id     Snap Time      Sessions Curs/Sess Comment
            --------- ------------------ -------- --------- -------------------
Begin Snap:     1000 15-Apr-08 06:32:06  #######       4.6
  End Snap:     1001 15-Apr-0806:53:41  #######       4.6
   Elapsed:               21.58 (mins)

Cache Sizes (end)
~~~~~~~~~~~~~~~~~
Buffer Cache:     2,048M      Std Block Size:          8K
Shared Pool Size:       160M          Log Buffer:     32,768K

Load Profile
~~~~~~~~~~~~                Per Second       Per Transaction
                           ---------------       ---------------
    Redo size:              2,459.98              2,911.95
    Logical reads:              6,447.76              7,632.40
    Block changes:                  8.31                  9.83
    Physical reads:              3,642.28              4,311.47
    Physical writes:                  0.80                  0.95
    User calls:                106.51                126.08
    <strong>Parses:                151.48                179.31</strong>
    <strong>Hard parses:                 11.08                 13.12</strong>
    Sorts:                 91.50                108.31
    Logons:                  1.41                  1.67
    Executes:                283.09                335.10
    Transactions:                  0.84

We can clearly see that lot of hard parses were occurring at the time of error.

Library Cache Activity for DB: TESTDB1  Instance: TESTDB1
 Snaps: 1000 -1001-&gt;"Pct Misses"  should be very low

                         Get  Pct        Pin        Pct               Invali-
Namespace           Requests  Miss     Requests     Miss     Reloads  dations
--------------- ------------ ------ -------------- ------ ---------- --------
BODY                   1,564    1.9          2,903   62.9        469        0
CLUSTER                6,827    0.0          9,011    0.1          0        0
INDEX                  9,376   62.8          9,375   63.7          5        0
SQL AREA              77,058   11.0        442,341    5.2      <strong>6,723 </strong>       0
TABLE/PROCEDURE       72,495   10.4        135,173   24.2      <strong>8,291</strong>        0
TRIGGER                2,288   14.9          2,288   49.5        218        0
          -------------------------------------------------------------

Also looking at information from V$LIBRARYCACHE, we can see that there were lot of reloads happening for SQL AREA and TABLE/PROCEDURE Namespace. This gives indication that Shared Pool is not appropriately sized. So we kind of expected this to be a shared pool sizing issue.

But after increasing the shared_pool_size to 1 Gb, we saw that memory for “sql area” was increasing. So as to avoid further ORA-4031, we started flushing the shared pool when the free memory reached to 200M.

NAME                                  BYTES
    -------------------------- ----------------
 <strong>free memory                     597,154,376</strong>  &lt; -large amount of free, was done after flushing shared pool
<strong>sql area                        315,365,096</strong> &lt; - High value
library cache                    82,229,616
miscellaneous                    81,476,336
gcs resources                    39,331,928
gcs shadows                      26,133,184

Whenever I observe high memory usage for Sqlarea, I run following sql statements.

1) 10g

<strong><span style="color: #3366ff;">SQL&gt; select sa.sql_text,sa.version_count ,ss.*from v$sqlarea sa,v$sql_shared_cursor ss where<span> </span>sa.address=ss.address and sa.version_count &gt; 50 order by sa.version_count ;</span></strong>

8i/9i

<strong><span style="color: #3366ff;">select sa.sql_text,sa.version_count ,ss.*from v$sqlarea sa,v$sql_shared_cursor ss<span> </span>where sa.address=ss.KGLHDPAR<span> </span>and sa.version_count &gt; 50<span> </span>order by sa.version_count ;</span></strong>

– This one is to find if child cursors are being generated. To know about child cursors, refer to following Metalink note

Note 296377.1 – Handling and resolving unshared cursors/large version_counts

We did not get any SQL in this category.

2) Use script from Asktom website to find if application is using Bind variables. You can find ,more information by clicking here

This returned quite a few statements

    INSERT INTO EMP (columnname, columnname, columnname, columnname, columnname, columnname, columnname, columnname, columnname, columnname, columnname ,columnname) VALUES ('#', TO_DATE('#','#'), @, '#',
    @, @, @, '#', @, @, @,     @, @, @, @, @)          <strong>1948</strong>

    DELETE FROM EMP WHERE EMPID=@          <strong>1976</strong>

    INSERT INTO EMP (columnname, columnname, columnname, columnname) VALUES (@, @, '#',    @, NULL , @)          <strong>1976</strong>

    DELETE FROM EMP WHERE TYPE=@ AND EMPID=@          <strong>1976</strong>

     INSERT INTO DEPT VALUES (@, '#', '#', TO_DATE('#','#'), SYSDATE)
          <strong>2012</strong>

    DELETE FROM DEPT WHERE DEPTNO=@ AND DEPT_ID=@ AND MGR=@ AND SAL='#'          <strong>2375</strong>

    INSERT INTO SALARY (columnname, columnname, columnname, columnname, columnname, columnname, columnname) VA LUES (@, @, @, '#', '#', @, TO_DATE('#','#'), TO_DATE('#','#'), @, @)          <strong>2377</strong>

We can clearly see that statements not being shared and thus causing the high memory usage in shared pool.

Each sql statement in shared pool will be occupying some memory. If the sql are similar and differ only in literal, then they should be shared as otherwise they will occupy lot of extra space. By not sharing the statements, we are wasting space in shared pool and causing shared pool fragmentation.

Solution again is to use Bind variables or CURSOR_SHARING.

Resolving Shutdown Immediate Hang Situations

Many times Oracle DBA’s are faced with a situation where shutting down the instance with shutdown immediate seems to take infinite time and gives a impression that the session is hung. In most of the cases this is not a hang. Hang can be defined as a scenario when few sessions/processes are waiting on some action and in turn blocking other session/processes to complete their tasks. Now the original sessions could be waiting because of resource issues or Oracle Bugs.

Shutdown immediate can take long time to complete (appear to be hung) because of three reasons:

1. Uncommitted transactions are being rolled back.

2. SMON is cleaning temp segments or performing delayed block cleanouts.

3. Processes still continue to be connected to the database and do not terminate.

1. Uncommitted transactions are being rolled back:

This is the case when the message ‘Waiting for smon to disable tx recovery’ is posted in the alert log after we issue shutdown immediate.

There are two reasons for this:
– A large query was running at the time of shutdown immediate.
-A large transaction was running at the time of shutdown immediate.

For large queries:

SQL > select count(*) from v$session_longops where time_remaining>0;

If it returns a value > 0 then we can do a shutdown abort and then startup restrict and then again shutdown immediate.

For large transactions:

SQL > select sum(used_ublk) from v$transaction;

If it returns a large value then we have to wait for a long time for shutdowm to get completed.
If the large transaction is aborted and then shutdown is issued then we have to query v$fast_start_transactions and v$fast_start_server, we will not see anything in v$transaction at this time.

At this particular moment transaction recovery is going on and the count(*) will keep on decreasing:

SQL > select count(*) from v$fast_start_transaction;

Decreasing count will show that recovery is going on and when the recovery is completed the database will be shutdown.

But it is not desirable under some circumstances such as, when we have very short maintance window and we need to perform a shutdown immediate to do some work, in those cases we can use the following event and set in the init.ora file TEMPERORARLY To disable transaction recovery:

event=”10513 trace name context forever, level 2″

and bounce the instance and issue shutdown immediate to get complete without transaction recovery.SMON will not do a transaction recovery untill this event is set in the init.ora file so it is necessary to remove this event whenever you get a chance to shutdown the database again, this time shutdown immediate can even take 3-5 hours(Just remove this event from pfile).

2. SMON is cleaning temp segments or performing delayed block cleanouts:

During a SHUTDOWN IMMEDIATE and SHUTDOWN NORMAL, SMON cleans up extents which are no longer needed and marking them as freed. It means that count from uet$ will decrease and count in fet$ will increase.

To verify that the temporary segments are decreasing have an active session available in SQL during the SHUTDOWN IMMEDIATE. Run query to ensure the database is not hanging, but is actually perform extent cleanup:

SQL> select count(block#) from fet$;
COUNT(BLOCK)
----------
115

SQL> select count(block#) from uet$;
COUNT(BLOCK)
----------
713

After some time, issue the query again and check the results:

SQL> select count(block#) from fet$;
COUNT(BLOCK)
----------
210

SQL > select count(block#) from uet$;
COUNT(BLOCK)
----------
512

If you do not have sufficient time to wait for this cleanup then you can set the following event and bounce the database and reissue shutdown immediate to skip this cleanup:

event=”10061 trace name context forever, level 10″

It allows you to prevent SMON from cleaning up temporary segments. Again it is not recommended to set this event event forever. Whenever you have large downtime remove this event and allow SMON to do its work.

3. Processes still continue to be connected to the database and do not terminate:

After issuing shutdown immediate, If we see entries in alert log file as:

Tue Jan  8 12:00:27 2008
Active call for process 10071 user 'oracle' program '[email protected] (J001)'
SHUTDOWN: waiting for active calls to complete.
Tue Jan  8 12:00:57 2008

SHUTDOWN: Active sessions prevent database close operation

It shows that there are some active calls at program ‘[email protected] (J001)’ which pmon is not able to clear up.This message is due to the fact that database is waiting for pmon to clean up processes, but pmon is unable to clean them. The client connections to the server are causing the shutdown immediate or normal to hang. Do the following in this case:

1. Before shutdown immediate, shutdown the listener:

$ lsnrctl stop

2. Now check if there are any connection present at the database as:

$ ps -eaf | grep LOCAL

It will give you the OSPIDs of the client connected to database.

3 Manually kill them as:

# Kill -9 <OSPID>

4. Issue shutdown immediate now.

Do not forget to bring up the listener after startup 🙂

In addition to this you can set 10046 event in the session used to shutdown the instance. This will help to tell the event on which session is waiting

SQL>alter session set events '10046 trace name context forever, level 12'

SQL>Shutdown immediate;

Look for the trace file in user_dump_dest location. Also look at the alert.log for any other messages. They might be helpful in case the shutdown is experiencing hang situation.

Solving ORA-1031 while connecting as “/ as sysdba” :

Many times we see an issue like this:

SQL> conn / as sysdba
ERROR:
ORA-01031: insufficient privileges

This is a very common and frequent error that can occur after the new oracle software install
or due to some permissions changes at OS level.

I will dicuss the approach to solve ORA-1031 error on UNIX environment.

1. Check that oracle_sid and oracle_home are set correctly as:

$ echo $ORACLE_SID
$ echo $ORACLE_HOME

Find the values returned by above command and match these values under /etc/oratab file, these
have to be listed there.

EXAMPLE:
========

$ echo $ORACLE_SID
BSNL

$ echo $ORACLE_HOME
/u01/app/oracle/product/10.2.0/db_2

$ cat /etc/oratab

BSNL:/u01/app/oracle/product/10.2.0/db_2:N
VSNL:/u01/app/oracle/product/10.2.0/db_2:N

The values above are matching with /etc/oratab entries

If the oracle_sid and oracle_home are not set properly then set it as:

$ export ORACLE_SID=BSNL
$ export ORACLE_HOME=/u01/app/oracle/product/10.2.0/db_2

And try to connect as “/ as sysdba” It should work now.

If these are correct but still the error is coming then move to step 2.

2. Ensure TWO_TASK is not set

$ echo $TWO_TASK

If it return any lines as:

TWO_TASK=
OR
TWO_TASK=<some_db_name>

Then unset the environment variable as:

$ unset TWO_TASK

Now try to connect as “/ as sysdba”

If these are correct but still the error is coming then move to step 3.

3.Check the permissions on the oracle executable file:

$ cd $ORACLE_HOME/bin
$ ls -la oracle

It should show the following permissions:

-rwsr-s–x 1 oracle oinstall 96725724 Apr 2 13:43 oracle

If its not the same then issue the following command to set the correct permissions:

$ chmod 6751 oracle

If these are correct but still the error is coming then move to step 4.

4. Check for the dba group at OS level. We need to make sure that Operating System users issuing / as sysdba belongs to dba group at OS level.

There is one file we need to check for this i.e $ORACLE_HOME/rdbms/lib/config.s OR $ORACLE_HOME/rdbms/lib/config.c (File name vary from OS to OS on some OS it is config.c and on some OS it is config.s). The value in these file is typically set to “dba” as:

.ascii “dba”

Login as oracle user:
# su – oracle
$ id
uid=111(oracle) gid=123(usdba)

Look for the gid value here.(usdba)
The gid value is usdba so we need to modify the config.c or config.s so that it should look like:
.ascii “usdba”

After making changes to config file relink oracle binaries as:

– Make sure that no oracle processes running
– Login as oracle
– Make sure LD_LIBRARY_PATH and ORACLE_HOME are set properly
$ORACLE_HOME/bin/relink all

If these are correct but still the error is coming then move to step 5.

5. Make sure that dba group at OS level only exists once in /etc/group file and that the users belonging to the dba group are properly comma separated.

Example:

usdba::123:oracle,oracle1

ii) Check that the oracle user uid and gid are same in /etc/group and /etc/passwd

If all these 5 settings are correct and still ora-1031 is coming then the only option is to take truss output and check while opening which file the error is coming.
e.g.

$ truss -aefo /tmp/truss.out sqlplus “/ as sysdba”

6. Ensure you are invoking sqlplus from correct ORACLE_HOME

This actually came as comment on this post and I would agree. Ensure that you are using sqlplus from correct ORACLE_HOME. To do this set

export PATH=$PATH:$ORACLE_HOME/bin

You can confirm the home using which sqlplus command

AWE -What is It and How to use it?

32-bit Operating systems provide 4Gb of addressable space to applications. This includes memory for the user application and also for the system.

When running Oracle on 32-Bit Windows, we can have total memory limited to 1.7 Gb only. If you allocate memory more then 1.7 GB or if it increases during course of database operation, then you are likely to encounter errors such as ORA – 4030, ORA – 12500 or ORA – 27102.

There are few solutions to overcome this problem. I am going to discuss them in this post. Feel free to post any Questions/doubts you have.

1)Using /3GB switch

To increase the virtual addressable space for processes and applications, you need to use /3GB switch. The /3GB switch allows for providing memory upto 3GB for processes (In actual it is nearly 2.7 GB for Oracle) by reducing memory available to OS to 1 GB.
To implement it, you need to modify the boot.ini as follows and restart the server

[boot loader]
timeout=30
default=multi(0)disk(0)rdisk(0)partition(2)\WINNT
[operating systems]
multi(0)disk(0)rdisk(0)partition(2)\WINNT="????" /3GB

Note: -“????” in the above example can be the programmatic name of any of the following operating system versions:

– Windows XP Professional

– Windows Server 2003

– Windows Server 2003, Enterprise Edition

– Windows Server 2003, Datacenter Edition

– Windows 2000 Advanced Server

– Windows 2000 Datacenter Server

– Windows NT Server 4.0, Enterprise Edition

2)Using /PAE switch

Physical Address Extention (PAE) allows applications to have virtual addressable space beyond 4GB. PAE maps up to 64 GB of physical memory into a 32-bit (4 GB) virtual address space using either 4-KB or 2-MB pages.

This can be enabled using /PAE switch in boot.ini as follows

[boot loader]
timeout=30
default=multi(0)disk(0)rdisk(0)partition(2)\WINNT
[operating systems]
multi(0)disk(0)rdisk(0)partition(2)\WINNT="????" <strong>/PAE</strong>
<span style="font-family: Arial;">
</span>

Note: -“????” in the above example can be the programmatic name of any of the following operating system versions:

– Windows 2000 Datacenter Server

– Windows 2000 Advanced Server

– Windows 2003 Data Center Edition (32-Bit)

– Windows 2003 Enterprise Edition (32-Bit)

PAE is also known by name Address Windowing Extensions (AWE). This basically allows Oracle to have larger buffer cache. In addition to addition to boot.ini, you need to make some changes at database level also

1) Add parameter USE_INDIRECT_DATA_BUFFERS=TRUE to spfile/pfile

2) Remove DB_CACHE_SIZE parameter from pfile/spfile and replace it with DB_BLOCK_BUFFERS. To calculate it you need to use following DB_CACHE_SIZE/DB_BLOCK_SIZE

How PAE/AWE works?

PAE allows the system to expand the number of bits that can be used to address physical memory from 32 bits to 36 bits. Out of 1.7GB allocated to Oracle, 1 GB is allocated to a “WINDOW” which is used to map the buffer cache and allows for bigger SGA. This acts like a “Swap Area” where all the blocks are mapped into this window and then provided to the application.It is advisable to use /3GB switch along with /PAE switch so that addressable space is around 2.7GB.

To do it, change boot.ini to as follows

<span style="font-size: 12pt; font-family: Arial;"> </span><span style="font-size: 12pt; font-family: Arial;">multi(0)disk(0)rdisk(0)partition(2)\WINNT="????" <strong>/3GB /PAE</strong></span>

Note: – Using both /3GB and /PAE will limit memory addressability to 16 GB. In case you wish to address memory beyond 16GB, you will be required to unset /3GB switch.

AWE Window is specified by parameter AWE_WINDOW_SIZE. The size of this window is defined by a registry setting in the HOME key for Oracle (HKLM\Software\Oracle\Homex) called AWE_WINDOW_SIZE. Default value for AWE_WINDOW_SIZE is 1 GB and no entry in registry will mean that it is running at default value. Value for this parameter has to be specified in Bytes in registry.

Metalink Note:225349.1 and Oracle Docs mentions that you need to use AWE_WINDOW_MEMORY whereas Microsoft site mentions using AWE_WINDOW_SIZE in registry.

This 3GB space will be used by following components:

– The Value for AWE_WINDOW_SIZE- SGA (excluding Buffer Cache)- Overhead for Oracle.exe and DLL’s -Stack space for all threads (Defaults to 1MB/thread, unless orastack is used)- PGA and UGA memory for all user sessions

To understand it, lets take an example. Suppose we need to set memory for a new system using /3GB and /PAE

Out of 2.7GB of addressable space, 1 GB will be used for AW_WINDOW_SIZE. We are left with 1.7 GB which has to be used for SGA shared_pool_size + large_pool_size + java_pool_size + log_buffers +steams_pool_size)

**Please Note that you can not use ASMM i.e SGA_TARGET when using AWE.

AWE_WINDOW_SIZE can be decreased from 1GB to a lower value. This has to be calculated and set judiciously as a lower value can increase mapping/unmapping activity and can also lead to CPU Spinning. Please refer below note for details on calculating minimum AWE_WINDOW_SIZE parameter

Note:225349.1 – Implementing Address Windowing Extensions (AWE) or VLM on Windows Platforms

Other ways of reducing Memory usage

1) Use MTS (Shared Server Configuration) as this will reduce the number of processes and ultimately reducing memory usage.

2) Reduce value for *area_size parameter’s (if using Manual PGA management)

3) You can also use orastack to reduce the memory usage per thread (which defaults to 1 MB)

cmd>orastack oracle.exe 700000

cmd>orastack tnslsnr.exe 700000

cmd>orastack svrmgrl.exe 700000

cmd>orastack sqlplus.exe 700000

This sets the value to 700KB beyond which it is risky to decrease the value.

Issues with AWE

1) To use AWE  we need to use DB_BLOCK_BUFFERS parameter. As we cannot have DB_BLOCK_BUFFERS and DB_CACHE_SIZE parameter being used simultaneouly , we need to unset DB_CACHE_SIZE parameter. This will  mean that you cannot use Automatic Shared Memory Management (ASMM) with AWE i.e You have to unset SGA_TARGET parameter. Similarly you cannot use MEMORY_TARGET (11g ) parameter with AWE. Failure to do so will lead to

ORA-00825: cannot set DB_BLOCK_BUFFERS if SGA_TARGET or MEMORY_TARGET is set

Cause: SGA_TARGET or MEMORY_TARGET set with DB_BLOCK_BUFFERS set.

Action: Do not set SGA_TARGET, MEMORY_TARGET or use new cache parameters, and do not use DB_BLOCK_BUFFERS which is an old cache parameter.

2) Setting Low value for AWE_WINDOW_SIZE can lead to CPU spin issues.

3) Applications doing large amount of I/O will see Performance impact due to Paging.

4) Due to use of AWE, additional Overhead is introduced.

5) Memory beyond 4GB can only be used for Buffer Cache. So In case you are facing Shared pool memory issues or want large number of connections, then AWE cannot help.

Further Readings

Note 225349.1 – Implementing Address Windowing Extensions (AWE) or VLM on Windows Platforms
Note 46053.1 – Windows NT Memory Architecture Overview

http://www.microsoft.com/whdc/system/platform/server/PAE/PAEmem.mspx

http://www.microsoft.com/whdc/system/platform/server/PAE/pae_os.mspx

http://download.oracle.com/docs/cd/B19306_01/win.102/b14304/architec.htm#i1005826

Recovering from ORA-1578 ORA-8103 (Logical Corruption)

1. For ORA-1578:

There are two ways in which we can extract the data from a corrupted table:

a) Using DBMS_REPAIR.SKIP_CORRUPT_BLOCKS
b) Using Event 10231

a) Connect as sysdba user:

execute DBMS_REPAIR.SKIP_CORRUPT_BLOCKS(‘<schema_name>’,'<table_name>’);

Put the schema name and the table name of the corrupted table.

Then issue the following command to craete a new table from corrupted table:

create table new_table as select * from corrupted_table_name;

Now Drop the corrupted table, rename the newly created table to its original name and
clear the attribute for the new table as:

execute DBMS_REPAIR.SKIP_CORRUPT_BLOCKS

(‘<schema_name>’,'<table_name>’,flags=>dbms_repair.noskip_flag);

b) Set the event 10231 at session level to skip corrupted rows:

ALTER SESSION SET EVENTS ‘10231 TRACE NAME CONTEXT FOREVER, LEVEL 10’;

Then issue the following command to craete a new table from corrupted table:

create table new_table as select * from corrupted_table_name;

Now Drop the corrupted table, rename the newly created table to its original name.

2. For ORA-8103

This is a case of LOGICAL CORRUPTION and cannot be taken care by skipping the corrupted rows.

Follow Metalink Note 422547.1 to salvage data in case of ORA-8103