ORA-01426 After Upgrade to 10g

Oracle seems to have a tendency to make small changes to database code which impacts application in a big way. Sad part is, these changes are not always documented properly. One of these cases is what I encountered. A simple PL/SQL code errors out as below:

Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options

SQL> declare
2 v_code NUMBER(20);
3 BEGIN
4 v_code := 990 * 10000000;
5 END;
6 /
declare
*
ERROR at line 1:
ORA-01426: numeric overflow
ORA-06512: at line 4

Same block works fine in any previous versions like 9i and 8i. As per Oracle this is “Expected Behavior”. Oracle used to perform number arithmetic. In 10g integer arithmetic is being used. Any application using such code would require to undergo change as follows:

SQL> ed
Wrote file afiedt.buf

1 declare
2 v_v1 number :=990;
3 v_v2 number :=10000000;
4 v_code NUMBER(20);
5 BEGIN
6 v_code := v_v1 * v_v2 ;
7* END;
SQL> /

PL/SQL procedure successfully completed.

SQL> ed
Wrote file afiedt.buf

1 declare
2 v_v1 number :=990;
3 v_code NUMBER(20);
4 BEGIN
5 v_code := v_v1 * 10000000 ;
6* END;
SQL> /

PL/SQL procedure successfully completed.

SQL> declare
2 v_code NUMBER(20);
3 BEGIN
4 v_code := 990 * 10000000.0;
5 END;
6 /

PL/SQL procedure successfully completed.

Using any of the above methods, error is not observed. It is although not clear why oracle made this change. Another hurdle in moving to 10g!!. Hope Oracle provides Database capture feature on 9i and Replay on 10g.

11G:SYSASM role

Starting from Oracle 11g ASM, new privilege has been created for Administering Oracle ASM instances. This is called SYSASM role.

You can continue using SYSDBA role to connect to ASM but it will generate following warning messages at time of startup/shutdown, create Diskgroup/add disk, etc.

<strong>$ adrci</strong>

ADRCI: Release 11.1.0.6.0 - Beta on Tue Jul 1 15:43:57 2008
Copyright (c) 1982, 2007, Oracle.  All rights reserved.

ADR base = "/u03/app/oracle"
adrci&gt; help
adrci&gt; set homes
DIA-48431: Must specify at least one ADR home path

<strong>adrci&gt; show homes</strong>
ADR Homes:
diag/rdbms/testdb11/TESTDB11
diag/asm/+asm/+ASM
diag/clients/user_oracle/host_2900411789_11
diag/tnslsnr/testzone/listener
<strong>adrci&gt; set home diag/asm/+asm/+ASM
adrci&gt; show alert</strong>

ADR Home = /u03/app/oracle/diag/asm/+asm/+ASM:
******************************************
Starting ORACLE instance (normal)
<strong>WARNING: Deprecated privilege SYSDBA for command 'STARTUP'</strong>
2008-06-25 13:13:09.287000 +05:30

Please note that SYSASM privilege cannot be used to start the RDBMS instance. This way it ensures separation of database administration privilege from the ASM storage administration privilege. Trying to do so results in ORA – 1031

$ sqlplus
SQL*Plus: Release 11.1.0.6.0 - Production on Tue Jul 1 16:09:54 2008
Copyright (c) 1982, 2007, Oracle.  All rights reserved.
Enter user-name: / as sysasm
Connected to an idle instance.

SQL&gt; startup pfile=initest.ora
<strong>ORA-01031: insufficient privileges</strong>

As per documentation, the privilege to administer an ASM instance with SYSDBA will be removed in future versions. Anyways till now DBA’s can administer the ASM instance 😉

New ASM Background Processes in 11G

Few hours back I installed Oracle Database 11g(Though still DB is yet to be created), so I started playing with the ASM instance. First thing I did was to check ASM alert.log. I used ADRCI (11g) to see it

adrci&gt; show incident

ADR Home = /u03/app/oracle/diag/asm/+asm/+ASM:
*********************************************************
0 rows fetched

adrci&gt; show alert

ADR Home = /u03/app/oracle/diag/asm/+asm/+ASM:
**********************************************************
Output the results to file: /tmp/alert_9572_1_+ASM_1.ado
"/tmp/alert_9572_1_+ASM_1.ado" 48 lines, 1964 characters
PMON started with pid=2, OS id=3672
DIAG started with pid=4, OS id=3678
<strong>VKTM started with pid=3, OS id=3674
VKTM running at (100ms) precision</strong>
2008-06-24 15:24:12.425000 +05:30
PSP0 started with pid=5, OS id=3680
<strong>DSKM started with pid=6, OS id=3682</strong>
<strong>DIA0 started with pid=7, OS id=3684</strong>
MMAN started with pid=6, OS id=3686
DBW0 started with pid=8, OS id=3689
LGWR started with pid=9, OS id=3691
CKPT started with pid=10, OS id=3694
SMON started with pid=11, OS id=3700
RBAL started with pid=12, OS id=3702
GMON started with pid=13, OS id=3705
ORACLE_BASE from environment = /u03/app/oracle
<strong>Spfile /u03/app/oracle/product/11.1.0/db_1/dbs/spfile+ASM.ora is in old pre-11 format and compatible &gt;= 11.0.0; converting to  new H.A.R.D. compliant format.</strong>

I have highlighted the things which were not present in 10g. According to Docs

DIA0 (diagnosability process 0) (only 0 is currently being used) is responsible for hang detection and deadlock resolution.

VKTM (virtual keeper of time) is responsible for providing a wall-clock time (updated every second) and reference-time counter (updated every 20 ms and available only when running at elevated priority)

These were the definitions from docs. Oracle should have been more generous and also documented following

DIA0 – Does that mean we will have auto SystemState/Hanganalyze generated during hang? Will ORA-60 be handled by this process?

VKTM – What does this mean to us? Will this timer be used in 10046 timing information? Will it ensure Oracle Scheduler run jobs on time 🙂

DSKM – This is still not documented.

Last message kind of indicates that spfile is also made H.A.R.D complaint. So it should take care of corruptions.

I have created more confusion rather than explaining what these processes actually do 🙂 Anyways if you have any information, then it will be really nice if it can be shared…Thanks for Reading !!

11.1.0.6 ASM installation on Solaris fails -II

Some time back, I had written about CSS service not starting in my post 11.1.0.6 ASM installation on Solaris fails -I

After doing some research, I came across Metalink Document

Note:397238.1 – How to Convert init.cssd as a SMF service for Solaris 10

This document talks about using Service Management Facility (SMF) which was introduced in Solaris 10. To configure it we have to download a zip file from the note and copy two files called initcssd to /lib/svc/method/initcssd and copy the second file initcssd.xml to /var/svc/manifest/site and some other steps (as listed in doc)

1) Install Oracle Software on Solaris 10

2) Download files from Note: 397238.1. Once done, modify the files accordingly and copy to the required location.

3) Do the configuration and then enable the service.

<strong># svcadm -v enable initcssd</strong>

After doing this, still service does not start.

# ps -ef|grep css
    root 29137  3793   0 14:54:39 ?           0:00 /bin/sh /lib/svc/method/initcssd run
    root 29188 26874   0 14:54:50 pts/9       0:00 grep css

Note talks about checking the content of content of file /var/opt/oracle/scls_scr/<Your-hostname>/root/cssrun file.

But when I tried to check, I found that directory does not exists.

# cd /var/opt/oracle/scls_scr/
 cd: /var/opt/oracle/scls_scr/: No such file or directory
# cd /u03/app/oracle/product/11.1.0/db_1/bin/
<strong># ./crsctl start crs</strong>
Attempting to start Oracle Clusterware stack
Failure at scls_scr_create with code 1
Segmentation Fault (core dumped)

Actually this directory is created when we run ‘localconfig add’ which will configure the socket files and directories.

<strong># ps -ef|grep css</strong>
    root 29137  3793   0 14:54:39 ?           0:00 /bin/sh /lib/svc/method/initcssd run
# pwd
/u03/app/oracle/product/11.1.0/db_1/bin
<strong># ./localconfig add</strong>
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Configuration for local CSS has been initialized

Cleaning up Network socket directories
Setting up Network socket directories
Adding to inittab
Startup will be queued to init within 30 seconds.
Checking the status of new Oracle init process...
Expecting the CRS daemons to be up within 600 seconds.
Cluster Synchronization Services is active on these nodes.
        testzone2
Cluster Synchronization Services is active on all the nodes.
Oracle CSS service is installed and running under init(1M)
<strong># ps -ef|grep css</strong>
  oracle 29137  3793   0 14:54:39 ?           0:00 /u03/app/oracle/product/11.1.0/db_1/bin/ocssd.bin

Now just for fun, I thought of disabling the SMF and trying the configuration again.

# ./localconfig delete
Stopping Cluster Synchronization Services.
Shutting down the Cluster Synchronization Services daemon.
Shutdown request successfully issued.
Shutdown has begun. The daemons should exit soon.
Cleaning up Network socket directories

Disable the SMF
# svcadm -v disable initcssd
svc:/system/initcssd:default disabled.

Now again add CSS service

# ./localconfig add
Successfully accumulated necessary OCR keys.
Creating OCR keys for user \'root\', privgrp \'root\'..
Operation successful.
Configuration for local CSS has been initialized

Cleaning up Network socket directories
Setting up Network socket directories
Adding to inittab
Startup will be queued to init within 30 seconds.
Checking the status of new Oracle init process...
Expecting the CRS daemons to be up within 600 seconds.

Giving up: Oracle CSS stack appears NOT to be running.
Oracle CSS service would not start as installed
Automatic Storage Management(ASM) cannot be used until Oracle CSS service is started

Enable the initcssd SMF service

<strong># svcadm -v enable initcssd</strong>
svc:/system/initcssd:default enabled.
# ps -ef|grep css
  oracle  2589  3793   1 15:14:31 ?           0:00 /u03/app/oracle/product/11.1.0/db_1/bin/ocssd.bin

Cool!! CSS Service has started again 🙂

Note:397238.1 says that problem occurs only after rebooting the service and not during installation. May be document needs to be updated for Oracle 11g on Solaris 10. I would suggest anyone installing Oracle 11g on Solaris 10 should try using these steps. I will try to add a remark to the note. Apart from that Article is quite good.

ORA- 4031 – A Case Study

Today I will be taking up one ORA-4031 issue we faced on one of our client database. Please note that SQL statements mentioned below have been changed and does not reveal any confidential information.

Our client was facing ORA-4031 on a 9.2 database.This was a new Database which had gone Live recently. Following error messages were recorded in alert log.

ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable to allocate 4200 bytes of shared memory ("shared pool","select    owner#,name,namespace...","sga heap(1,0)","library cache")

Initial shared pool at time of failure was 160 Mb. After the failure the shared_pool_size was increased to 1 Gb. We did not have any other information and were asked to perform RCA for the same. Luckily we had Statspack configured on the server, so we took a report for the duration when the error occurred.

              Snap Id     Snap Time      Sessions Curs/Sess Comment
            --------- ------------------ -------- --------- -------------------
Begin Snap:     1000 15-Apr-08 06:32:06  #######       4.6
  End Snap:     1001 15-Apr-0806:53:41  #######       4.6
   Elapsed:               21.58 (mins)

Cache Sizes (end)
~~~~~~~~~~~~~~~~~
Buffer Cache:     2,048M      Std Block Size:          8K
Shared Pool Size:       160M          Log Buffer:     32,768K

Load Profile
~~~~~~~~~~~~                Per Second       Per Transaction
                           ---------------       ---------------
    Redo size:              2,459.98              2,911.95
    Logical reads:              6,447.76              7,632.40
    Block changes:                  8.31                  9.83
    Physical reads:              3,642.28              4,311.47
    Physical writes:                  0.80                  0.95
    User calls:                106.51                126.08
    <strong>Parses:                151.48                179.31</strong>
    <strong>Hard parses:                 11.08                 13.12</strong>
    Sorts:                 91.50                108.31
    Logons:                  1.41                  1.67
    Executes:                283.09                335.10
    Transactions:                  0.84

We can clearly see that lot of hard parses were occurring at the time of error.

Library Cache Activity for DB: TESTDB1  Instance: TESTDB1
 Snaps: 1000 -1001-&gt;"Pct Misses"  should be very low

                         Get  Pct        Pin        Pct               Invali-
Namespace           Requests  Miss     Requests     Miss     Reloads  dations
--------------- ------------ ------ -------------- ------ ---------- --------
BODY                   1,564    1.9          2,903   62.9        469        0
CLUSTER                6,827    0.0          9,011    0.1          0        0
INDEX                  9,376   62.8          9,375   63.7          5        0
SQL AREA              77,058   11.0        442,341    5.2      <strong>6,723 </strong>       0
TABLE/PROCEDURE       72,495   10.4        135,173   24.2      <strong>8,291</strong>        0
TRIGGER                2,288   14.9          2,288   49.5        218        0
          -------------------------------------------------------------

Also looking at information from V$LIBRARYCACHE, we can see that there were lot of reloads happening for SQL AREA and TABLE/PROCEDURE Namespace. This gives indication that Shared Pool is not appropriately sized. So we kind of expected this to be a shared pool sizing issue.

But after increasing the shared_pool_size to 1 Gb, we saw that memory for “sql area” was increasing. So as to avoid further ORA-4031, we started flushing the shared pool when the free memory reached to 200M.

NAME                                  BYTES
    -------------------------- ----------------
 <strong>free memory                     597,154,376</strong>  &lt; -large amount of free, was done after flushing shared pool
<strong>sql area                        315,365,096</strong> &lt; - High value
library cache                    82,229,616
miscellaneous                    81,476,336
gcs resources                    39,331,928
gcs shadows                      26,133,184

Whenever I observe high memory usage for Sqlarea, I run following sql statements.

1) 10g

<strong><span style="color: #3366ff;">SQL&gt; select sa.sql_text,sa.version_count ,ss.*from v$sqlarea sa,v$sql_shared_cursor ss where<span> </span>sa.address=ss.address and sa.version_count &gt; 50 order by sa.version_count ;</span></strong>

8i/9i

<strong><span style="color: #3366ff;">select sa.sql_text,sa.version_count ,ss.*from v$sqlarea sa,v$sql_shared_cursor ss<span> </span>where sa.address=ss.KGLHDPAR<span> </span>and sa.version_count &gt; 50<span> </span>order by sa.version_count ;</span></strong>

– This one is to find if child cursors are being generated. To know about child cursors, refer to following Metalink note

Note 296377.1 – Handling and resolving unshared cursors/large version_counts

We did not get any SQL in this category.

2) Use script from Asktom website to find if application is using Bind variables. You can find ,more information by clicking here

This returned quite a few statements

    INSERT INTO EMP (columnname, columnname, columnname, columnname, columnname, columnname, columnname, columnname, columnname, columnname, columnname ,columnname) VALUES ('#', TO_DATE('#','#'), @, '#',
    @, @, @, '#', @, @, @,     @, @, @, @, @)          <strong>1948</strong>

    DELETE FROM EMP WHERE EMPID=@          <strong>1976</strong>

    INSERT INTO EMP (columnname, columnname, columnname, columnname) VALUES (@, @, '#',    @, NULL , @)          <strong>1976</strong>

    DELETE FROM EMP WHERE TYPE=@ AND EMPID=@          <strong>1976</strong>

     INSERT INTO DEPT VALUES (@, '#', '#', TO_DATE('#','#'), SYSDATE)
          <strong>2012</strong>

    DELETE FROM DEPT WHERE DEPTNO=@ AND DEPT_ID=@ AND MGR=@ AND SAL='#'          <strong>2375</strong>

    INSERT INTO SALARY (columnname, columnname, columnname, columnname, columnname, columnname, columnname) VA LUES (@, @, @, '#', '#', @, TO_DATE('#','#'), TO_DATE('#','#'), @, @)          <strong>2377</strong>

We can clearly see that statements not being shared and thus causing the high memory usage in shared pool.

Each sql statement in shared pool will be occupying some memory. If the sql are similar and differ only in literal, then they should be shared as otherwise they will occupy lot of extra space. By not sharing the statements, we are wasting space in shared pool and causing shared pool fragmentation.

Solution again is to use Bind variables or CURSOR_SHARING.

Can ASM DiskGroup Be Renamed?

This was actually a question on Oracle forum which I had replied to. Basically the Poster, wanted to know if he could rename the ASM Diskgroup name by renaming/editing ASM Disk header. He had also mentioned that he had heard about this being done by Oracle for its few customers using kfed.

Answer is NO. It is not possible to rename the diskgroup by editing the ASM disk header. kfed is known to be used for patching ASM disk headers for corruption (only oracle support can do it) and for viewing ASM header contents. Only way to change this by dropping and recreating the diskgroup.

In case you wish to create a new diskgroup with a name say +DG1 which was being used by a Diskgroup which is not mounted (Have some ASM Disk members still in ASM_DISKSTRING path), then you would face following error

    <strong>ORA-15030</strong>: diskgroup name "string" is in use by another diskgroup
    <strong>Cause:</strong> A CREATE DISKGROUP command specified a Diskgroup name that was already assigned to another diskgroup.
    <strong>Action: </strong>Select a different name for the Diskgroup.

In case you wish to create the Diskgroup with same name +DG1 you will be required to clear the ASM disk header using

dd if=/dev/zero of=/dev/raw/raw11 bs=1024 count=100

After this you can recreate the Diskgroup with same name.

One Poster suggested renaming at LUN/Storage level. I believed this to be a destructive idea which could corrupt the Diskgroup. nvengurl replied to this and informed that we read ASM disk header to mount the Diskgroup and thus changing the LUN name/id/path will not solve the issue.

Update: This article is valid for 10g and 11gR1. Since 11gR2 oracle has introduced renamedg utility which can be used to rename diskgroup. I have documented the steps here