oracle

Application Design and ORA – 4031

Much has been written on designing good application (which uses bind variables) so as to have better performance and avoid shared pool memory issues. Tom Kyte explains this on one of his post consequences of not using bind variables

If you do not use bind variables and you flood the server with
hundreds/thousands of unique queries you will
-run dog slow
-consume a
ton of RAM (and maybe run out)
-not scale beyond a handful of users, if
that
among other really bad side effects.

This is very true and have observed it myself many times. But today we will see one more kind of bad application design which I came across working on a ORA – 4031 issue.

Environment was already using CURSOR_SHARING=SIMILAR (application developers were not ready to change code, so it was Kind of quick fix) to force bind variables but we were still observing ORA – 4031 errors. Query on V$SGASTAT gave us following results

http://askoracledba.wordpress.com/2008/04/27/application-design-and-ora-4031/sgastat/

Out of 4 gb allocated to shared pool, around 2.4 G is being used by sql area and library cache indicating that sql are consuming lot of memory in shared pool.

Whenever I observe high memory usage for Sqlarea, I run following sql statements.

1) select sa.sql_text,sa.version_count ,ss.*from v$sqlarea sa,v$sql_shared_cursor ss where sa.address=ss.KGLHDPAR and sa.version_count > 50 order by sa.version_count ;

– This one is to find if child cursors are being generated.

2) Use script from Asktom website to find if application is using Bind variables. You can find ,more information by clicking here

create table t1 as select sql_text from v$sqlarea;
alter table t1 add sql_text_wo_constants varchar2(1000);
create or replace function
remove_constants( p_query in varchar2 ) return varchar2
as
    l_query long;
    l_char  varchar2(1);
    l_in_quotes boolean default FALSE;
begin
    for i in 1 .. length( p_query )
    loop
        l_char := substr(p_query,i,1);
        if ( l_char = '''' and l_in_quotes )
        then
            l_in_quotes := FALSE;
        elsif ( l_char = '''' and NOT l_in_quotes )
        then
            l_in_quotes := TRUE;
            l_query := l_query  '''#';
        end if;
        if ( NOT l_in_quotes ) then
            l_query := l_query  l_char;
        end if;
    end loop;
l_query := translate( l_query, '0123456789', '@@@@@@@@@@' );
    for i in 0 .. 8 loop
        l_query := replace( l_query, lpad('@',10-i,'@'), '@' );
        l_query := replace( l_query, lpad(' ',10-i,' '), ' ' );
    end loop;
    return upper(l_query);
end;
/
update t1 set sql_text_wo_constants = remove_constants(sql_text);

select sql_text_wo_constants, count(*)
  from t1
 group by sql_text_wo_constants
having count(*) > 100
 order by 2
/

On executing first statement , there were no rows returned. But on executing second query , got following results

SELECT NVL(SUM(FEE),:"SYS_B_@") FROM (SELECT NVL(SUM(FREECALLFEE)+SUM(SALARY)+
SUM(SALARY@)+SUM(FREEINFOFEE),:"SYS_B_@") FEE
FROM SCOTT.A_BL_C@_@_F_@ WHERE MSISDN=:MSISDN UNION ALL
SELECT NVL(SUM(COMM),:"SYS_B_@") FEE FROM SCOTT.T_BL_SM_F_@ WHERE MSISDN=:MSISDN )
150

SELECT NVL(SUM(FEE),:"SYS_B_@") FROM (SELECT NVL(SUM(FREECALLFEE)+
SUM(SALARY)+SUM(SALARY@)+SUM(FREEINFOFEE),:"SYS_B_@") FEE FROM
SCOTT.A_BL_C@_@_G_@ WHERE MSISDN=:MSISDN UNION ALL SELECT NVL(SUM(COMM),:"SYS_B_@") FEE
FROM SCOTT.T_BL_SM_G_@ WHERE MSISDN=:MSISDN )
150

SELECT NVL(SUM(FEE),:"SYS_B_@") FROM (SELECT NVL(SUM(FREECALLFEE)+SUM(SALARY)+
SUM(SALARY@)+SUM(FREEINFOFEE),:"SYS_B_@") FEE FROM
SCOTT.A_BL_C@_@_H_@ WHERE MSISDN=:MSISDN UNION ALL SELECT
NVL(SUM(COMM),:"SYS_B_@") FEE FROM SCOTT.T_BL_SM_H_@ WHERE MSISDN=:MSISDN )

150

TRUNCATE TABLE SCOTT.A_BL_C@_@_L_@
310

TRUNCATE TABLE SCOTT.A_BL_C@_@_M_@
310

TRUNCATE TABLE SCOTT.A_BL_C@_@_P_@
310

Ideally this should not have returned any query as we are forcing bind variables.
But looking closely we find that there around 620tables with name of format SCOTT.A_BL_C@_@_M_@ and SCOTT.A_BL_C@_@_P_@(basically @ will be actually a Numeric or a literal) which is cause for so many versions for sql. Application is running set of 20 queries on these tables , so in total there are 20 X620 queries being run , leading to high usage of shared pool memory.

Apart from that we see that all these tables are being truncated .This actually invalidates the cursors and causes a hard parse along with causing shared pool fragmentation

Based on our findings, we gave following recommendation which could be performed to alleviate (if not resolve) the issue.

– Have the shared pool set to 4 Gb itself
– Turn these 620tables into two paritioned tables. This will avoid multiple sql’s in shared pool. But this can have impact on the time taken for query to run and had to be tested
– Instead of truncate tables, use delete operation so that we do not invalidate the cursor. But again catch here is that we will have more redo generation and if data being loaded uses append operation, size for these tables will grow.

As you see this is basically Application design issue and made a application which is not scalable and also leading to downtime.

Physical Corruption: ORA-1578 -Part 1

ORA-1578 is indication of physical block corruption. Oracle computes an internal checksum before fetching the block from disk, when it is about to fetch the block it reads the block header and compare the checksum stored in it. If the checksum mismatches we get ora-1578 error.
It indicates that a block is corrupted at
hardware level i.e at OS.

This error will always come whenever the corrupted block is read by Oracle.

Reasons for ora-1578:

1. Bad sectors in the Storage Disk Drive.
2. Formatting of blocks At OS level.( Zeroed out disk blocks).
3. some corruption in underlying hardware which is making block unreadable.

How to collect information for ora-1578:

Error Text:

ORA-01578: ORACLE data block corrupted (file # 14, block # 38976)

The block # 38976 shows the corrupted block number and the file # 14 indicates
the reletive file number 14.

1. Get the absolute file number:

To get the absolute file # we will use the following query:

SQL > select file# from v$datafile where rfile#=14;

FILE#
——-
14

here the absolute file# is also 14.

2. Get the segment name and segment type in which this corrupted block resides:

Use the following query:

SQL > SELECT tablespace_name, segment_type, owner, segment_name FROM
dba_extents WHERE file_id = 14 and 38976 between block_id AND block_id + blocks – 1;

— It will give us the segment type (Table, Index, LOB etc.), name of the segment (Table name
Index name etc.), Owner of the segment and the tablespace name of the segment in which corrupted block resides.

SQL > select FILE#,TS#,STATUS,BLOCKS,NAME from v$datafile where FILE#=14;

— From this we will get the tablespace number and the fine name of the datafile in which the corrupted block resides.

SQL> select TS#,NAME from v$tablespace where TS#=;

— It will give us the Name of the tablespace.

Now we have all the information required to perform an action to overcome ora-1578.

I will be providing more on this about solving ora-1578 soon.

Deciding “PROCESSES” parameter value

Many times we observe following error which corresponds to low setting of processes parameter.

$ oerr ora 20

00020, 00000, “maximum number of processes (%s) exceeded”

// *Cause: All process state objects are in use.

// *Action: Increase the value of the PROCESSES initialization parameter.




This is mostly seen in new database installation as the processes parameter is not taken into account.

According to Oracle® Database Reference 10g Release 2 (10.2) Part Number B14237-03

PROCESSES specifies the maximum number of operating system user processes that can simultaneously connect to Oracle. Its value should allow for all background processes such as locks, job queue processes, and parallel execution processes.

The default values of the SESSIONS and TRANSACTIONS parameters are derived from this parameter. Therefore, if you change the value of PROCESSES, you should evaluate whether to adjust the values of those derived parameters.




This nrings us to a question that “Can we set up a large value for this parameter so that user’s are not affected and we do not have to restart the system (PROCESSES is a static parameter) ?”

Answer is NO.



You need to calculate ideal value for the processes parameter by taking care of Default Background processes (Also looking at number of Archiver processes, parallel_max_servers) and total number of user connections.

This is required as otherwise you will be required to change the value of SEMMNS parameter (for UNIX only) i.e Max number of semaphores in system and along with this , it will increase the “Overhead” Memory in SGA.

Find below explanation for the above points.

1)SEMMNS parameter

Each Oracle instance needs to have a set amount of semaphores. The total amount of semaphores required is derived from the ‘processes’ parameter specified inOracle instance init.ora file. As more instances and/or databases are added the OS kernel parameter SEMMNS will need to be adjusted accordingly.

According to Metalink Note:153961.1



In Oracle7 , number of semaphores required by an instance is equal to ‘processes’ parameter in the init.ora for the instance.For Oracle8, Oracle8i, Oracle9i and Oracle10g, number of semaphores required by an instance is equal to 2 times the setting of the ‘processes’ parameter in the init.ora for the instance.

However, Oracle only momentarily grabs 2 X ‘processes’ and then releases half at instance startup.




2)Overhead Memory



Parameters like db_files, open_cursors and processes contribute to the variable part of SGA. This can be seen by issuing “Show SGA” command. You will see that “Variable Size” will be more then sum of “Shared Pool + Large Pool + Java Pool”. This is attibuted to the value of these parameters.

Please note that in case you specify a low value for SGA_MAX_SIZE, you will see Oracle bumping the value to higher value so as to accomodate high value of Overhead memory.

Staring from 10g, Overhead memory is accomodated in shared_pool_size.

e.g If you specify SHARED_POOL_SIZE as 200 MB and your internal overhead is 100 Mb, then your actual shared pool value available to instance is only 100Mb.

You can read Note:351018.1 – Minimum for SHARED_POOL_SIZE Parameter in 10.2 Version for more information.

ERRORSTACK- Oracle Debugging Event

One more post arising out of my frequent visits to Oracle forum 😉 Basically there was a question on “How to set events for ORA – 1652 errors”

I had replied to the post but Aman asked me if I could provide any notes regarding Errorstack.I didn’t had one so I informed him that I will be posting on my blog ( Cheap Publicity of my blog 🙂 )

Ok, this was the background to it, now coming to the main thing. Errorstack is used as debugging event for following reasons

  • To extract Failing SQL statement
  • To get Oracle Function call (to be used by Oracle) for interpreting cause for the error (e.g ORA 4030, ORA- 4031). It is also used in case of database hang scenarios. Refer to my earlier article on database hang
  • It also contains execution plan in some cases which can be used to diagnose ORA – 1652 errors (Excessive temp usage)

Errorstack can be set at System level and also at Session level.

e.g To set errorstack for ORA – 1652 ,Syntax will be

alter system set events ‘1652 trace name errorstack level 3’;

If suppose, you want to set it at session level, then

alter session set events ‘1652 trace name errorstack level 3’;

You can also use oradebug to take errorstack for particular process (say ospid 1234) as below

sqlplus “/ as sysdba”

oradebug setospid 1234

oradebug unlimit

oradebug dump errorstack 3

There are different levels for the errorstack with level 3 being the highest.

Note: There is no such level as level 10 or 12 which you might
sometimes find on web.

e.g event=’1401 trace name errorstack, level 12′

Extracted from http://www.orafaq.com/faqdbain.htm .
Specifying this will default to level 3.

Let’s see one trace file for error ORA – 942 i.e “table or view does not exist”

SYS>alter session set events ‘942 trace name errorstack level 3’;
Session altered.
SYS>select * from err;

select * from err

*ERROR at line 1:ORA-00942: table or view does not exist

Trace file is generated in user_dump_dest as a user process has failed.

Trace file Header
/u01/app/oracle/admin/sql10g/udump/sql10g_ora_23773.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 – 64bit Production
With the Partitioning, OLAP and Data Mining options
ORACLE_HOME = /u01/app/oracle/oracle/product/10.2.0
System name: SunOS
Node name: test
Release: 5.9
Version: Generic_117171-12
Machine: sun4u
Instance name: sql10g
Redo thread mounted by this instance: 1
Oracle process number: 28
Unix process pid: 23773, image:
oracle@test(TNS V1-V3)

Here we find information about Database version and Process id along with Operating System.

Failing Sql
ksedmp: internal or fatal error
ORA-00942: table or view does not exist
Current SQL statement for this session:
select * from err

Here we can see that faling sql statement is “select * from err”

Call Stack

—– Call Stack Trace —–
calling call entry argument values in hex
location type point (? means dubious value)
——————– ——– ——————– —————————-
ksedmp()+744 CALL ksedst() 000000840 ?
FFFFFFFF7FFF06FC ?
000000000 ?
FFFFFFFF7FFED1F0 ?
FFFFFFFF7FFEBF58 ?
FFFFFFFF7FFEC958 ?
ksddoa()+1148 PTR_CALL 0000000000000000 000106000 ? 106323304 ?
106323000 ? 000106323 ?
000106000 ? 106323304 ?
ksdpcg()+276 CALL ksddoa() 00000000F ? 10631DCD0 ?
105502DA8 ? 10631D990 ?
000000180 ? 10631E564 ?
ksdpec()+200 CALL ksdpcg() 000000000 ? 000000000 ?
000000000 ? 000000000 ?
0000003AE ? 00010558F ?
ksfpec()+160 CALL ksdpec() 0000003AE ? 10631EDB0 ?

Immediately after Failing sql, we will find call stack which are actually Oracle functions, which can be used by Oracle Support/ Developement to find the reason for error. You will find sometimes some metalink notes containing reference to functions so as to match Bugs.

PROCESS STATE DUMP
PROCESS STATE
————-
Process global information:
process: 38a1df2f8, call: 38a2b3218, xact: 0, curses: 38a264b38, usrses: 38a264b38
—————————————-
SO: 38a1df2f8, type: 2, owner: 0, flag: INIT/-/-/0x00
(process) Oracle pid=28, calls cur/top: 38a2b3218/3877b9820, flag: (0) –
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 0
last post received-location: No post
last process to post me: none
last post sent: 0 0 0
last post sent-location: No post
last process posted by me: none
(latch info) wait_event=0 bits=0
Process Group: DEFAULT, pseudo proc: 38a2039c8
O/S info: user: oracle, term: pts/19, ospid: 23773
OSD pid info: Unix process pid: 23773, image: oracle@test (TNS V1-V3)

SO: 38a264b38, type: 4, owner: 38a1df2f8, flag: INIT/-/-/0x00
(session) sid: 71 trans: 0, creator: 38a1df2f8, flag: (41) USR/- BSY/-/-/-/-/-
SO: 38a264b38, type: 4, owner: 38a1df2f8, flag: INIT/-/-/0x00
(session) sid: 71 trans: 0, creator: 38a1df2f8, flag: (41) USR/- BSY/-/-/-/-/-
DID: 0001-001C-0000026A, short-term DID: 0000-0000-00000000
txn branch: 0
oct: 3, prv: 0, sql: 385b129b0, psql: 383567fa0, user: 0/SYS
O/S info: user: oracle, term: pts/19, ospid: 23772, machine: test
program: sqlplus@test (TNS V1-V3)
application name: sqlplus@test (TNS V1-V3), hash value=0
last wait for ‘SQL*Net message from client’ blocking sess=0x0 seq=30 wait_time=24569781 seconds since wait started=23
driver id=62657100, #bytes=1, =0

SO: 387f804d0, type: 53, owner: 38a264b38, flag: INIT/-/-/0x00
LIBRARY OBJECT LOCK: lock=387f804d0 handle=385b129b0 mode=N
call pin=0 session pin=0 hpc=0000 hlc=0000
htl=387f80550[387f2d2e8,3877d6288] htb=387f2d2e8 ssga=387f2cce0
user=38a264b38 session=38a264b38 count=1 flags=[0000] savepoint=0x40
LIBRARY OBJECT HANDLE: handle=385b129b0 mutex=385b12ae0(0)
name=select * from err
hash=737b6fa6ce797a460e47cca17ef84a50 timestamp=04-17-2008 13:56:40

Now this gives the process details like Oracle Process id (pid),OS Process Id (ospid) along with wait event which it was waiting on it. It will also show the memory objects (State Objects SO ) associated with this process.

Systemstate dump which is used to diagnose the database hang contains process state dumps for all the processes present in Database.

Execution Plan

As the execution has failed because of ORA- 942 during semantic check, it will not contain the execution plan.

Undo Tablespace Sizing

In this post I will be discussing various types of Undo extents which can be helpful in determining cause of ORA-30036 i.e unable to extend segment by %s in undo tablespace ‘%s’ and ORA-1555 – Snapshot too old Error

While using Automatic Undo management (Set UNDO_MANAGEMENT initialization parameter to AUTO), we specify Undo_Retention parameter to set the time for which we want undo to be available.
You can query Status column of DBA_UNDO_EXTENTS to check the various kind of extents present.

SELECT DISTINCT STATUS, SUM(BYTES), COUNT(*) FROM DBA_UNDO_EXTENTS GROUP BY STATUS;

This would return three values for STATUS field

1)ACTIVE

This means that undo extent is Active and being currently used by a transaction

2)UNEXPIRED

This represents the extents which are required to satisfy the time specified by Undo_retention Initialisation parameter.

3)EXPIRED

These are extents which are not being used by transaction and have crossed the time specified by Undo_retention .

So generally when undo segment space is full then we try to use the Expired extents and if there is no space to either get a new extent or reuse expired extents, then we go and use Unexpired extents. This sometimes lead to ORA-1555 error.

Starting from 10g, a new feature called Retention Guarantee has been introduced. If Retention Guarantee is enabled then you can be assured that you will not be reusing the Unexpired extents but this will mean higher chances of getting ORA-30036 error i.e failing to extend Undo Rollback segment.

So in case you get ORA-30036 , then query DBA_UNDO_EXTENTS to see if there are any expired segments. If there are none and Retention Guarantee is set, then it means that your Undo Tablespace is not properly sized.

You can make use of Undo Advisor available in 10g.

There is one more article on Undo Advisor which can be found Here

You can also manually find required size of Undo by using the below query

SQL> SELECT (UR * (UPS * DBS)) + (DBS * 24) AS “Bytes”
FROM (SELECT value AS UR FROM v$parameter WHERE name = ‘undo_retention’),
(SELECT (SUM(undoblks)/SUM(((end_time – begin_time)*86400))) AS UPS FROM v$undostat),
(select block_size as DBS from dba_tablespaces where tablespace_name=
(select value from v$parameter where name = ‘undo_tablespace’));

Where

(UR) UNDO_RETENTION in seconds
(UPS) Number of undo data blocks generated per second
(DBS) Overhead varies based on extent and file size (db_block_size)

Sign on Letter to Larry Ellison on AWR and ASH Licensing

While surfing across the blog’s, I found this particular link, a compaign which is aimed at getting free access to the ASH and AWR reports. I believe most of us know that we need to pay extra fees for the Diagnostic Tuning Pack which also includes ASH and AWR report. (If you’re not already familiar with these restrictions, you can read about them in the Oracle 10g Licensing Information Manual)

We all have seen how much better these reports are , when it comes to diagnosing general performance problem or some transient problems.

I believe this is a good initiative by Mark Brinsmead and I have already signed onto the letter. You can also join in by signing here.