Amit Bansal

ERRORSTACK- Oracle Debugging Event

One more post arising out of my frequent visits to Oracle forum 😉 Basically there was a question on “How to set events for ORA – 1652 errors”

I had replied to the post but Aman asked me if I could provide any notes regarding Errorstack.I didn’t had one so I informed him that I will be posting on my blog ( Cheap Publicity of my blog 🙂 )

Ok, this was the background to it, now coming to the main thing. Errorstack is used as debugging event for following reasons

  • To extract Failing SQL statement
  • To get Oracle Function call (to be used by Oracle) for interpreting cause for the error (e.g ORA 4030, ORA- 4031). It is also used in case of database hang scenarios. Refer to my earlier article on database hang
  • It also contains execution plan in some cases which can be used to diagnose ORA – 1652 errors (Excessive temp usage)

Errorstack can be set at System level and also at Session level.

e.g To set errorstack for ORA – 1652 ,Syntax will be

alter system set events ‘1652 trace name errorstack level 3’;

If suppose, you want to set it at session level, then

alter session set events ‘1652 trace name errorstack level 3’;

You can also use oradebug to take errorstack for particular process (say ospid 1234) as below

sqlplus “/ as sysdba”

oradebug setospid 1234

oradebug unlimit

oradebug dump errorstack 3

There are different levels for the errorstack with level 3 being the highest.

Note: There is no such level as level 10 or 12 which you might
sometimes find on web.

e.g event=’1401 trace name errorstack, level 12′

Extracted from http://www.orafaq.com/faqdbain.htm .
Specifying this will default to level 3.

Let’s see one trace file for error ORA – 942 i.e “table or view does not exist”

SYS>alter session set events ‘942 trace name errorstack level 3’;
Session altered.
SYS>select * from err;

select * from err

*ERROR at line 1:ORA-00942: table or view does not exist

Trace file is generated in user_dump_dest as a user process has failed.

Trace file Header
/u01/app/oracle/admin/sql10g/udump/sql10g_ora_23773.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 – 64bit Production
With the Partitioning, OLAP and Data Mining options
ORACLE_HOME = /u01/app/oracle/oracle/product/10.2.0
System name: SunOS
Node name: test
Release: 5.9
Version: Generic_117171-12
Machine: sun4u
Instance name: sql10g
Redo thread mounted by this instance: 1
Oracle process number: 28
Unix process pid: 23773, image:
oracle@test(TNS V1-V3)

Here we find information about Database version and Process id along with Operating System.

Failing Sql
ksedmp: internal or fatal error
ORA-00942: table or view does not exist
Current SQL statement for this session:
select * from err

Here we can see that faling sql statement is “select * from err”

Call Stack

—– Call Stack Trace —–
calling call entry argument values in hex
location type point (? means dubious value)
——————– ——– ——————– —————————-
ksedmp()+744 CALL ksedst() 000000840 ?
FFFFFFFF7FFF06FC ?
000000000 ?
FFFFFFFF7FFED1F0 ?
FFFFFFFF7FFEBF58 ?
FFFFFFFF7FFEC958 ?
ksddoa()+1148 PTR_CALL 0000000000000000 000106000 ? 106323304 ?
106323000 ? 000106323 ?
000106000 ? 106323304 ?
ksdpcg()+276 CALL ksddoa() 00000000F ? 10631DCD0 ?
105502DA8 ? 10631D990 ?
000000180 ? 10631E564 ?
ksdpec()+200 CALL ksdpcg() 000000000 ? 000000000 ?
000000000 ? 000000000 ?
0000003AE ? 00010558F ?
ksfpec()+160 CALL ksdpec() 0000003AE ? 10631EDB0 ?

Immediately after Failing sql, we will find call stack which are actually Oracle functions, which can be used by Oracle Support/ Developement to find the reason for error. You will find sometimes some metalink notes containing reference to functions so as to match Bugs.

PROCESS STATE DUMP
PROCESS STATE
————-
Process global information:
process: 38a1df2f8, call: 38a2b3218, xact: 0, curses: 38a264b38, usrses: 38a264b38
—————————————-
SO: 38a1df2f8, type: 2, owner: 0, flag: INIT/-/-/0x00
(process) Oracle pid=28, calls cur/top: 38a2b3218/3877b9820, flag: (0) –
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 0
last post received-location: No post
last process to post me: none
last post sent: 0 0 0
last post sent-location: No post
last process posted by me: none
(latch info) wait_event=0 bits=0
Process Group: DEFAULT, pseudo proc: 38a2039c8
O/S info: user: oracle, term: pts/19, ospid: 23773
OSD pid info: Unix process pid: 23773, image: oracle@test (TNS V1-V3)

SO: 38a264b38, type: 4, owner: 38a1df2f8, flag: INIT/-/-/0x00
(session) sid: 71 trans: 0, creator: 38a1df2f8, flag: (41) USR/- BSY/-/-/-/-/-
SO: 38a264b38, type: 4, owner: 38a1df2f8, flag: INIT/-/-/0x00
(session) sid: 71 trans: 0, creator: 38a1df2f8, flag: (41) USR/- BSY/-/-/-/-/-
DID: 0001-001C-0000026A, short-term DID: 0000-0000-00000000
txn branch: 0
oct: 3, prv: 0, sql: 385b129b0, psql: 383567fa0, user: 0/SYS
O/S info: user: oracle, term: pts/19, ospid: 23772, machine: test
program: sqlplus@test (TNS V1-V3)
application name: sqlplus@test (TNS V1-V3), hash value=0
last wait for ‘SQL*Net message from client’ blocking sess=0x0 seq=30 wait_time=24569781 seconds since wait started=23
driver id=62657100, #bytes=1, =0

SO: 387f804d0, type: 53, owner: 38a264b38, flag: INIT/-/-/0x00
LIBRARY OBJECT LOCK: lock=387f804d0 handle=385b129b0 mode=N
call pin=0 session pin=0 hpc=0000 hlc=0000
htl=387f80550[387f2d2e8,3877d6288] htb=387f2d2e8 ssga=387f2cce0
user=38a264b38 session=38a264b38 count=1 flags=[0000] savepoint=0x40
LIBRARY OBJECT HANDLE: handle=385b129b0 mutex=385b12ae0(0)
name=select * from err
hash=737b6fa6ce797a460e47cca17ef84a50 timestamp=04-17-2008 13:56:40

Now this gives the process details like Oracle Process id (pid),OS Process Id (ospid) along with wait event which it was waiting on it. It will also show the memory objects (State Objects SO ) associated with this process.

Systemstate dump which is used to diagnose the database hang contains process state dumps for all the processes present in Database.

Execution Plan

As the execution has failed because of ORA- 942 during semantic check, it will not contain the execution plan.

Changing Database Characterset With Export/Import Utilities

Recently there was a thread on Oracle Forums regarding change of Database characterset from WE8ISO8859P1 to AL32UTF8. User wanted to upgrade his database from 9i to 10g and also change the database characterset. He was looking forward to use either Export/Import utilities or Database Upgrade Assistant for the same. There were few responses to this question with responses from few people (NLS being a dry topic for most of DBA’s 🙂 )

To make this issue simpler, we will keep this problem to only change of Database characterset. Export/Import is considered to be a Straightforward approach but we should understand that this should not be the first step.

As per Globalization Support Guide, there are two steps to change the characterset

1)Data Scanning

2)Data Conversion

Export/Import comes under second step, i.e Data Conversion . Data Scanning is a very important step for any Characterset Change and should not be compromised. Failure to do so can result in data loss or data corruption.

In case you are planning to use Export/Import ,Approach should be to first run csscan.

Csscan utility helps to identify if data is Changeless (Data to be stored in same way in new Characterset), Convertible (Data have different code points in new Characterset and needs to be converted), Truncation (Data will be truncated in new Characterset , so column needs to be modified) and Lossy (Data is not understood by new characterset and will be lost on Conversion).

In case only Changeless and Convertible data is there, you can go ahead and use export/import.

If only changeless data is there, then you need to simply use “Alter database ” statement or csalter (in 10g) to change the characterset.

In case you have lossy data, then you would be required to identify what the data is, correct it and then change the characterset.

You should read the following Metalink Notes and Documentation before attempting to change Database Characterset

Oracle® Database Globalization Support Guide10g Release 2 (10.2)Part Number B14225-02

Note:227338.1 – Character Set Scanner – Frequently Asked Questions

Note 257736.1 Changing the Database Character Set – an extended overview

Note 260192.1 Changing WE8ISO8859P1/WE8ISO8859P15 or WE8MSWIN1252 to (AL32)UTF8

Tweaking _Shared_pool_reserved_min_alloc and ORA – 4031

Here comes one more article dedicated towards diagnosing ORA- 4031 and related parameters. While googling on ORA-4031 error or working on TAR with Oracle Support you will most likely hit this parameter “_Shared_pool_reserved_min_alloc “ and find a advice for altering this parameter to 4000 or so. And many times you go and implement it as this advice has come from Oracle support or you find it on one of your Favourite Oracle DBA’s site.
But how many times have you exactly gone ahead and tried to understand the validity of this suggestion.

Oracle Shared pool is managed by Heap Memory manager and is divided into memory chunks (Can visualize as blocks in tablespace) and the free memory chunks are managed by memory freelists. When a SQL statement is parsed in shared pool , it requires memory for storing execution plan and other internal structures. When the execution is complete the memory is freed and released back to the freelist. This repated use and release of memory causes fragmentation in shared pool.

Now algorithm for memory allocation is such that it will first try to get memory from the shared pool and then if the requested memory size is greater then _Shared_pool_reserved_min_alloc , then it will get the memory from Shared Pool Reserved area. By default this value is set to 4400 bytes. You can use below sql to find the current value for your database

col Parameter format a40
col “Instance Value” format a25
select a.ksppinm “Parameter”, b.ksppstvl “Value” from x$ksppi a, x$ksppcv b where a.indx = b.indx and a.ksppinm like (‘%shared%’);

Suppose you get error like

ORA-04031: unable to allocate 4200 bytes of shared memory (“shared pool”,”unknown object”,”sga heap”,”state objects”)

Here we see failure size is 4200 bytes. In this case if you set _Shared_pool_reserved_min_alloc to 4100 then it will make memory available from Shared_pool_reserved_size and ORA -4031 will be avoided. But note that this is just to delay the ORA – 4031 error (Temporary Relief) and will not resolve the error. To actually resolve the issue you need to have a look at application and see if you are using Bind Variables so that you can reuse the sql statements and avoid shared pool fragmentation. You can read following discussion on AskTom Website to know more about Bind variables

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:7832114438832

Note: – “_Shared_pool_reserved_min_alloc ” cannot be set below value of 4000

Recently while starting up database, I came across following error which was quite cryptic and took lot of time to resolve

ORA-00093: _shared_pool_reserved_min_alloc must be between 4000 and 0

This was contradicting the statement which I made above but I still tried to set it to less value but it errored out. On further investigation it turned out that there was blank space between M and amount of memory specified for one of memory parameters. i.e

DB_CACHE_SIZE=1200 M

Whereas it should have been

DB_CACHE_SIZE=1200M

Hope you see my post before wasting lot of time 🙂

Undo Tablespace Sizing

In this post I will be discussing various types of Undo extents which can be helpful in determining cause of ORA-30036 i.e unable to extend segment by %s in undo tablespace ‘%s’ and ORA-1555 – Snapshot too old Error

While using Automatic Undo management (Set UNDO_MANAGEMENT initialization parameter to AUTO), we specify Undo_Retention parameter to set the time for which we want undo to be available.
You can query Status column of DBA_UNDO_EXTENTS to check the various kind of extents present.

SELECT DISTINCT STATUS, SUM(BYTES), COUNT(*) FROM DBA_UNDO_EXTENTS GROUP BY STATUS;

This would return three values for STATUS field

1)ACTIVE

This means that undo extent is Active and being currently used by a transaction

2)UNEXPIRED

This represents the extents which are required to satisfy the time specified by Undo_retention Initialisation parameter.

3)EXPIRED

These are extents which are not being used by transaction and have crossed the time specified by Undo_retention .

So generally when undo segment space is full then we try to use the Expired extents and if there is no space to either get a new extent or reuse expired extents, then we go and use Unexpired extents. This sometimes lead to ORA-1555 error.

Starting from 10g, a new feature called Retention Guarantee has been introduced. If Retention Guarantee is enabled then you can be assured that you will not be reusing the Unexpired extents but this will mean higher chances of getting ORA-30036 error i.e failing to extend Undo Rollback segment.

So in case you get ORA-30036 , then query DBA_UNDO_EXTENTS to see if there are any expired segments. If there are none and Retention Guarantee is set, then it means that your Undo Tablespace is not properly sized.

You can make use of Undo Advisor available in 10g.

There is one more article on Undo Advisor which can be found Here

You can also manually find required size of Undo by using the below query

SQL> SELECT (UR * (UPS * DBS)) + (DBS * 24) AS “Bytes”
FROM (SELECT value AS UR FROM v$parameter WHERE name = ‘undo_retention’),
(SELECT (SUM(undoblks)/SUM(((end_time – begin_time)*86400))) AS UPS FROM v$undostat),
(select block_size as DBS from dba_tablespaces where tablespace_name=
(select value from v$parameter where name = ‘undo_tablespace’));

Where

(UR) UNDO_RETENTION in seconds
(UPS) Number of undo data blocks generated per second
(DBS) Overhead varies based on extent and file size (db_block_size)

Sign on Letter to Larry Ellison on AWR and ASH Licensing

While surfing across the blog’s, I found this particular link, a compaign which is aimed at getting free access to the ASH and AWR reports. I believe most of us know that we need to pay extra fees for the Diagnostic Tuning Pack which also includes ASH and AWR report. (If you’re not already familiar with these restrictions, you can read about them in the Oracle 10g Licensing Information Manual)

We all have seen how much better these reports are , when it comes to diagnosing general performance problem or some transient problems.

I believe this is a good initiative by Mark Brinsmead and I have already signed onto the letter. You can also join in by signing here.

HW enqueue contention with LOB

Definition
Enqueues are local locks that serialize access to various resources. Enqueue wait event indicates a wait for a lock that is held by another session (or sessions) in an incompatible mode to the requested mode.

HW Enqueue – High Watermark enqueue is acquired when a segment’s high water mark is moved, which will normally happen during a heavy insert operation on the segment.

Detailed explanation
The HW enq is used to move the segment hwm and also to reclaim the space in lob segments.HW enqueue is required for extending segments in locally managed as well as dictionary managed tablespace. The period for which it is held will be shorter for locally managed tablespaces but we do hold it to serialize extensions.

When lobs are deleted or updated, space freed up is not reclaimed immediately. Deleted lob chunks are maintained in the lob index segment for purpose of CR – the committed free space is reclaimed from the lob index when the free space in the lob-base segment is exhausted. Delayed reclaimation is indirectly controlled by PCTVERSION or RETENTION. Movement of free space from lob index to lob-base segment is done under hw enqueue.

When inserts happen, space for new rows are satisfied in the following orderly ways:

a. Allocate free blocks from the segment directly (these blocks are not mapped by the Lobindex).

b. Reclaim freed blocks (from previous updates/ deletes, the ones mapped by the lobindex) that are not needed to satisfy PCTVERSION or RETENTION. In extreme cases, reclaimed blocks even if it means not being able to guarantee RETENTION. This requires HW-enqueue on the lobindex segment.

c. Extend the segment.

To Identify HW – contention
Finding Blockers:

If a session is stuck waiting on an “enqueue” wait use View V$LOCK to find the blocker/s:

SELECT DECODE(request,0,’Holder: ‘,’Waiter: ‘)||sid sess, id1, id2, lmode, request, type
FROM V$LOCK
WHERE (id1, id2, type) IN (SELECT id1, id2, type FROM V$LOCK WHERE request>0)
ORDER BY id1, request;

To view the waiters:

SELECT * FROM v$lock WHERE request > 0;

Description:

* V$LOCK.TYPE is the lock type from P1
* V$LOCK.ID1 is the lock ID1 from P2
* V$LOCK.ID2 is the lock ID2 from P3

In v$lock Id1 is the tablespace number in which the segment (that is having its high water mark changed) resides and ID2 is the tablespace relative DBA of the segment header.

To identify the segment on which HW enqueue wait is being observed, run the following:

select DBMS_UTILITY.DATA_BLOCK_ADDRESS_FILE(ID2) FILE#,
DBMS_UTILITY.DATA_BLOCK_ADDRESS_BLOCK(ID2) BLOCK#
from v$lock
where type = ‘HW’;

select owner, segment_type, segment_name
from dba_extents
where file_id = <>
and <> between block_id and block_id + blocks – 1;

Reducing/ Resolving Contention:

1. Recreate the objects with the freelist groups and preallocate extents using ALTER TABLE…ALLOCATE EXTENT. This will help improve performance

2. Review the freelist and freelist group setting. If it is too high – we may be moving more blocks to freelists and hold segment hw for a longer period of time since it takes time to format all the blocks.

If preallocation does not work, increase the number of free lists. The speed of high water mark movement depends on the number of free lists.

3. If this is during lob space reclamation, review the pctversion setting. Depending on the usage of the blob you can set pctversion to 0 or small values to avoid space reclamation. The default is 10%.

4. Most issues with HW contention on LOBs and space reclaimation have been resolved by applying patch# 6376915 and setting the event 44951.

This fix causes ASSM LOB space to batch up reclaim instead of just reclaiming the requested/required number of LOB chunks. To enable this fix, set event 44951 to the maximum number of chunks that you would like to have reclaimed per attempt. The maximum allowed is 1024. Anything larger becomes 1024. However, if the requested amount of space is larger than the event’s value, the reclaim will be for the requested amount of space.

5. Other considerations:
a. Partitioning the LOB segment.
b. Increasing the chunk size.
c. Rebuild the table or truncate the table.

One such issue which we faced in one of our environment.

AWR Report

Snap Id Snap Time Sessions Cursors/Session
Begin Snap: 129618 27-Jun-12 01:00:10 240 2.1
End Snap: 129631 27-Jun-12 03:10:18 276 2.1
Elapsed: 130.13 (mins)
DB Time: 1,683.99 (mins)

Cache Sizes

Begin End
Buffer Cache: 7,136M 7,136M Std Block Size: 8K
Shared Pool Size: 1,536M 1,536M Log Buffer: 19,456K

Load Profile

Per Second Per Transaction Per Exec Per Call
DB Time(s): 12.9 0.2 0.01 0.00
DB CPU(s): 0.6 0.0 0.00 0.00
Redo size: 1,512,791.1 24,438.7
Logical reads: 42,623.7 688.6
Block changes: 6,243.5 100.9
Physical reads: 2,725.6 44.0
Physical writes: 364.1 5.9
User calls: 4,557.4 73.6
Parses: 1,646.8 26.6
Hard parses: 0.0 0.0
W/A MB processed: 0.6 0.0
Logons: 0.1 0.0
Executes: 1,647.5 26.6
Rollbacks: 0.8 0.0
Transactions: 61.9

Top 5 Timed Foreground Events

Event Waits Time(s) Avg wait (ms) % DB time Wait Class
enq: HW – contention 119,796 57,558 480 56.97 Configuration
direct path read 1,293,467 16,330 13 16.16 User I/O
log file sync 554,365 9,730 18 9.63 Commit
db file sequential read 274,737 5,536 20 5.48 User I/O
DB CPU 4,789 4.74

We can see in above report that this is 2 hour report and 57% of DB time is spent on waits for enq: HW -Contention. Other wait events are also related to I/O and this also indicates slow I/O device (Avg wait for db file sequential read is 20 ms, normally you would see 6 ms). We  move to section for “SQL Ordered by Elapsed Time” to see if we can identify sql

SQL ordered by Elapsed Time

Elapsed Time (s) Executions Elapsed Time per Exec (s) %Total %CPU %IO SQL Id SQL Module SQL Text
63,955.96 958,651 0.07 63.30 0.50 8.76 16w9z88mxsyz3 insert into SCOTT.JBPM4_…
15,336.34 0 15.18 4.81 95.54 7paqx6x20zdy3 JDBC Thin Client SELECT /*+ ORDERED PARALLEL (…

As per AWR report ,sql_id 16w9z88mxsyz3 is taking 64% of total elapsed time for all the sql and this is insert statement with very high execution rate. Digging further into structure of table, we identified that it has 2 clob columns and its residing in tablespace of 4M extent size. We can also look at section “Segments by Physical Writes” ,”Segments by Direct Physical Writes” to identify the segment and notice that they are Lob segments associated with this table. Solution in this case was to hash partition this table (64 partitions) to distribute the inserts into different segments and reduce the number of concurrent requests to allocate extents.