Performance

V$SQL_SHARED_CURSOR in 11.2.0.2

Oracle 11.2.0.2 has introduced new column “REASON” to V$SQL_SHARED_CURSORS. I came across this extremely useful column while debugging multiple child cursor issue (11.2.0.2 seems to have too many bugs which can cause multiple child cursors).

As per documentation , REASON column has been introduced in 11.2.0.2

REASON - CLOB	Child number, id, and reason the cursor is not shared. The content of this column is structured using XML.

Reason column is helpful as we can easily interpret the reason why child cursors are not being shared. e.g

<ChildNode>
<ChildNumber>151</ChildNumber>
<ID>9</ID>
<reason>PQ Slave mismatch(5)</reason>
<size>2x4</size>
<ctxpq_StmtId_pqctx>0</ctxpq_StmtId_pqctx>
<fxu_kxfxutqidb>2422463</fxu_kxfxutqidb>
</ChildNode>
<ChildNode><ChildNumber>151</ChildNumber>
<ID>3</ID>
<reason>Optimizer mismatch(2)</reason>
<size>4x4</size>
<parallel_query_default_dop>16</parallel_query_default_dop>
<kxfr_Default_DOP>80</kxfr_Default_DOP>
<isParallel>1</isParallel>
<Need_Default_Dop>0</Need_Default_Dop>
</ChildNode>
<ChildNode>
<ChildNumber>151</ChildNumber>
<ID>9</ID><reason>PQ Slave mismatch(5)</reason><size>2x4</size><ctxpq_StmtId_pqctx
>0</ctxpq_StmtId_pqctx><fxu_kxfxutqidb>1433944</fxu_kxfxutqidb></ChildNode>

We can verify the same thing by checking PQ_SLAVE_MISMATCH column

select sql_id,PQ_SLAVE_MISMATCH,OPTIMIZER_MODE_MISMATCH from  V$SQL_SHARED_CURSOR where sql_id='b9uz0akdcx58q' and child_number=151;

SQL_ID	      P O
------------- - -
b9uz0akdcx58q Y N

There is one more column PURGED_CURSOR which is new (I guess it came from 11.2.0.1) which tells if the cursor has been purged using dbms_shared_pool.purge package.

Oracle Database Appliance Introduced

Oracle has introduced it new product: ODA (Oracle Database Appliance)

I was not able to join Oracle President Mark Hurd’s webcast where he supposed to announce a new Oracle product[It was at 1AM my time and I was sleeping], now viewing this webcast HERE

Kerry Osborne has shared information about this product, click link to see it, here you will see some good information about ODA.

Another new thing to talk about 🙂

 

Performance Management Guide on AIX

While trying to find the amount of physical memory used by oracle process on AIX, I got reference of a document from Metalink:

Performance Management Guide

It tell us about which process is using how much memory and how to interpret the output of commands like: vmstat, svmon, ps on AIX.

Also, to get more information on AIX parameter like: MAXPERM, MINPERM click here

Though I have not explored the complete guide yet, but found it very good to start with.

ORA-01722 with Full Table Scan

My application developers approached me with an issue which is very unique to me. They were complaining about a query which was failing with ORA-01722 “invalid number” after an upgrade to 11.1.0.7 from 10.2.0.4. The syntax of the query is like:

select max(a) from t1 where c1<>'abc' and c2=12345 and c3='Y' and c4='xyz';

This query worked fine in 10204 and was also working fine in another, upgraded, 11.1.0.7 database.

All the columns i.e C1,C2,C3 & C4 are varchar 2(20) .

I ran this query with single quotes around column C2 as:

select max(a) from t1 where c1<>'abc' and c2='12345' and c3='Y' and c4='xyz';

and it worked fine but without single quotes it failed again with same error.

I checked the explain plan of the query and it was doing a “Full Table Scan” on Table T1. Then I opened another 11.1.0.7 database where the same query is working fine and found that there is an index on columns C1,C2,C3 & C4 and the table T1 was getting accessed by Index-Range scan.

Now coming back to the failing 11.1.0.7 database, index on column C4 was missing. After creating index on column C4 the query started to work fine at failing instance.

I am not sure how the absence of an index can cause this issue? Why VARCHAR2 cannot recognize a value without quotes when doing a Full Table Scan?

Your comments are always welcome. Please let us know your views on this.

October 2010 Blogroll Report

It’s been time since we saw log-buffer edition with last being published on 1st October 2010 and  Coskan’s weekly blogroll report

I found them really useful to read top blog articles for week. So I decided to tag few articles posted in October 2010 which I  had read and found useful. I am subscribed to orana.info so most of articles are already aggregated there (barring few). Unlike Pythian’s log buffer, I will be concentrating only on Oracle Database Related posts. Lets start

11.2.0.2 patchset introduced lot of new features in Grid Infrastructure. Martin Bach discusses Redundant Interconnect feature introduced in 11.2.0.2 in his following post

Then there is series on setting up Stretched RAC (with 4 parts published till now) from him

Julian Dyke has started his blog and have posted quite a number of articles related to 11.2.0.2
He discussed issue related to Multicasting (required to enable HAIP interconnect feature) in below post

Julian also published test program to check if multicasting is enabled on your box

Ritesh has posted a handy script to map ASM devices on EMC along with Disk number along with a  tweak in max sector size configuration on RHEL5 which can boost ASM Performance.Read more below


Doug writes about a issue related to SQL*Net events and methodology used by him to debug ‘SQL*Net more data from client’ wait event

tkprof has undergone a suprising change in 11.2.0.2. Read Christian’s article about tkprof changes in 11.2.0.2

Another posting from Christian highlights changes related to Deferred Segment creation introduced in 11.2.0.2 in following post


Jarneil writes post about SQL Plan Management in 11g and discusses a case when adding index to table is not picked automatically by CBO

How do you answer question “What is difference between Nested Loop and Hash Joins“. Tanel Poder says “Hash joins can not look up rows from the inner (probed) row source based on values retrieved from the outer (driving) row source, nested loops can.”
Read on below article (make sure to check comments section to know what Jonathan Lewis and Christo kutrovsky had to say)


Check out article on Reading parallel execution with Bloom pruning by Greg Rahn


In 11.2 you can query bind values of running query instead of dumping using errorstack. Tanel Poder gives a example describing with example


Dominic talks about NLS (I know many of you would be scared but believe me you can’t avoid it). Read about using NLS_LENGTH_SEMANTICS

Kellyn troubleshoots ora-600 on RMAN and is able to resolve the bug without applying patch.

Meanwhile she is also trying out setting up wordpress blog on dbakevlar.com (which is so much better than blogspot)

Craig Shallahamer explains correlation between database commit time and log file sync time

Do you have export scripts which you are scared to migrate to datapump. Probably Legacy mode of datapump (11gR2 feature) can help you. Aman Sharma explains the usage in below post
Happy Reading!

Sqlplus connection on AIX taking too long

Recently we faced a performance issue, where in sqlplus connection from application server A (AIX 5.3) to Database server DB (different physical server,10gR2 on AIX 5.3) was taking nearly 1-2 minutes.  We did some quick check’s to narrow down the problematic area.i.e Is it network or database server or some issue with application server?

1)Vmstat output reported Normal Run queue’s/memory utilisation/idle cpu on Application server A and Database Server DB . Rules out resource issue.

2) Connections from other application server B was taking normal time. Rules out Database server issue

3) If incorrect password was entered in connection details, second attempt resulted in normal quick response time. Indicates some network issues.

4) System Admin checked for network issues with traceroute command and reported normal results.

Even though SA confirmed that there was no network issue, we were sure that this has to do something with DNS/Network. As we had identical environment from where connection was working fine , so we initiated tracing of session using truss

<span style="font-family: verdana, geneva;"><strong>truss -aeo /tmp/sqlplus_timing.txt sqlplus username/passwd@DB</strong></span>

Issuing a tail command on trace for Application Server A reported following

<strong>open("/etc/netsvc.conf", O_RDONLY)              Err#13 EACCES
open("/etc/irs.conf", O_RDONLY)                 Err#2  ENOENT
getdomainname(0x09001000A00A0E98, 1024)         = 0</strong>
__libc_sbrk(0x0000000000010020)                 = 0x000000001021CBE0
getdomainname(0x09001000A00A0E98, 1024)         = 0
open("/etc/hesiod.conf", O_RDONLY)              Err#2  ENOENT
getdomainname(0x09001000A00A0E98, 1024)         = 0
getdomainname(0x09001000A00A0E98, 1024)         = 0
getdomainname(0x09001000A00A0E98, 1024)         = 0
socket(2, 2, 0)                                 = 9
getsockopt(9, 65535, 4104, 0x0FFFFFFFFFFF5B74, 0x0FFFFFFFFFFF5B70) = 0
connext(9, 0x09001000A0015B50, 16)              = 0
send(9, 0x0FFFFFFFFFFF6A80, 32, 0)              = 32
_poll(0x0FFFFFFFFFFF5C20, 1, 5000)              = 1
nrecvfrom(9, 0x0FFFFFFFFFFF7DB0, 1024, 0, 0x0FFFFFFFFFFF63E0, 0x0FFFFFFFFFFF5C08) = 78
send(9, 0x0FFFFFFFFFFF6A80, 22, 0)              = 22
_poll(0x0FFFFFFFFFFF5C20, 1, 5000)              = 1
nrecvfrom(9, 0x0FFFFFFFFFFF7DB0, 1024, 0, 0x0FFFFFFFFFFF63E0, 0x0FFFFFFFFFFF5C08) = 22
close(9)                                        = 0
socket(2, 2, 0)                                 = 9
sendto(9, 0x0FFFFFFFFFFF6A80, 22, 0, 0x09001000A0015B60, 16) = 22
_poll(0x0FFFFFFFFFFF5C20, 1, 5000) (sleeping...)
_poll(0x0FFFFFFFFFFF5C20, 1, 5000)              = 0
close(9)                                        = 0
socket(2, 2, 0)                                 = 9
sendto(9, 0x0FFFFFFFFFFF6A80, 22, 0, 0x09001000A0015B60, 16) = 22
_poll(0x0FFFFFFFFFFF5C20, 1, 5000) (sleeping...)
_poll(0x0FFFFFFFFFFF5C20, 1, 5000)              = 0
close(9)                                        = 0
socket(2, 2, 0)                                 = 9
sendto(9, 0x0FFFFFFFFFFF6A80, 22, 0, 0x09001000A0015B60, 16) = 22
_poll(0x0FFFFFFFFFFF5C20, 1, 10000) <strong>(sleeping...)</strong>
_poll(0x0FFFFFFFFFFF5C20, 1, 10000)             = 0
close(9)                                        = 0

We can see lot of sleeping messages for _poll system call.  We did not find these wait’s in truss output of Application Server B . Going up in truss output file, we see EACCESS (permission issue) on /etc/netsvc.conf and ENONT (file/directory not present) on /etc/irs.conf file.  On application server B, there were no EACCESS errors on /etc/netsvc.conf but ENONT errors were reported for /etc/irs.conf . Look’s like we need to check etc/netsvc.conf file permissions.

Comparing file permission’s, we noticed that permission on /etc/netsvc.conf had changed on Server A and  root user alone had read permission on file. Granting read permission on /etc/netsvc.conf to other’s resolved the issue immediately.

As per AIX Doc’s, /etc/netsvc.conf file is used to specify the ordering of name resolution for the sendmail command, gethostbyname subroutine, gethostaddr subroutine, and gethostent subroutine and alias resolution for the sendmail command. It is also used to override the default order and the order given in the /etc/irs.conf file by creating the /etc/netsvc.conf configuration file and specifying the desired ordering.

e.g If the resolver cannot find the name in the /etc/hosts file and you want to the resolver to use NIS, enter:

hosts = local , nis

If the resolver cannot find the name in the /etc/hosts file and you want to the resolver to use NIS, enter:
hosts = local , nis

I had usually seen /etc/resolv.conf file to specify name resolution.  AIX Doc’s for name resolution specifies following process for name resolution

To resolve a name in a domain network, the resolver routine first queries the domain name server database, which might be local if the host is a domain name server or on a foreign host. Name servers translate domain names into Internet addresses. The group of names for which a name server is responsible is its zone of authority. If the resolver routine is using a remote name server, the routine uses the domain name protocol (DOMAIN) to query for the mapping. To resolve a name in a flat network, the resolver routine checks for an entry in the local /etc/hosts file. When NIS or NIS+ is used, the /etc/hosts file on the master server is checked.

By default, resolver routines attempt to resolve names using the above resources. BIND/DNS is tried first. If the /etc/resolv.conf file does not exist or if BIND/DNS could not find the name, NIS is queried if it is running. NIS is authoritative over the local /etc/hosts, so the search ends here if it is running. If NIS is not running, then the local /etc/hosts file is searched. If none of these services can find the name, then the resolver routines return with HOST_NOT_FOUND. If all of the services are unavailable, then the resolver routines return with SERVICE_UNAVAILABLE.

The default order described above can be overwritten by creating the /etc/irs.conf configuration file and specifying the desired order. Also, both the default and /etc/irs.conf orderings can be overwritten with the environment variable, NSORDER. If either the /etc/irs.conf file or NSORDER environment variable are defined, then at least one value must be specified along with the option.

As mentioned, /etc/irs.conf is used to modify the default order and /etc/netsvc.conf can be used to override the order given in /etc/irs.conf. I couldn’t find any reference explaining the time limit used by resolver, but seem’s like after checking for some limited time, it fall’s back on default order of name resolution.