Writing About Our Experiences With Oracle Databases
Thursday September 9th 2010

Latest Topics

Renaming Diskgroup containing Voting Disk and OCR

Recently encountered a issue where we had to rename a diskgroup for 11.2 RAC containing OCR and Voting Disk.At this [Read More]

Effect Of Multiple SHMMAX Settings

Last week I saw a warning message at database startup time saying: WARNING: EINVAL creating segment of size [Read More]

Finding blogging tough? Just copy and do find and replace

Finding blogging tough? Just copy and do find and replace

Well this is motto followed by Guenadi Jilevski for his blog http://gjilevski.wordpress.com and he seems to get result [Read More]

Physical Corruption: ORA-1578 part 3

Recently we encountered a block corruption issue in 10g database on Linux x86 64 bit and using ASM for storing database [Read More]

Cloning RAC Database with RMAN fails with error

Steps for cloning a RAC database  with RMAN is similar to cloning a single instance database. But while using rman [Read More]

‘Metric execution timed out’ error reported by Grid

We use 10.2 Enterprise Manager Grid control  to monitor our oracle database environments . Apart from the normal metrics provided by oracle we also use host based and sql user defined metrics which allows you to monitor almost anything . Coming back to the main issue, following error was reported from one of the monitored target (Non Prod server)

PROD Apr 30, 2010 8: Status
Time: May 30, 2010 2:28:14 AM PDT
Sev: Metric Error Start
Message: Metric evaluation error start - Metric execution timed out
in 360 seconds

Checking the $AGENT_HOME/sysman/log/emagent.trc, same messages were recorded in the trace file

2010-05-30 02:39:11,585 Thread-4045400992 WARN  scheduler: oracle_database:orcli.xx.xx.xx_orcli1:Response is due to run again but is still running from last schedule
2010-05-30 02:39:12,280 Thread-3981994912 ERROR fetchlets.oslinetok: Metric execution timed out in 600 seconds
2010-05-30 02:39:13,279 Thread-4088384416 ERROR fetchlets.oslinetok: Metric execution timed out in 600 seconds

I used emctl status agent but it did not report any pending files. My first step was to  restart the agent but it didn’t come up properly after restart

[oracle@prod1]~% emctl status agent
Oracle Enterprise Manager 10g Release 4 Grid Control 10.2.0.4.0.
Copyright (c) 1996, 2007 Oracle Corporation.  All rights reserved.
---------------------------------------------------------------
Agent Version     : 10.2.0.4.0
OMS Version       : 10.2.0.4.0
Protocol Version  : 10.2.0.4.0
Agent Home        : /oracle/product/agent10g
Agent binaries    : /oracle/product/agent10g
Agent Process ID  : 1025
Parent Process ID : 1009
---------------------------------------------------------------
Agent is Running but Not Ready

Again looking at emagent.trc file, following error was observed

2010-05-30 03:19:30,238 Thread-4092558240 ERROR TargetManager: TIMEOUT when compute dynamic properties for target orcli.xx.xx.xx_orcli1

At this moment I enabled the blackout for this host as I didn’t wanted too many alerts for this server. On checking MOS aka metalink, I found a note recommending altering values of parameters in agent configuration file which controls the time limit for agent to make connection to databse. But on checking found those parameter’s were already set to specified value (Don’t remember the MOS article now, anyways it was not relevant in our case).
Next step was that I executed top command to see server usage

[oracle@prod1]~/product/agent10g/sysman/config% top

top - 03:33:00 up 298 days,  8:57,  3 users,  load average: 1.10, 1.10, 1.24
Tasks: 333 total,   2 running, 331 sleeping,   0 stopped,   0 zombie
Cpu(s): 13.6% us,  1.2% sy,  1.0% ni, 83.5% id,  0.7% wa,  0.0% hi,  0.0% si
Mem:  32939820k total, 31860120k used,  1079700k free,   655108k buffers
Swap: 12578884k total,   425388k used, 12153496k free, 22205928k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3042 oracle    25   0 50780  13m 6300 R 99.7  0.0   2780:40 tnslsnr
 8588 root      25  10 77084  10m 1228 S  8.3  0.0 594:49.26 sec.pl
19660 oracle    15   0 5978m 1.4g 1.4g S  0.7  4.6  81:12.82 oracle
19662 oracle    -2   0 5978m 1.8g 1.8g S  0.7  5.7  28:26.12 oracle
19666 oracle    -2   0 5978m 1.8g 1.8g S  0.7  5.7  29:23.61 oracle
 8402 root      15   0 21968 1296  876 S  0.3  0.0   6:35.15 sshd
 8857 root      34  19     0    0    0 S  0.3  0.0 380:06.93 kipmi0
 9413 oracle    15   0 3154m 103m  96m S  0.3  0.3  19:11.10 oracle

We can see that tnslsnr is consuming close to 100% cpu. I tried lsnrctl status command which hung. Then I tried pinging to the listener service using tnsping which also timed out.   I restarted the listener by killing it and starting with srvctl

[oracle@prod1]~% kill -9 3042
[oracle@prod1]~% ps -ef|grep tns
oracle   15600 30213  0 03:34 pts/1    00:00:00 grep tns
[oracle@prod1]~% srvctl start listener -n prod1
[oracle@prod1]~% ps -ef|grep tns
oracle   16022     1  0 03:35 ?        00:00:00 /oracle/product/10.2/bin/tnslsnr LISTENER_prod1 -inherit
oracle   16078 30213  0 03:35 pts/1    00:00:00 grep tns

After this agent came up normally. This co-relates with timeout issue reported by agent while making connection to database. I think the correct approach to the problem should have been checking the listener connections instead of restarting agent.

Coming back to the problem, agent was up and running now but  Grid Console reported agent as offline and emctl upload agent command failed . We again look at emagent.trc which recorded following errors

2010-05-30 03:40:31,050 Thread-4136642240 WARN  recvlets.aq: Deferred nmevqd_refreshState for [orcli.xx.xx.xx_orcli1,oracle_database]

This issue was caused due to blackout .  MOS note “Agent crashes with accumulation of blackouts and recvlets.aq: Failed to refreshState [ID 740000.1] ” describes the issue.

Note asks to stop the agent,then removing the blackouts.xml file, issuing emctl clearstate agent command and finally restarting the agent. Please find below commands for reference

$AGENT_HOME/bin/emctl stop agent

rm $AGENT_HOME/sysman/emd/blackouts.xml

$AGENT_HOME/bin/emctl clearstate agent

$AGENT_HOME/bin/emctl start agent

After performing this step, ‘emctl upload agent’ succeeded and grid started monitoring the target normally. Note that in case you are using clustered agent, log files and blackouts.xml file will be found under $AGENT_HOME/<node_name>/sysman/

  • Share/Bookmark
Related Tags: , ,

Leave a Comment

Recent Comments

Lalit had this to say

Nice article. keep it up buddy. Read the post

Renaming Diskgroup containing Voting Disk/OCR | AskDba.org Weblog had this to say

had done similar activity on 11.2 Single instance having database files (see this post) using renamedg. We tried using Read the post

csk had this to say

In my case the adding Disk to an existing Disk Group failed as the disk was not visible from other RAC node.Now the Read the post

Saurabh Sood had this to say

Hi CSK, In this case, the information is already written into disk header, and to reuse the disk the only option i s Read the post

csk had this to say

Yeah. I have opened SR with Oracle. Once I get solution , I will provide it here. Read the post