Amit Bansal

Em12c:Silent Oracle Management agent Installation

This Post would describe how to install Oracle Management Agent 12c in silent mode. We need to download the agent software which can be done using emcli utility which is present on OMS host.Login to OMS host and execute emcli utility as mentioned below

[oracle@oradbdev04]~/Middleware/oms/bin% emcli login -username=sysman
Enter password 

Login successful

We need to now syncronize EMCLI

[oracle@oradbdev04]~/Middleware/oms/bin% emcli sync
Synchronized successfully

Following command gets list of agent software present on grid control

[oracle@oradbdev04]~/Middleware/oms/bin% emcli get_supported_platforms
Getting list of platforms ...
Check the logs at /home/oracle/Middleware/oms/bin/agent.log
About to access self-update code path to retrieve the platforms list..
Getting Platforms list  ...
-----------------------------------------------
Version = 12.1.0.1.0
 Platform = Linux x86-64
-----------------------------------------------
Platforms list displayed successfully.

Since this is a new Grid installation on Linux x86-64 we get only 1 agent software. If you need to install agent on different platform say Solaris, then you need to download software using self update option.
Note that you will have to pass your My oracle support credentials while configuring self-update.

Since our target host is also Linux x86-64 we can proceed. In below example we have specified that our 12.1.0.1.0 agent software should be placed in /tmp for Linux x86-64 platform

[oracle@oradbdev04]~/Middleware/oms/bin% ./emcli get_agentimage -destination=/tmp -platform="Linux x86-64" -version=12.1.0.1.0
Platform:Linux x86-64
Destination:/tmp
 === Partition Detail ===
Space free : 3 GB
Space required : 1 GB
Check the logs at /tmp/get_agentimage_2012-02-04_11-56-01-AM.log
Setting property ORACLE_HOME to:/home/oracle/Middleware/oms
calling pulloneoffs with arguments:/home/oracle/Middleware/oms/home/oracle/Middleware/oms/sysman/agent/12.1.0.1.0_AgentCore_226.zip12.1.0.1.0linux_x64
Check this logs for more information: /home/oracle/Middleware/oms/sysman/prov/agentpush/logs
[oracle@oradbdev04]~/Middleware/oms/bin% ls -l /tmp/12.1.0.1.0_AgentCore_226.zip
-rw-r----- 1 oracle dba 235679029 Feb  4 11:56 /tmp/12.1.0.1.0_AgentCore_226.zip

You should copy this file to the target host and unzip it.

To perform installation , we need to use agentDeploy.sh script. You can either use agent.rsp in unzipped directory or pass parameters at command line

AGENT_BASE_DIR ->Location where Agent software will be installed
OMS_HOST –>OMS host name
EM_UPLOAD_PORT –>Agent upload port. Script will try both http/https port.
AGENT_REGISTRATION_PASSWORD –>Password for secure registration with OMS

e.g
agentDeploy.sh AGENT_BASE_DIR=/home/oracle/agent12c OMS_HOST=oradbdev04 EM_UPLOAD_PORT=4900 AGENT_REGISTRATION_PASSWORD=oracle123

You will be prompted to run root.sh as below

-bash-3.2$ sudo /home/oracle/agent12c/core/12.1.0.1.0/root.sh
Password:
Finished product-specific root actions.
/etc exist

Creating /etc/oragchomelist file...
Finished product-specific root actions.

Verify following commands are working properly

cd $AGENT_HOME/core/12.1.0.1.0/bin
./emctl status agent
./emctl upload agent

$ ./emctl upload agent
Oracle Enterprise Manager 12c Cloud Control 12.1.0.1.0
Copyright (c) 1996, 2011 Oracle Corporation.  All rights reserved.
---------------------------------------------------------------
EMD upload completed successfully

Next step is to discover the targets. You can either manually add them or use automated discovery of new Oracle targets on machines which already have agent installed. Refer to EM12c:Automated discovery of Targets for automatic discovery and target configuration.

11gR2:Listener Startup Issues

In this blog post I will be discussing listener startup issues faced in 11gR2 RAC. I will be constantly updating this post based on my experiences or any comments on this blog post.

Let’s get started.You will experience following errors while starting listener using srvctl

[oracle@prod01]~% srvctl start listener -n prod01
PRCR-1013 : Failed to start resource ora.LISTENER.lsnr
PRCR-1064 : Failed to start resource ora.LISTENER.lsnr on node prod01
CRS-5016: Process "/oragrid/product/11.2.0.2/bin/lsnrctl" spawned by agent "/oragrid/product/11.2.0.2/bin/oraagent.bin" for action "start" failed: details at "(:CLSN00010:)" in "/home/oragrid/product/11.2.0.2/log/prod01/agent/crsd/oraagent_oracle/oraagent_oracle.log"
CRS-5016: Process "/oragrid/product/11.2.0.2/bin/lsnrctl" spawned by agent "/oragrid/product/11.2.0.2/bin/oraagent.bin" for action "start" failed: details at "(:CLSN00010:)" in "/home/oragrid/product/11.2.0.2/log/prod01/agent/crsd/oraagent_oracle/oraagent_oracle.log"
CRS-2674: Start of 'ora.LISTENER.lsnr' on 'prod01' failed

 

Issue 1: – Incorrect ORACLE_HOME entry in listener.ora

This issue can be verified by attempting to start listener by lsnrctl utility. Please note that you need to use $GRID_HOME/bin/lsnrctl utility to manage listener in 11gR2 RAC

[oracle@prod01]~% lsnrctl start

LSNRCTL for Linux: Version 11.2.0.2.0 - Production on 31-JAN-2012 08:09:52

Copyright (c) 1991, 2010, Oracle.  All rights reserved.

Starting /oragrid/product/11.2.0.2/bin/tnslsnr: please wait...

TNSLSNR for Linux: Version 11.2.0.2.0 - Production
System parameter file is /oragrid/product/11.2.0.2/network/admin/listener.ora
Log messages written to /oracle/diag/tnslsnr/prod01/listener/alert/log.xml
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.160)(PORT=1521)))
TNS-01201: Listener cannot find executable /oracle/product/11.2/bin/oracle for SID orcl01

Listener failed to start. See the error message(s) above..

Above error indicates that it is not able to find ‘oracle’ executable in specified path. But this is not our correct ORACLE_HOME. Checking listener.ora we found that this home is coming from ORACLE_HOME variable

SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (SID_NAME =orcl01 )
      (ORACLE_HOME = /oracle/product/11.2)
    )
    (SID_DESC =
      (SID_NAME = PLSExtProc)
      (ORACLE_HOME = /oracle/product/11.2)
      (PROGRAM = extproc)
    )
  )

Correct ORACLE_HOME is oracle/product/11.2.0.2 which needs to be updated correctly in listener.ora. After adding we were able to start the listener.

Issue 2: CRS resource ora.[node_name].ons is down

While debugging listener startup issue, you found that resource ora.[node_name].ons is not starting. Listener (including SCAN_LISTENER)  is dependent on ora.ons resource
Checking $GRID_HOME/opmn/logs/ons.log[node_name], we see following messages

12/01/12 05:32:48 [ons-listener] Could not get address information for localhost 6100.
12/01/12 05:32:49 [internal] getaddrinfo(localhost, 6100, 1) failed (Name or service not known):
12/01/12 05:33:05 [internal] getaddrinfo(localhost, 6100, 1) failed (Name or service not known):
12/01/12 05:33:15 [internal] getaddrinfo(localhost, 6100, 1) failed (Name or service not known):
12/01/12 05:33:15 [internal] getaddrinfo(localhost, 6100, 1) failed (Name or service not known):
12/01/12 05:33:15 [internal] getaddrinfo(localhost, 6100, 1) failed (Name or service not known):

Issue is that your host is not able to resolve localhost setting.You can verify this by issuing ping localhost command.This issue can be resolved by adding following entry in /etc/hosts or to DNS

127.0.0.1 localhost

Issue 3: VIP and ora.[node_name].ons are not starting on one node

In this issue check that Bcast and Mask settings for Public interface are same on all nodes.

e.g ifconfig eth0
inet addr:192.168.1.4  Bcast:192.168.0.255  Mask:255.255.255.0

In our case we found that second node had Mask settings of 255.255.254.0. Correcting it and restarting interface resolved the issue.

Cluster SSH tool Utility

Many times you come across scenario’s when you wish to open multiple ssh windows and execute same commands.e.g You wish to see alert log for multi-node RAC simultaneously or edit sysctl.conf files for multiple machines.

Cluster SSH utility helps solve this problem as it opens multiple SSH sessions and allows simultaneous control.

In case you are using MAC OS X, you can download from Google Code site . Utility is called csshX.

For Linux you can download from Sourceforge site and utility is called Cluster SSH.

I am using csshX to show demo. Suppose I have 3 hosts host1,host2,host3 I need to use following commands on terminal app

$csshX host1 host2 host3

or

$csshX host[1-3]

As you can see we can specify range using square brackets.  csshX will create an SSH session to each remote host in separate Terminal.app windows. A master window will also be created. All keyboard input in the master will be sent to all the slave windows. Below screenshot displays how windows will look like

I can enter commands in Master (Red Color) window and it will execute same commands in all 3 windows.If you want to execute commands in particular window then go directly  to that window.  In case we need to open 3 sessions for host1 then we execute

$csshx host1+3

11g: Multiple failed login attempt can block New Application connections

In Oracle database 11g, a new feature/enhancement has potential of bringing down the application due to multiple sessions hitting database with wrong password.

As per Bug 7715339 – Logon failures causes “row cache lock” waits – Allow disable of logon delay [ID 7715339.8]

In 11g there is an intentional delay between allowing failed logon attempts to retry. After 3 successive failures a sleep delay is introduced starting
at 3 seconds and extending to 10 seconds max. During each delay the user X row cache lock is held in exclusive mode preventing any concurrent logon attempt as user X (and preventing any other operation which would need the row cache lock for user X).

Let’s simulate this using a test case.I am using a 11.2.0.3 database.Open two or more database server  connections

Connection 1

create user amit identified by amit;
grant connect to amit;
alter user amit profile MONITORING_PROFILE;

MONITORING_PROFILE has failed_login_attempts set to unlimited.

Create two scripts

$cat 1.sh
sqlplus -s amit/test123 <<EOF
EOF
$cat lock_db.sh
#!/bin/bash
for i in {1..1000}
do
nohup sh 1.sh &
done

I have set loop to 1000 which you can increase depending on your process limit. This is test database with process parameter set to 150. Execute script

sh lock_db.sh 

Now if you go to Connection 2, you will see following issues

a) Database will not allow fresh connections for short time as you will get ORA-00020: maximum number of processes (150) exceeded errors.
After that database connections will be possible
b) You cannot connect to AMIT user for considerable time even with correct password. All sessions will be waiting on library cache lock

 select event,count(*) from V$session group by event order by 2;

EVENT								   COUNT(*)
---------------------------------------------------------------- ----------
Streams AQ: waiting for time management or cleanup tasks		  1
VKTM Logical Idle Wait							  1
Space Manager: slave idle wait						  1
SQL*Net message from client						  1
ges remote message							  1
ASM background timer							  1
GCR sleep								  1
smon timer								  1
pmon timer								  1
Streams AQ: qmn coordinator idle wait					  1
asynch descriptor resize						  1
Streams AQ: qmn slave idle wait 					  1
PING									  1
class slave wait							  2
gcs remote message							  2
DIAG idle wait								  2
wait for unread message on broadcast channel				  2
rdbms ipc message							 18
library cache lock							101

select
 distinct
   ses.ksusenum sid, ses.ksuseser serial#, ses.ksuudlna username,KSUSEMNM module,
   ob.kglnaown obj_owner, ob.kglnaobj obj_name
   ,lk.kgllkcnt lck_cnt, lk.kgllkmod lock_mode, lk.kgllkreq lock_req
   , w.state, w.event, w.wait_Time, w.seconds_in_Wait
 from
  x$kgllk lk,  x$kglob ob,x$ksuse ses
  , v$session_wait w
where lk.kgllkhdl in
(select kgllkhdl from x$kgllk where kgllkreq >0 )
and ob.kglhdadr = lk.kgllkhdl
and lk.kgllkuse = ses.addr
and w.sid = ses.indx
order by seconds_in_wait desc
/

      SID    SERIAL# USERNAME	 MODULE 			     OBJ_OWNER	     OBJ_NAME		LCK_CNT  LOCK_MODE   LOCK_REQ STATE		  EVENT 			  WAIT_TIME SECONDS_IN_WAIT
---------- ---------- ---------- ----------------------------------- --------------- --------------- ---------- ---------- ---------- ------------------- ------------------------------ ---------- ---------------
       153	   77		 testbox01		     84 		      0 	 0	    3 WAITING		  library cache lock			  0		  3
	36	  101		 testbox01		     84 		      0 	 0	    3 WAITING		  library cache lock			  0		  3
	16	  137		testbox01		     84 		      0 	 0	    3 WAITING		  library cache lock			  0		  3
       143	  203		 testbox01		     84 		      0 	 0	    3 WAITING		  library cache lock			  0		  3
	26	  261		 testbox01		     84 		      1 	 3	    0 WAITED KNOWN TIME   library cache lock			119		  3

You will notice that seconds_in_wait value will increase . Also ‘alter system kill session’  takes long time to kill session (it seems as if hung). As per bug ,any operation involving row cache lock for that user will not be allowed. I confirmed by trying to reset profile to Default and session was stuck.

Till all these sessions are cleared from database, you will not be allowed to login. This can severy impact applications as fresh connection would not be possible. Pre-existing sessions will work
If you repeat same experiment in 10g, you will not face similar issue.

To disable this behavior i.e no sleep delay set following event

event=”28401 trace name context forever, level 1″

You can set it dynamically too using

alter system set events ‘28401 trace name context forever, level 1’;

To go back to original behavior

alter system set events ‘28401 trace name context off’;

Bug 7715339 mentions that if FAILED_LOGIN_ATTEMPTS is set to some value, then further attempts to log in will then correctly fail immediately with no delay

I tested this and found that this is not entirely true as my script was able to again spawn 101 sessions which were waiting on library cache lock again. Fresh connection with correct password was again stuck. DBA_USERS view was showing that account is locked. Though here the session cleared in quick time as compared to test case involving failed_login_attempts to unlimited.

Oracle Critical Patch Update January 2012

I guess most of us having My Oracle Support have got email from Oracle mentioning that Critical Patch Update January 2012 has been released. Main reason behind this is Security bug found by Infoworld which can make database run out of SCN. You can read the article here

Summary

There is bug 12371955 which causes scn to be incremented at very high rate when we issue begin backup command in 11g database. There is already patch available for this.
But Infoworld seems to have found security flaws which can cause scn to be be raised by say 1 trillion on interconnected database. Jan 2012 CPU patch seems to address this issue.

 

User Sessions stuck on resmgr:cpu quantum wait event

We were experiencing lot of session getting stuck on resmgr:cpu quantum in our database.
In fact at a time we had 70 sessions which were stuck on this wait event and our cpu load average was touching 60

Checking active resource plan, we found that DEFAULT_MAINTENANCE_PLAN was active. As per 11g Docs

In this plan, any sessions in the SYS_GROUP consumer group get priority. (Sessions in this group are sessions created by user accounts SYS and SYSTEM.) Any resource allocation that is unused by sessions in SYS_GROUP is then shared by sessions belonging to the other consumer groups and subplans in the plan. Of that allocation, 25% goes to maintenance tasks, 5% goes to background processes performing diagnostic operations, and 70% goes to user sessions. To reduce or increase resource allocation to the automated maintenance tasks, you make adjustments to DEFAULT_MAINTENANCE_PLAN.

These plans are associated to 11g windows like MONDAY_WINDOW. You can check it using following query

col window_name format a17
col RESOURCE_PLAN format a25
col LAST_START_DATE format a50
col duration format a15
col enabled format a5
select window_name, RESOURCE_PLAN, LAST_START_DATE, DURATION, enabled from DBA_SCHEDULER_WINDOWS;

WINDOW_NAME	  RESOURCE_PLAN 	    LAST_START_DATE				       DURATION        ENABL
----------------- ------------------------- -------------------------------------------------- --------------- -----
MONDAY_WINDOW	  DEFAULT_MAINTENANCE_PLAN  16-JAN-12 10.00.00.007154 PM PST8PDT	       +000 04:00:00   TRUE
TUESDAY_WINDOW	  DEFAULT_MAINTENANCE_PLAN  10-JAN-12 10.00.00.002781 PM PST8PDT	       +000 04:00:00   TRUE
WEDNESDAY_WINDOW  DEFAULT_MAINTENANCE_PLAN  11-JAN-12 10.00.00.008333 PM PST8PDT	       +000 04:00:00   TRUE
THURSDAY_WINDOW   DEFAULT_MAINTENANCE_PLAN  12-JAN-12 10.00.00.011284 PM PST8PDT	       +000 04:00:00   TRUE
FRIDAY_WINDOW	  DEFAULT_MAINTENANCE_PLAN  13-JAN-12 10.00.00.010937 PM PST8PDT	       +000 04:00:00   TRUE
SATURDAY_WINDOW   DEFAULT_MAINTENANCE_PLAN  14-JAN-12 06.00.00.146968 AM PST8PDT	       +000 20:00:00   TRUE
SUNDAY_WINDOW	  DEFAULT_MAINTENANCE_PLAN  15-JAN-12 06.00.00.003916 AM PST8PDT	       +000 20:00:00   TRUE
WEEKNIGHT_WINDOW									       +000 08:00:00   FALSE
WEEKEND_WINDOW										       +002 00:00:00   FALSE

 

You can disable the resource_plan by using following commands

execute dbms_scheduler.set_attribute('MONDAY_WINDOW','RESOURCE_PLAN','');
execute dbms_scheduler.set_attribute('TUESDAY_WINDOW','RESOURCE_PLAN','');
execute dbms_scheduler.set_attribute('WEDNESDAY_WINDOW','RESOURCE_PLAN','');
execute dbms_scheduler.set_attribute('THURSDAY_WINDOW','RESOURCE_PLAN','');
execute dbms_scheduler.set_attribute('FRIDAY_WINDOW','RESOURCE_PLAN','');
execute dbms_scheduler.set_attribute('SATURDAY_WINDOW','RESOURCE_PLAN','');
execute dbms_scheduler.set_attribute('SUNDAY_WINDOW','RESOURCE_PLAN','');

Verify that resource_plan is disabled

select window_name, RESOURCE_PLAN, LAST_START_DATE, DURATION, enabled from DBA_SCHEDULER_WINDOWS;

WINDOW_NAME	  RESOURCE_PLAN 	    LAST_START_DATE				       DURATION        ENABL
----------------- ------------------------- -------------------------------------------------- --------------- -----
MONDAY_WINDOW				    16-JAN-12 10.00.00.007154 PM PST8PDT	       +000 04:00:00   TRUE
TUESDAY_WINDOW				    10-JAN-12 10.00.00.002781 PM PST8PDT	       +000 04:00:00   TRUE
WEDNESDAY_WINDOW			    11-JAN-12 10.00.00.008333 PM PST8PDT	       +000 04:00:00   TRUE
THURSDAY_WINDOW 			    12-JAN-12 10.00.00.011284 PM PST8PDT	       +000 04:00:00   TRUE
FRIDAY_WINDOW				    13-JAN-12 10.00.00.010937 PM PST8PDT	       +000 04:00:00   TRUE
SATURDAY_WINDOW 			    14-JAN-12 06.00.00.146968 AM PST8PDT	       +000 20:00:00   TRUE
SUNDAY_WINDOW				    15-JAN-12 06.00.00.003916 AM PST8PDT	       +000 20:00:00   TRUE
WEEKNIGHT_WINDOW									       +000 08:00:00   FALSE
WEEKEND_WINDOW										       +002 00:00:00   FALSE

I was not concerned about automated tasks taking more cpu as it was already disabled using

execute DBMS_AUTO_TASK_ADMIN.DISABLE;