11gR2

Limiting I/O and CPU resources using 11g Oracle Resource Manager

Recently I was working on using Oracle DBMS_RESOURCE_MANAGER to limit resources usage by read-only queries. This was required to prevent long running ad-hoc and poorly written queries.Instead of writing custom sql /script to kill long running session, we decided to utilize Oracle Database Resource Manager.
We decided using elapsed_time as criteria for cancelling sql.But during tests we found out that any I/O bound query didn’t kill after specified switch_time.
On checking My Oracle Support, we came across Note ID 1485199.1, which discussed the exact situation faced by us.

As per note , starting 11.2 oracle has fixed the Unpublished Bug 8202097 where some waits were included in the active time and could trigger switch_time early. Due to this fix, switch_time applies for the execution time (in CPU seconds) and not for the total elapsed time (which includes wait times also) of the query.

What this means is that any query which is waiting on wait class other then CPU (e.g I/O ) will not be killed based on switch time.Switch_time applies only to time spent on CPU. Waits are not included.

Checking Oracle 11g Documentation, we found that Oracle had introduced new parameter in create_plan_directive named switch_io_megabytes which can be used to cancel sql after specified I/O limit is reached.
We used following code to create resource manager plan with following directives

a)Limit Cpu time to 3600 s
b)Limit Parallelism to 4
c)Limit I/O to 20000Mb (~20G)

begin
sys.dbms_resource_manager.clear_pending_area();
sys.dbms_resource_manager.create_pending_area();
sys.dbms_resource_manager.create_consumer_group(CONSUMER_GROUP=>'CPU_IO_LIMIT_GRP',
COMMENT=>'Consumer group for RO_USER to limit parallelism,I/O and CPU Time');
sys.dbms_resource_manager.set_consumer_group_mapping(attribute => DBMS_RESOURCE_MANAGER.ORACLE_USER,
value => 'RO_USER',consumer_group =>'CPU_IO_LIMIT_GRP');

-- Create resource plan:
sys.dbms_resource_manager.create_plan(PLAN=> 'CPU_IO_LIMIT',COMMENT=>'Cancel Sql plan for RO_USER');

--Need to create plan_directive
--Limiting parallelism to max 4, I/O Limit to 10000Mb and CPU_time to 3600 sec
sys.dbms_resource_manager.create_plan_directive(
PLAN=> 'CPU_IO_LIMIT',
GROUP_OR_SUBPLAN=>'CPU_IO_LIMIT_GRP',
COMMENT=>'Kill statement after exceeding 3600 sec , limit parallelism to max 4 and limit i/o to 20000M ',
PARALLEL_DEGREE_LIMIT_P1 => 4,
SWITCH_GROUP=>'CANCEL_SQL',
SWITCH_TIME=>3600,
SWITCH_IO_MEGABYTES=>20000,
SWITCH_ESTIMATE=>false
);

--Its compulsory to specify directive for OTHER_GROUPS else this will fail
dbms_resource_manager.create_plan_directive(PLAN=> 'CPU_IO_LIMIT',GROUP_OR_SUBPLAN=>'OTHER_GROUPS',CPU_P1=>100);
sys.dbms_resource_manager.validate_pending_area;
sys.dbms_resource_manager.submit_pending_area;
end;
/

--Grant RO_USER to switch group
exec dbms_resource_manager_privs.grant_switch_consumer_group('RO_USER','CPU_IO_LIMIT_GRP',false);
--Set initial group for RO_USER to CPU_IO_LIMIT_GRP
exec dbms_resource_manager.set_initial_consumer_group('RO_USER','CPU_IO_LIMIT_GRP');

To enable resource manager plan

ALTER SYSTEM SET RESOURCE_MANAGER_PLAN ='CPU_IO_LIMIT';

We started query on big table and specified parallel degree as  8

alter session force parallel query parallel 8;

select /*+ full(a)  */ count(*) from big_table a;

From V$px_session we can verify that we requested 8 degree but we are getting only 4 due to resource manager plan directive.

INST_ID Username    QC/Slave   Slave Set  SID    QC SID Requested DOP Actual DOP MODULE                SQL_ID      EVENT             STATUS
---------- ------------ ---------- ---------- ------ ------ ------------- ---------- ------------------------------ ------------- ------------------------- --------
2  - p000    (Slave)    1          390    579        8       4 SQL*Plus                7fgfym909mqmn direct path read        ACTIVE
2    RO_USER    QC                  579    579                 SQL*Plus                7fgfym909mqmn PX Deq: Execute Reply     ACTIVE
2  - p001    (Slave)    1          580    579        8       4 SQL*Plus                7fgfym909mqmn direct path read        ACTIVE
2  - p002    (Slave)    1          772    579        8       4 SQL*Plus                7fgfym909mqmn direct path read        ACTIVE
2  - p003    (Slave)    1          966    579        8       4 SQL*Plus                7fgfym909mqmn direct path read        ACTIVE

I was monitoring the resource usage using v$rsrc_session_info in separate session

SELECT s.sid sess_id, g.name consumer_group,
s.state, s.consumed_cpu_time cpu_time, s.cpu_wait_time, s.queued_time,(s.CURRENT_SMALL_READ_MEGABYTES+s.CURRENT_LARGE_READ_MEGABYTES) read_MB,(s.CURRENT_SMALL_WRITE_MEGABYTES+s.CURRENT_LARGE_WRITE_MEGABYTES) write_mb
FROM v$rsrc_session_info s, v$rsrc_consumer_group g
WHERE s.current_consumer_group_id = g.id and g.name='CPU_IO_LIMIT_GRP';

After some time our original query got cancelled with following error

ERROR at line 1:
ORA-12801: error signaled in parallel query server P001, instance prod02:orcl02 (2)
ORA-56720: I/O data limit exceeded - call aborted

Just before failure , v$rsrc_session_info output looked like this indicating that we reached 20000M limit specified for I/O

SESS_ID CONSUMER_GROUP            STATE     CPU_TIME CPU_WAIT_TIME QUEUED_TIME    READ_MB     WRITE_MB
---------- -------------------------------- ---------- ---------- ------------- ----------- ---------- ----------
1347 CPU_IO_LIMIT_GRP        WAITING           62          0       0         1        0
1154 CPU_IO_LIMIT_GRP        WAITING        38050          0       0     18707        0
774 CPU_IO_LIMIT_GRP        WAITING        37400          0       0     19871        0
582 CPU_IO_LIMIT_GRP        WAITING        36267          0       0     19073        0
10 CPU_IO_LIMIT_GRP        WAITING        41515          0       0     19843        0

You can notice that when we specify parallelism, each process had individual 20000M I/O limit.
Previous query was only read operation and we were wondering if this I/O limit applied to both read/write operation. To confirm this we ran query with 4 table join doing merge cartersian join (This was one of query which used 100G of temp tablespace and prompted us to start this exercise 🙂

To remove the plan, we have to unset resource_manager_plan parameter and then delete it using following code

ALTER SYSTEM SET RESOURCE_MANAGER_PLAN ='';
begin
sys.dbms_resource_manager.clear_pending_area();
sys.dbms_resource_manager.create_pending_area();
sys.dbms_resource_manager.delete_plan_cascade ('CPU_IO_LIMIT');
sys.dbms_resource_manager.submit_pending_area();
end;
/

We recreated plan with 10000M as new I/O limit and ran the query.(We can also use dbms_resource_manager.update_plan_directive to update the settings).

Meanwhile I used new sql to get single /multi block read/write I/O along with totals.

col state for a10
col CONSUMER_GROUP for a20
SELECT s.sid sess_id, g.name consumer_group,
s.state, s.consumed_cpu_time cpu_time, s.cpu_wait_time, s.queued_time,s.CURRENT_SMALL_READ_MEGABYTES S_RIO,s.CURRENT_LARGE_READ_MEGABYTES M_RIO,s.CURRENT_SMALL_WRITE_MEGABYTES S_WIO,s.CURRENT_LARGE_WRITE_MEGABYTES M_WIO,(s.CURRENT_SMALL_READ_MEGABYTES+s.CURRENT_LARGE_READ_MEGABYTES) read_MB,(s.CURRENT_SMALL_WRITE_MEGABYTES+s.CURRENT_LARGE_WRITE_MEGABYTES) write_mb
FROM v$rsrc_session_info s, v$rsrc_consumer_group g
WHERE s.current_consumer_group_id = g.id and g.name='CPU_IO_LIMIT_GRP';

Before failure we had following output from v$rsrc_session_info with 3341M as reads and 6614 writes (total of 9955M ~10000M). It confirms that I/O limit here includes both read/write I/O

SESS_ID CONSUMER_GROUP    STATE         CPU_TIME CPU_WAIT_TIME QUEUED_TIME      S_RIO    M_RIO       S_WIO      M_WIO    READ_MB     WRITE_MB
---------- -------------------- ---------- ---------- ------------- ----------- ---------- ---------- ---------- ---------- ---------- ----------
199 CPU_IO_LIMIT_GRP    WAITING        104786          0          0        233     3108         203       6295      3341         6498

SESS_ID CONSUMER_GROUP    STATE         CPU_TIME CPU_WAIT_TIME QUEUED_TIME      S_RIO    M_RIO       S_WIO      M_WIO    READ_MB     WRITE_MB
---------- -------------------- ---------- ---------- ------------- ----------- ---------- ---------- ---------- ---------- ---------- ----------
199 CPU_IO_LIMIT_GRP    RUNNING        106749          0          0        233     3108         207       6407      3341         6614

One last test was to test for queries exceeding specified CPU Time. To perform this test we altered switch_time to 120 seconds and re-ran the query involving joins.After nearly 5 minutes, our sql got killed with following error

ORA-00040: active time limit exceeded - call aborted
Elapsed: 00:04:47.13

--Resource usage
SESS_ID CONSUMER_GROUP    STATE         CPU_TIME CPU_WAIT_TIME QUEUED_TIME      S_RIO    M_RIO       S_WIO      M_WIO    READ_MB     WRITE_MB
---------- -------------------- ---------- ---------- ------------- ----------- ---------- ---------- ---------- ---------- ---------- ----------
199 CPU_IO_LIMIT_GRP    RUNNING        141184          0          0        236     3108         254       7885      3344         8139

Above query reported 140s as total consumed time even though we had specified 120 s.(Not sure if this is expected behavior as note 1485199.1 mentioned that it does not include any wait) Note that we used switch_group as CANCEL_SQL which cancels the running sql after limit is reached.In case you plan to kill the session, you need to use KILL_SESSION.

You would notice that we used switch_estimate to false . This parameter tells oracle to estimate the execution time before the operation starts.If you use switch_estimate to true  in above code, you will notice that it will kill the session immediately along with error message indicating which I/O limit was reached.

If you set switch_estimate to true and set max_est_exec_time which specifies the maximum execution time (in CPU seconds) allowed for a session. If the optimizer estimates that an operation will take longer than MAX_EST_EXEC_TIME, the operation is not started and ORA-07455 is issued.

You can notice that this is far better approach of enforcing limits rather then writing your own script/triggers. Some of advantages in this case were

a)Both the cases we got pretty informative error indicating that we are excluding specified limits.

b)Gave us control of limiting parallelism

c)Allows us to specify I/O based limits

We didn’t use cpu allocation in this case but you can also limit cpu available for users using mgmt_p1 (cpu_p1 ,etc are deprecated).In case you are planning to implement this in your production database with cancel_sql directive, I would recommend ensuring that it should not be application user else you will start experiencing unwanted results.

11gr2-Formatted crsctl output

If you don’t like crsctl status res –t output then try using below 🙂 Can also download here crsstat11g. I have tried this on linux

crsctl status res |grep -v "^$"|awk -F "=" 'BEGIN {print " "} {printf("%s",NR%4 ? $2"|" : $2"\n")}'|sed -e 's/ *, /,/g' -e 's/, /,/g'|\
 awk -F "|" 'BEGIN { printf "%-40s%-35s%-20s%-50s\n","Resource Name","Resource Type","Target ","State" }{ split ($3,trg,",") split ($4,st,",")}{for (i in trg) {printf "%-40s%-35s%-20s%-50s\n",$1,$2,trg[i],st[i]}}'

Checking column usage information using dbms_stats

Oracle 11gR2 DBMS_STATS introduced useful function report_col_usage to report column usage i.e if column is being used for equality,like predicates. I came to know it while reading Optimizer whitepaper .As per document, DBMS_STATS.REPORT_COL_USAGE reports column usage information and records all the SQL operations the database has processed for a given object.

This information is used by Oracle gather stats job to decide whether to collect histograms and also number of buckets to be used.So if we specify method_opt =>’FOR ALL COLUMS SIZE AUTO’, oracle stats job will make use of this information. A column is a candidate for a histogram if it has been seen in a where clause predicate, e.g., an equality, range, LIKE, etc.Column usage tracking is enabled by default.

Apart from this, I see two more benefits

a)It can be used to see if all indexed column are being used. If not we can get rid of any extra indexes.

b) In case you are using method_opt =>’FOR ALL INDEXED COLUMNS SIZE AUTO’ in your stats collection script, it can tell additional columns which are being used in where clause. This is really important as this method_opt option does not capture statistics on non-indexed columns. Result can be wrong cardinality estimates and choice of wrong join methods. You can read detailed explanation on Greg Rahn’s post

To see this action I am using EMP,DEPT,SALGRADE table. You can find the scripts for creating these tables on scribd .

Since these are newly created tables, there was no column usage stats. Running the function confirms same

SELECT DBMS_STATS.REPORT_COL_USAGE('AMIT','EMP') FROM DUAL;
LEGEND:
 .......
EQ : Used in single table EQuality predicate
 RANGE : Used in single table RANGE predicate
 LIKE : Used in single table LIKE predicate
 NULL : Used in single table is (not) NULL predicate
 EQ_JOIN : Used in EQuality JOIN predicate
 NONEQ_JOIN : Used in NON EQuality JOIN predicate
 FILTER : Used in single table FILTER predicate
 JOIN : Used in JOIN predicate
 GROUP_BY : Used in GROUP BY expression
 ...............................................................................
###############################################################################
COLUMN USAGE REPORT FOR AMIT.EMP
 ................................
###############################################################################

I ran following set of queries once

select * from emp where empno=7782;
 select ename,job,dname from emp,dept where emp.deptno=dept.deptno;
 select e.ename,e.sal,s.grade from emp e,salgrade s where e.sal between s.losal and s.hisal;

Querying again gives following output now

###############################################################################
COLUMN USAGE REPORT FOR AMIT.EMP
 ................................
1. DEPTNO : EQ_JOIN
 2. EMPNO : EQ
 3. SAL : NONEQ_JOIN
 ###############################################################################

Ran two queries with like and range predicates

select * from emp where ename like 'A%';
 select * from emp where sal >3000;

You can see that ENAME column shows that we have like in query predicate and for SAL a range comparison has been made.

###############################################################################
COLUMN USAGE REPORT FOR AMIT.EMP
 ................................
1. DEPTNO : EQ_JOIN
 2. EMPNO : EQ
 3. ENAME : LIKE
 4. SAL : RANGE NONEQ_JOIN
 ###############################################################################

You can reset the column usage information using following procedure

exec dbms_stats.reset_col_usage('AMIT','EMP')

Note that this deletes the recorded column usage information from dictionary which can have impact on Stats job, so be careful while you are deleting it. There are two more procedures which can help to seed column usage information from other database.

SEED_COL_USAGE – This procedure iterates over the SQL statements in the specified SQL tuning set, compiles them and seeds column usage information for the columns that appear in these statements.

Syntax
DBMS_STATS.SEED_COL_USAGE (sqlset_name IN VARCHAR2, owner_name IN VARCHAR2, time_limit IN POSITIVE DEFAULT NULL);

MERGE_COL_USAGE – This procedure merges column usage information from a source database by means of a dblink into the local database. If column usage information already exists for a given table or column MERGE_COL_USAGE will combine both the local and the remote information.

Syntax
DBMS_STATS.MERGE_COL_USAGE (dblink IN VARCHAR2);

Poll on Sql Plan Management

Dominic is conducting poll on SPM and Sql Profiles usage on his website. Link can be found here 

I have been using SPM and SQL Profiles (using coe_xfr_profile.sql) to fix plans for queries  and believe me its very easy and quick way of fixing problems in production database. I have not yet used SPM to baseline everything and only using it on need basis for fixing sql with bad plans. I have been reading about  Automatic SQL tuning advisor in 11g and would be  trying to use it and  see the recommendations proposed by it (will keep accept_sql_profiles to false 🙂 )

ORA-7445 core dump [kokscold()+849] from EM 12c

The first problem reported by my EM 12c is an ORA-7445 error. After checking the alert log following is the exact message:

Sun Feb 19 12:09:41 2012
Errors in file /u01/app/oracle/diag/rdbms/pkldb/pkldb/trace/pkldb_j035_14712.trc (incident=52961):
ORA-07445: exception encountered: core dump [kokscold()+849] [SIGSEGV] [ADDR:0x50] [PC:0x115E16F] [Address not mapped to object] []
Incident details in: /u01/app/oracle/diag/rdbms/pkldb/pkldb/incident/incdir_52961/pkldb_j035_14712_i52961.trc

From the incident trace file:

*** 2012-02-19 12:09:43.016
dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x3, level=3, mask=0x0)
----- Current SQL Statement for this session (sql_id=azmq8z48r7wk4) -----
/* SQL Analyze(117,1) */ MERGE INTO MGMT_ARU_PATCH_RECOMMENDATIONS p USING (SELECT :1  as patch_guid, :2  as PATCH_TYPE, :3  as BUG_ID, :4  as REQUEST_ID, :5  as PLATFORM_ID, :6  as PRODUCT_ID, :7  as RELEASE_ID , :8  as LANGUAGE_ID, :9  as CLASSIFICATION_ID, :10  as PLATFORM, :11  as PRODUCT, :12  as RELEASE, :13  as LANGUAGE, :14  as CLASSIFICATION, :15  as ARU_TARGET_TYPE, :16  as ABSTRACT, :17  as RELEASE_DATE, :18  as LOAD_TIMESTAMP, :19  as TEXT_CONTENT, :20  as SUPPORT_LEVEL, :21  as SUPPORT_LEVEL_ID FROM DUAL) n ON (p.patch_guid = n.patch_guid) WHEN MATCHED THEN UPDATE SET p.PATCH_TYPE = n.PATCH_TYPE, p.BUG_ID = n.BUG_ID, p.REQUEST_ID = n.REQUEST_ID, p.PLATFORM_ID = n.PLATFORM_ID, p.PRODUCT_ID = n.PRODUCT_ID, p.RELEASE_ID = n.RELEASE_ID, p.LANGUAGE_ID = n.LANGUAGE_ID, p.CLASSIFICATION_ID  = n.CLASSIFICATION_ID, p.PLATFORM = n.PLATFORM, p.PRODUCT = n.PRODUCT, p.RELEASE = n.RELEASE, p.LANGUAGE = n.LANGUAGE, p.CLASSIFICATION = n.CLASSIFICATION, p.ARU_TARGET_TYPE = n.ARU_TARGET_TYPE, p.ABSTRACT = n.ABSTRACT, p.RELEASE_DATE = n.RELEASE_DATE,p.LOAD_TIMESTAMP = n.LOAD_TIMESTAMP, p.TEXT_CONTENT = n.TEXT_CONTENT, p.SUPPORT_LEVEL = n.SUPPORT_LEVEL, p.SUPPORT_LEVEL_ID = n.SUPPORT_LEVEL_ID WHEN NOT MATCHED THEN  INSERT (PATCH_GUID, PATCH_TYPE, BUG_ID, REQUEST_ID, PLATFORM_ID, PRODUCT_ID, RELEASE_ID, LANGUAGE_ID, CLASSIFICATION_ID, PLATFORM, PRODUCT, RELEASE, LANGUAGE, CLASSIFICATION, ARU_TARGET_TYPE, ABSTRACT, RELEASE_DATE,LOAD_TIMESTAMP, TEXT_CONTENT, SUPPORT_LEVEL, SUPPORT_LEVEL_ID) VALUES (n.PATCH_GUID, n.PATCH_TYPE, n.BUG_ID, n.REQUEST_ID, n.PLATFORM_ID, n.PRODUCT_ID, n.RELEASE_ID, n.LANGUAGE_ID, n.CLASSIFICATION_ID, n.PLATFORM, n.PRODUCT, n.RELEASE, n.LANGUAGE, n.CLASSIFICATION, n.ARU_TARGET_TYPE, n.ABSTRACT, n.RELEASE_DATE,n.LOAD_TIMESTAMP, n.TEXT_CONTENT, n.SUPPORT_LEVEL, n.SUPPORT_LEVEL_ID)


----- PL/SQL Call Stack -----
  object      line  object
  handle    number  name
0x80bd47c0     11774  package body SYS.DBMS_SQLTUNE_INTERNAL
0x80b50a80         7  SYS.WRI$_ADV_SQLTUNE
0x82d550f8       545  package body SYS.PRVT_ADVISOR
0x82d550f8      2613  package body SYS.PRVT_ADVISOR
0x80bf0da8       241  package body SYS.DBMS_ADVISOR
0x81987268       772  package body SYS.DBMS_SQLTUNE
0x8187d290         4  anonymous block

*** 2012-02-19 12:09:42.839
> (0x115e16f) mov 0x50(%r11),%rdx
  (0x115e173) movzbl 0x1(%r13),%ecx
  (0x115e178) movzbl 0x1(%r15),%r9d
  (0x115e17d) mov 0x8(%rdx),%edx
  (0x115e180) lea 0x10(%r13),%r8

The error seems to be coming from some automated scheduled SQL Tuning Task (Yet to figureout that). Searching at MOS gives reference to bug 9594372, which shows that the problem is with merge statement. It says that applying the patch will fix the issue.

While checking the error from my EM12c, it also gives a search button to search the error at MOS( I have already installed the MOS plug-in) which I found is very good thing as I need not to login to separate MOS windows.

11gR2:Listener Startup Issues

In this blog post I will be discussing listener startup issues faced in 11gR2 RAC. I will be constantly updating this post based on my experiences or any comments on this blog post.

Let’s get started.You will experience following errors while starting listener using srvctl

[oracle@prod01]~% srvctl start listener -n prod01
PRCR-1013 : Failed to start resource ora.LISTENER.lsnr
PRCR-1064 : Failed to start resource ora.LISTENER.lsnr on node prod01
CRS-5016: Process "/oragrid/product/11.2.0.2/bin/lsnrctl" spawned by agent "/oragrid/product/11.2.0.2/bin/oraagent.bin" for action "start" failed: details at "(:CLSN00010:)" in "/home/oragrid/product/11.2.0.2/log/prod01/agent/crsd/oraagent_oracle/oraagent_oracle.log"
CRS-5016: Process "/oragrid/product/11.2.0.2/bin/lsnrctl" spawned by agent "/oragrid/product/11.2.0.2/bin/oraagent.bin" for action "start" failed: details at "(:CLSN00010:)" in "/home/oragrid/product/11.2.0.2/log/prod01/agent/crsd/oraagent_oracle/oraagent_oracle.log"
CRS-2674: Start of 'ora.LISTENER.lsnr' on 'prod01' failed

 

Issue 1: – Incorrect ORACLE_HOME entry in listener.ora

This issue can be verified by attempting to start listener by lsnrctl utility. Please note that you need to use $GRID_HOME/bin/lsnrctl utility to manage listener in 11gR2 RAC

[oracle@prod01]~% lsnrctl start

LSNRCTL for Linux: Version 11.2.0.2.0 - Production on 31-JAN-2012 08:09:52

Copyright (c) 1991, 2010, Oracle.  All rights reserved.

Starting /oragrid/product/11.2.0.2/bin/tnslsnr: please wait...

TNSLSNR for Linux: Version 11.2.0.2.0 - Production
System parameter file is /oragrid/product/11.2.0.2/network/admin/listener.ora
Log messages written to /oracle/diag/tnslsnr/prod01/listener/alert/log.xml
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.160)(PORT=1521)))
TNS-01201: Listener cannot find executable /oracle/product/11.2/bin/oracle for SID orcl01

Listener failed to start. See the error message(s) above..

Above error indicates that it is not able to find ‘oracle’ executable in specified path. But this is not our correct ORACLE_HOME. Checking listener.ora we found that this home is coming from ORACLE_HOME variable

SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (SID_NAME =orcl01 )
      (ORACLE_HOME = /oracle/product/11.2)
    )
    (SID_DESC =
      (SID_NAME = PLSExtProc)
      (ORACLE_HOME = /oracle/product/11.2)
      (PROGRAM = extproc)
    )
  )

Correct ORACLE_HOME is oracle/product/11.2.0.2 which needs to be updated correctly in listener.ora. After adding we were able to start the listener.

Issue 2: CRS resource ora.[node_name].ons is down

While debugging listener startup issue, you found that resource ora.[node_name].ons is not starting. Listener (including SCAN_LISTENER)  is dependent on ora.ons resource
Checking $GRID_HOME/opmn/logs/ons.log[node_name], we see following messages

12/01/12 05:32:48 [ons-listener] Could not get address information for localhost 6100.
12/01/12 05:32:49 [internal] getaddrinfo(localhost, 6100, 1) failed (Name or service not known):
12/01/12 05:33:05 [internal] getaddrinfo(localhost, 6100, 1) failed (Name or service not known):
12/01/12 05:33:15 [internal] getaddrinfo(localhost, 6100, 1) failed (Name or service not known):
12/01/12 05:33:15 [internal] getaddrinfo(localhost, 6100, 1) failed (Name or service not known):
12/01/12 05:33:15 [internal] getaddrinfo(localhost, 6100, 1) failed (Name or service not known):

Issue is that your host is not able to resolve localhost setting.You can verify this by issuing ping localhost command.This issue can be resolved by adding following entry in /etc/hosts or to DNS

127.0.0.1 localhost

Issue 3: VIP and ora.[node_name].ons are not starting on one node

In this issue check that Bcast and Mask settings for Public interface are same on all nodes.

e.g ifconfig eth0
inet addr:192.168.1.4  Bcast:192.168.0.255  Mask:255.255.255.0

In our case we found that second node had Mask settings of 255.255.254.0. Correcting it and restarting interface resolved the issue.