Limiting I/O and CPU resources using 11g Oracle Resource Manager

Recently I was working on using Oracle DBMS_RESOURCE_MANAGER to limit resources usage by read-only queries. This was required to prevent long running ad-hoc and poorly written queries.Instead of writing custom sql /script to kill long running session, we decided to utilize Oracle Database Resource Manager.
We decided using elapsed_time as criteria for cancelling sql.But during tests we found out that any I/O bound query didn’t kill after specified switch_time.
On checking My Oracle Support, we came across Note ID 1485199.1, which discussed the exact situation faced by us.

As per note , starting 11.2 oracle has fixed the Unpublished Bug 8202097 where some waits were included in the active time and could trigger switch_time early. Due to this fix, switch_time applies for the execution time (in CPU seconds) and not for the total elapsed time (which includes wait times also) of the query.

What this means is that any query which is waiting on wait class other then CPU (e.g I/O ) will not be killed based on switch time.Switch_time applies only to time spent on CPU. Waits are not included.

Checking Oracle 11g Documentation, we found that Oracle had introduced new parameter in create_plan_directive named switch_io_megabytes which can be used to cancel sql after specified I/O limit is reached.
We used following code to create resource manager plan with following directives

a)Limit Cpu time to 3600 s
b)Limit Parallelism to 4
c)Limit I/O to 20000Mb (~20G)

begin
sys.dbms_resource_manager.clear_pending_area();
sys.dbms_resource_manager.create_pending_area();
sys.dbms_resource_manager.create_consumer_group(CONSUMER_GROUP=>'CPU_IO_LIMIT_GRP',
COMMENT=>'Consumer group for RO_USER to limit parallelism,I/O and CPU Time');
sys.dbms_resource_manager.set_consumer_group_mapping(attribute => DBMS_RESOURCE_MANAGER.ORACLE_USER,
value => 'RO_USER',consumer_group =>'CPU_IO_LIMIT_GRP');

-- Create resource plan:
sys.dbms_resource_manager.create_plan(PLAN=> 'CPU_IO_LIMIT',COMMENT=>'Cancel Sql plan for RO_USER');

--Need to create plan_directive
--Limiting parallelism to max 4, I/O Limit to 10000Mb and CPU_time to 3600 sec
sys.dbms_resource_manager.create_plan_directive(
PLAN=> 'CPU_IO_LIMIT',
GROUP_OR_SUBPLAN=>'CPU_IO_LIMIT_GRP',
COMMENT=>'Kill statement after exceeding 3600 sec , limit parallelism to max 4 and limit i/o to 20000M ',
PARALLEL_DEGREE_LIMIT_P1 => 4,
SWITCH_GROUP=>'CANCEL_SQL',
SWITCH_TIME=>3600,
SWITCH_IO_MEGABYTES=>20000,
SWITCH_ESTIMATE=>false
);

--Its compulsory to specify directive for OTHER_GROUPS else this will fail
dbms_resource_manager.create_plan_directive(PLAN=> 'CPU_IO_LIMIT',GROUP_OR_SUBPLAN=>'OTHER_GROUPS',CPU_P1=>100);
sys.dbms_resource_manager.validate_pending_area;
sys.dbms_resource_manager.submit_pending_area;
end;
/

--Grant RO_USER to switch group
exec dbms_resource_manager_privs.grant_switch_consumer_group('RO_USER','CPU_IO_LIMIT_GRP',false);
--Set initial group for RO_USER to CPU_IO_LIMIT_GRP
exec dbms_resource_manager.set_initial_consumer_group('RO_USER','CPU_IO_LIMIT_GRP');

To enable resource manager plan

ALTER SYSTEM SET RESOURCE_MANAGER_PLAN ='CPU_IO_LIMIT';

We started query on big table and specified parallel degree as  8

alter session force parallel query parallel 8;

select /*+ full(a)  */ count(*) from big_table a;

From V$px_session we can verify that we requested 8 degree but we are getting only 4 due to resource manager plan directive.

INST_ID Username    QC/Slave   Slave Set  SID    QC SID Requested DOP Actual DOP MODULE                SQL_ID      EVENT             STATUS
---------- ------------ ---------- ---------- ------ ------ ------------- ---------- ------------------------------ ------------- ------------------------- --------
2  - p000    (Slave)    1          390    579        8       4 SQL*Plus                7fgfym909mqmn direct path read        ACTIVE
2    RO_USER    QC                  579    579                 SQL*Plus                7fgfym909mqmn PX Deq: Execute Reply     ACTIVE
2  - p001    (Slave)    1          580    579        8       4 SQL*Plus                7fgfym909mqmn direct path read        ACTIVE
2  - p002    (Slave)    1          772    579        8       4 SQL*Plus                7fgfym909mqmn direct path read        ACTIVE
2  - p003    (Slave)    1          966    579        8       4 SQL*Plus                7fgfym909mqmn direct path read        ACTIVE

I was monitoring the resource usage using v$rsrc_session_info in separate session

SELECT s.sid sess_id, g.name consumer_group,
s.state, s.consumed_cpu_time cpu_time, s.cpu_wait_time, s.queued_time,(s.CURRENT_SMALL_READ_MEGABYTES+s.CURRENT_LARGE_READ_MEGABYTES) read_MB,(s.CURRENT_SMALL_WRITE_MEGABYTES+s.CURRENT_LARGE_WRITE_MEGABYTES) write_mb
FROM v$rsrc_session_info s, v$rsrc_consumer_group g
WHERE s.current_consumer_group_id = g.id and g.name='CPU_IO_LIMIT_GRP';

After some time our original query got cancelled with following error

ERROR at line 1:
ORA-12801: error signaled in parallel query server P001, instance prod02:orcl02 (2)
ORA-56720: I/O data limit exceeded - call aborted

Just before failure , v$rsrc_session_info output looked like this indicating that we reached 20000M limit specified for I/O

SESS_ID CONSUMER_GROUP            STATE     CPU_TIME CPU_WAIT_TIME QUEUED_TIME    READ_MB     WRITE_MB
---------- -------------------------------- ---------- ---------- ------------- ----------- ---------- ----------
1347 CPU_IO_LIMIT_GRP        WAITING           62          0       0         1        0
1154 CPU_IO_LIMIT_GRP        WAITING        38050          0       0     18707        0
774 CPU_IO_LIMIT_GRP        WAITING        37400          0       0     19871        0
582 CPU_IO_LIMIT_GRP        WAITING        36267          0       0     19073        0
10 CPU_IO_LIMIT_GRP        WAITING        41515          0       0     19843        0

You can notice that when we specify parallelism, each process had individual 20000M I/O limit.
Previous query was only read operation and we were wondering if this I/O limit applied to both read/write operation. To confirm this we ran query with 4 table join doing merge cartersian join (This was one of query which used 100G of temp tablespace and prompted us to start this exercise 🙂

To remove the plan, we have to unset resource_manager_plan parameter and then delete it using following code

ALTER SYSTEM SET RESOURCE_MANAGER_PLAN ='';
begin
sys.dbms_resource_manager.clear_pending_area();
sys.dbms_resource_manager.create_pending_area();
sys.dbms_resource_manager.delete_plan_cascade ('CPU_IO_LIMIT');
sys.dbms_resource_manager.submit_pending_area();
end;
/

We recreated plan with 10000M as new I/O limit and ran the query.(We can also use dbms_resource_manager.update_plan_directive to update the settings).

Meanwhile I used new sql to get single /multi block read/write I/O along with totals.

col state for a10
col CONSUMER_GROUP for a20
SELECT s.sid sess_id, g.name consumer_group,
s.state, s.consumed_cpu_time cpu_time, s.cpu_wait_time, s.queued_time,s.CURRENT_SMALL_READ_MEGABYTES S_RIO,s.CURRENT_LARGE_READ_MEGABYTES M_RIO,s.CURRENT_SMALL_WRITE_MEGABYTES S_WIO,s.CURRENT_LARGE_WRITE_MEGABYTES M_WIO,(s.CURRENT_SMALL_READ_MEGABYTES+s.CURRENT_LARGE_READ_MEGABYTES) read_MB,(s.CURRENT_SMALL_WRITE_MEGABYTES+s.CURRENT_LARGE_WRITE_MEGABYTES) write_mb
FROM v$rsrc_session_info s, v$rsrc_consumer_group g
WHERE s.current_consumer_group_id = g.id and g.name='CPU_IO_LIMIT_GRP';

Before failure we had following output from v$rsrc_session_info with 3341M as reads and 6614 writes (total of 9955M ~10000M). It confirms that I/O limit here includes both read/write I/O

SESS_ID CONSUMER_GROUP    STATE         CPU_TIME CPU_WAIT_TIME QUEUED_TIME      S_RIO    M_RIO       S_WIO      M_WIO    READ_MB     WRITE_MB
---------- -------------------- ---------- ---------- ------------- ----------- ---------- ---------- ---------- ---------- ---------- ----------
199 CPU_IO_LIMIT_GRP    WAITING        104786          0          0        233     3108         203       6295      3341         6498

SESS_ID CONSUMER_GROUP    STATE         CPU_TIME CPU_WAIT_TIME QUEUED_TIME      S_RIO    M_RIO       S_WIO      M_WIO    READ_MB     WRITE_MB
---------- -------------------- ---------- ---------- ------------- ----------- ---------- ---------- ---------- ---------- ---------- ----------
199 CPU_IO_LIMIT_GRP    RUNNING        106749          0          0        233     3108         207       6407      3341         6614

One last test was to test for queries exceeding specified CPU Time. To perform this test we altered switch_time to 120 seconds and re-ran the query involving joins.After nearly 5 minutes, our sql got killed with following error

ORA-00040: active time limit exceeded - call aborted
Elapsed: 00:04:47.13

--Resource usage
SESS_ID CONSUMER_GROUP    STATE         CPU_TIME CPU_WAIT_TIME QUEUED_TIME      S_RIO    M_RIO       S_WIO      M_WIO    READ_MB     WRITE_MB
---------- -------------------- ---------- ---------- ------------- ----------- ---------- ---------- ---------- ---------- ---------- ----------
199 CPU_IO_LIMIT_GRP    RUNNING        141184          0          0        236     3108         254       7885      3344         8139

Above query reported 140s as total consumed time even though we had specified 120 s.(Not sure if this is expected behavior as note 1485199.1 mentioned that it does not include any wait) Note that we used switch_group as CANCEL_SQL which cancels the running sql after limit is reached.In case you plan to kill the session, you need to use KILL_SESSION.

You would notice that we used switch_estimate to false . This parameter tells oracle to estimate the execution time before the operation starts.If you use switch_estimate to true  in above code, you will notice that it will kill the session immediately along with error message indicating which I/O limit was reached.

If you set switch_estimate to true and set max_est_exec_time which specifies the maximum execution time (in CPU seconds) allowed for a session. If the optimizer estimates that an operation will take longer than MAX_EST_EXEC_TIME, the operation is not started and ORA-07455 is issued.

You can notice that this is far better approach of enforcing limits rather then writing your own script/triggers. Some of advantages in this case were

a)Both the cases we got pretty informative error indicating that we are excluding specified limits.

b)Gave us control of limiting parallelism

c)Allows us to specify I/O based limits

We didn’t use cpu allocation in this case but you can also limit cpu available for users using mgmt_p1 (cpu_p1 ,etc are deprecated).In case you are planning to implement this in your production database with cancel_sql directive, I would recommend ensuring that it should not be application user else you will start experiencing unwanted results.