Recently I was working on using Oracle DBMS_RESOURCE_MANAGER to limit resources usage by read-only queries. This was required to prevent long running ad-hoc and poorly written queries.Instead of writing custom sql /script to kill long running session, we decided to utilize Oracle Database Resource Manager.
We decided using elapsed_time as criteria for cancelling sql.But during tests we found out that any I/O bound query didn’t kill after specified switch_time.
On checking My Oracle Support, we came across Note ID 1485199.1, which discussed the exact situation faced by us.
As per note , starting 11.2 oracle has fixed the Unpublished Bug 8202097 where some waits were included in the active time and could trigger switch_time early. Due to this fix, switch_time applies for the execution time (in CPU seconds) and not for the total elapsed time (which includes wait times also) of the query.
What this means is that any query which is waiting on wait class other then CPU (e.g I/O ) will not be killed based on switch time.Switch_time applies only to time spent on CPU. Waits are not included.
Checking Oracle 11g Documentation, we found that Oracle had introduced new parameter in create_plan_directive named switch_io_megabytes which can be used to cancel sql after specified I/O limit is reached.
We used following code to create resource manager plan with following directives
a)Limit Cpu time to 3600 s
b)Limit Parallelism to 4
c)Limit I/O to 20000Mb (~20G)
begin sys.dbms_resource_manager.clear_pending_area(); sys.dbms_resource_manager.create_pending_area(); sys.dbms_resource_manager.create_consumer_group(CONSUMER_GROUP=>'CPU_IO_LIMIT_GRP', COMMENT=>'Consumer group for RO_USER to limit parallelism,I/O and CPU Time'); sys.dbms_resource_manager.set_consumer_group_mapping(attribute => DBMS_RESOURCE_MANAGER.ORACLE_USER, value => 'RO_USER',consumer_group =>'CPU_IO_LIMIT_GRP'); -- Create resource plan: sys.dbms_resource_manager.create_plan(PLAN=> 'CPU_IO_LIMIT',COMMENT=>'Cancel Sql plan for RO_USER'); --Need to create plan_directive --Limiting parallelism to max 4, I/O Limit to 10000Mb and CPU_time to 3600 sec sys.dbms_resource_manager.create_plan_directive( PLAN=> 'CPU_IO_LIMIT', GROUP_OR_SUBPLAN=>'CPU_IO_LIMIT_GRP', COMMENT=>'Kill statement after exceeding 3600 sec , limit parallelism to max 4 and limit i/o to 20000M ', PARALLEL_DEGREE_LIMIT_P1 => 4, SWITCH_GROUP=>'CANCEL_SQL', SWITCH_TIME=>3600, SWITCH_IO_MEGABYTES=>20000, SWITCH_ESTIMATE=>false ); --Its compulsory to specify directive for OTHER_GROUPS else this will fail dbms_resource_manager.create_plan_directive(PLAN=> 'CPU_IO_LIMIT',GROUP_OR_SUBPLAN=>'OTHER_GROUPS',CPU_P1=>100); sys.dbms_resource_manager.validate_pending_area; sys.dbms_resource_manager.submit_pending_area; end; / --Grant RO_USER to switch group exec dbms_resource_manager_privs.grant_switch_consumer_group('RO_USER','CPU_IO_LIMIT_GRP',false); --Set initial group for RO_USER to CPU_IO_LIMIT_GRP exec dbms_resource_manager.set_initial_consumer_group('RO_USER','CPU_IO_LIMIT_GRP');
To enable resource manager plan
ALTER SYSTEM SET RESOURCE_MANAGER_PLAN ='CPU_IO_LIMIT';
We started query on big table and specified parallel degree as 8
alter session force parallel query parallel 8; select /*+ full(a) */ count(*) from big_table a;
From V$px_session we can verify that we requested 8 degree but we are getting only 4 due to resource manager plan directive.
INST_ID Username QC/Slave Slave Set SID QC SID Requested DOP Actual DOP MODULE SQL_ID EVENT STATUS ---------- ------------ ---------- ---------- ------ ------ ------------- ---------- ------------------------------ ------------- ------------------------- -------- 2 - p000 (Slave) 1 390 579 8 4 SQL*Plus 7fgfym909mqmn direct path read ACTIVE 2 RO_USER QC 579 579 SQL*Plus 7fgfym909mqmn PX Deq: Execute Reply ACTIVE 2 - p001 (Slave) 1 580 579 8 4 SQL*Plus 7fgfym909mqmn direct path read ACTIVE 2 - p002 (Slave) 1 772 579 8 4 SQL*Plus 7fgfym909mqmn direct path read ACTIVE 2 - p003 (Slave) 1 966 579 8 4 SQL*Plus 7fgfym909mqmn direct path read ACTIVE
I was monitoring the resource usage using v$rsrc_session_info in separate session
SELECT s.sid sess_id, g.name consumer_group, s.state, s.consumed_cpu_time cpu_time, s.cpu_wait_time, s.queued_time,(s.CURRENT_SMALL_READ_MEGABYTES+s.CURRENT_LARGE_READ_MEGABYTES) read_MB,(s.CURRENT_SMALL_WRITE_MEGABYTES+s.CURRENT_LARGE_WRITE_MEGABYTES) write_mb FROM v$rsrc_session_info s, v$rsrc_consumer_group g WHERE s.current_consumer_group_id = g.id and g.name='CPU_IO_LIMIT_GRP';
After some time our original query got cancelled with following error
ERROR at line 1: ORA-12801: error signaled in parallel query server P001, instance prod02:orcl02 (2) ORA-56720: I/O data limit exceeded - call aborted
Just before failure , v$rsrc_session_info output looked like this indicating that we reached 20000M limit specified for I/O
SESS_ID CONSUMER_GROUP STATE CPU_TIME CPU_WAIT_TIME QUEUED_TIME READ_MB WRITE_MB ---------- -------------------------------- ---------- ---------- ------------- ----------- ---------- ---------- 1347 CPU_IO_LIMIT_GRP WAITING 62 0 0 1 0 1154 CPU_IO_LIMIT_GRP WAITING 38050 0 0 18707 0 774 CPU_IO_LIMIT_GRP WAITING 37400 0 0 19871 0 582 CPU_IO_LIMIT_GRP WAITING 36267 0 0 19073 0 10 CPU_IO_LIMIT_GRP WAITING 41515 0 0 19843 0
You can notice that when we specify parallelism, each process had individual 20000M I/O limit.
Previous query was only read operation and we were wondering if this I/O limit applied to both read/write operation. To confirm this we ran query with 4 table join doing merge cartersian join (This was one of query which used 100G of temp tablespace and prompted us to start this exercise 🙂
To remove the plan, we have to unset resource_manager_plan parameter and then delete it using following code
ALTER SYSTEM SET RESOURCE_MANAGER_PLAN =''; begin sys.dbms_resource_manager.clear_pending_area(); sys.dbms_resource_manager.create_pending_area(); sys.dbms_resource_manager.delete_plan_cascade ('CPU_IO_LIMIT'); sys.dbms_resource_manager.submit_pending_area(); end; /
We recreated plan with 10000M as new I/O limit and ran the query.(We can also use dbms_resource_manager.update_plan_directive to update the settings).
Meanwhile I used new sql to get single /multi block read/write I/O along with totals.
col state for a10 col CONSUMER_GROUP for a20 SELECT s.sid sess_id, g.name consumer_group, s.state, s.consumed_cpu_time cpu_time, s.cpu_wait_time, s.queued_time,s.CURRENT_SMALL_READ_MEGABYTES S_RIO,s.CURRENT_LARGE_READ_MEGABYTES M_RIO,s.CURRENT_SMALL_WRITE_MEGABYTES S_WIO,s.CURRENT_LARGE_WRITE_MEGABYTES M_WIO,(s.CURRENT_SMALL_READ_MEGABYTES+s.CURRENT_LARGE_READ_MEGABYTES) read_MB,(s.CURRENT_SMALL_WRITE_MEGABYTES+s.CURRENT_LARGE_WRITE_MEGABYTES) write_mb FROM v$rsrc_session_info s, v$rsrc_consumer_group g WHERE s.current_consumer_group_id = g.id and g.name='CPU_IO_LIMIT_GRP';
Before failure we had following output from v$rsrc_session_info with 3341M as reads and 6614 writes (total of 9955M ~10000M). It confirms that I/O limit here includes both read/write I/O
SESS_ID CONSUMER_GROUP STATE CPU_TIME CPU_WAIT_TIME QUEUED_TIME S_RIO M_RIO S_WIO M_WIO READ_MB WRITE_MB ---------- -------------------- ---------- ---------- ------------- ----------- ---------- ---------- ---------- ---------- ---------- ---------- 199 CPU_IO_LIMIT_GRP WAITING 104786 0 0 233 3108 203 6295 3341 6498 SESS_ID CONSUMER_GROUP STATE CPU_TIME CPU_WAIT_TIME QUEUED_TIME S_RIO M_RIO S_WIO M_WIO READ_MB WRITE_MB ---------- -------------------- ---------- ---------- ------------- ----------- ---------- ---------- ---------- ---------- ---------- ---------- 199 CPU_IO_LIMIT_GRP RUNNING 106749 0 0 233 3108 207 6407 3341 6614
One last test was to test for queries exceeding specified CPU Time. To perform this test we altered switch_time to 120 seconds and re-ran the query involving joins.After nearly 5 minutes, our sql got killed with following error
ORA-00040: active time limit exceeded - call aborted Elapsed: 00:04:47.13 --Resource usage SESS_ID CONSUMER_GROUP STATE CPU_TIME CPU_WAIT_TIME QUEUED_TIME S_RIO M_RIO S_WIO M_WIO READ_MB WRITE_MB ---------- -------------------- ---------- ---------- ------------- ----------- ---------- ---------- ---------- ---------- ---------- ---------- 199 CPU_IO_LIMIT_GRP RUNNING 141184 0 0 236 3108 254 7885 3344 8139
Above query reported 140s as total consumed time even though we had specified 120 s.(Not sure if this is expected behavior as note 1485199.1 mentioned that it does not include any wait) Note that we used switch_group as CANCEL_SQL which cancels the running sql after limit is reached.In case you plan to kill the session, you need to use KILL_SESSION.
You would notice that we used switch_estimate to false . This parameter tells oracle to estimate the execution time before the operation starts.If you use switch_estimate to true in above code, you will notice that it will kill the session immediately along with error message indicating which I/O limit was reached.
If you set switch_estimate to true and set max_est_exec_time which specifies the maximum execution time (in CPU seconds) allowed for a session. If the optimizer estimates that an operation will take longer than MAX_EST_EXEC_TIME, the operation is not started and ORA-07455 is issued.
You can notice that this is far better approach of enforcing limits rather then writing your own script/triggers. Some of advantages in this case were
a)Both the cases we got pretty informative error indicating that we are excluding specified limits.
b)Gave us control of limiting parallelism
c)Allows us to specify I/O based limits
We didn’t use cpu allocation in this case but you can also limit cpu available for users using mgmt_p1 (cpu_p1 ,etc are deprecated).In case you are planning to implement this in your production database with cancel_sql directive, I would recommend ensuring that it should not be application user else you will start experiencing unwanted results.
Recent Comments