11.2 AQ – Messages stuck in PROCESESED,WAIT State

I had been working today on AQ issue where messages were not moving from WAIT to READY State.This was on 11.2.0.3 database running on RHEL5u6(64 bit). I thought of quickly documenting it for easy reference
When you enqueue record, you can specify delay after which record should be ready for dequeue.
During this time, MSG_STATE is in 'WAIT' state, which changes to 'READY' state after current time reaches delay timestamp.All the stuck records were having delay_timestamp past current time.This can be verified by querying AQ$Queue_table.

Below example , you see that MSG_ID 'D2DA56B9746E2E59E043096F93452D8A' has delay_timestamp of '10-JAN-13 05.42.07.835125 AM' which is less then current time ( 9:52 AM)

 09:52:00 SQL > select msg_id,msg_state,enq_timestamp,delay_timestamp from AQ$TEST_QTABLE where rownum <5;

 MSG_ID				 MSG_STATE  ENQ_TIMESTAMP		DELAY_TIMESTAMP
-------------------------------- ---------- ------------------------------ ------------------------------
D2DA56B94EF02E59E043096F93452D8A WAIT	    09-JAN-13 07.37.07.595334 AM   23-FEB-13 07.37.07.595334 AM
D2DA56B9746E2E59E043096F93452D8A WAIT	    09-JAN-13 07.42.07.835125 AM   10-JAN-13 05.42.07.835125 AM
D2DA56B94ED32E59E043096F93452D8A WAIT	    09-JAN-13 07.37.05.274447 AM   23-FEB-13 07.37.05.274447 AM
D2F1EDF5112D59A1E043096F93451A42 READY	    10-JAN-13 07.35.33.089265 AM   10-JAN-13 07.35.34.089265 AM

Along with this there were lot of messages in PROCESSED state.My first thought was that Retention for this queue was set to higher value, but on checking I found that it was set to 0. You can read my previous article on Retention and Processed messages in queue

V$AQ revealed that there were 2 more queues with Pending processed messages. As per documentation, Ready column should only show messages in READY state but here it included count for PROCESSED state too (Not sure if this is also bug)

select * from v$AQ where ready >1000
/
      QID    WAITING	   READY    EXPIRED AVERAGE_MSG_AGE TOTAL_WAIT AVERAGE_WAIT
--------- ---------- ---------- ---------- --------------- ---------- ------------
    79884	 1757	 6153814	  0	      92459 5.6914E+11	 92459.9842
    79892	    0	  183916	  0	   16643589 3.0610E+12	 16643589.1
    79894	    0	    1122	  0	    5420368 6081652896	    5420368

Verified that aq_tm_processes parameter was not set (Oracle recommends to unset it in 10g and above so that oracle can auto-tune it) and there were two processes running

$ps -ef|grep ora_q
oracle   25235     1  0 00:53 ?        00:00:00 ora_qmnc_alpha
oracle   25334     1  1 00:53 ?        00:04:26 ora_q001_alpha
oracle   26270     1  0 00:56 ?        00:00:00 ora_q002_alpha

To check further, I checked alert log for any ORA errors and found few entries for ORA-01405.

ORA-01405: fetched column value is NULL

On checking trace directory for queue process tracefile, same ora-01405 error was logged

kwqdlScanDeqLog during stmt fetch: retval -1, errnum 1405, errbuf ORA-01405: fetched column value is NULL

Checked Oracle MySupport and was able to locate note ID 1162862.1 - ORA-01405: Fetched Column Value Is Null For An Advanced Queue

In Oracle 11.2 a new dequeue log table named as AQ$_QT_L has been introduced which is used for storing message identifiers of committed dequeued operations on the queue.Due to bug 12899768 there are few message id with '00000000000000000000000000000000' in this queue table which causes PROCESSED messages to get stuck.

Note 12899768.8 indicates that this is bug in 11.2.0.2 and 11.2.0.3 database

Symptoms include

- ORA-01405 errors for qmon process
- Lot of PROCESSED messages in queue
- Messages with message id's like '00000000000000000000000000000000'

We need to either apply patch for this bug or set event '10852 trace name context forever, level 16384'
On searching for this event, came across Note 1365655.1 - Dequeue by msgid of a message in WAIT state does not work properly in 11.2

This note matched our main problem where messages are not changing state from WAIT to ready. This note also suggested setting event 10852 followed by database restart. To correct this issue we followed below steps

a)Stopped the Apps to prevent enqueue/dequeue. You can also stop it using dbms_aqadm.stop_queue() but this would generate errors in app.
b)deleted messages with msgid like '00000000000000000000000000000000' from new queue table

delete from AQ$TEST_TABLE_L where msgid ='00000000000000000000000000000000';
commit;

c)Set event in spfile and restart database

conn / as sysdba
alter system set event='10852 trace name context forever, level 16384' scope=spfile;

After restart,messages with delay_timestamp older then current time changed state to READY and count in V$AQ started dropping indicating that qmon process was purging processed messages.

Leave a Reply