10g

10g Articles

PROCESSED Messages not clearing from Oracle Queue

I was contacted by Dev team to look into Development database where Oracle queue size kept  on increasing. As per them messages were not getting cleared.They were running count on Queue table and the count kept increasing.
To find out exact cause I included the msg_state in the query and found out that there were lot of unprocessed messages

SQL> select queue,msg_state,count(*) from A$$JMS_QUEUE_TABLE group by queue,msg_state;

QUEUE                   MSG_STATE      COUNT(*)
------------------------------ ---------------- ----------
AQ$_JMS_QUEUE_TABLE_E      EXPIRED             3
JMS_QUEUE               PROCESSED           343
JMS_QUEUE               READY             3

Since this was dev box,I initially tried purging the queue using dbms_aqadm.purge_queue_table

DECLARE
po_t dbms_aqadm.aq$_purge_options_t;
BEGIN
dbms_aqadm.purge_queue_table('SCOTT.JMS_QUEUE_TABLE',
purge_condition => null,
purge_options => po_t);
END;

This removed the EXPIRED and READY messages but PROCESSED messages count didn’t decrease.
Next  I checked queue definition using all_queues.I saw a value named retention specified to 3600.

SQL> SELECT owner, name, retention FROM all_queues WHERE name LIKE '%JMS%';

OWNER                   NAME                  RETENTION
-------            ------------------        --------------
SCOTT               JMS_QUEUE              3600
SCOTT               AQ$_JMS_QUEUE_TABLE_E      0

When a queue is defined in Oracle, you can define how long a message can remain visible once it has been dequeued (i.e. the retention period). Modifying the retention to 0 cleared the messages instantly

exec dbms_aqadm.alter_queue(QUEUE_NAME=>'SCOTT.JMS_QUEUE',RETENTION_TIME=>0)

Some interesting learning for me 🙂

ORA-01873 error running SAP pre-upgrade scripts

While trying to run pre_upgrade_status.sql script for DB upgrade (10.2.0.4 to 11.2.0.4), we got following error

@pre_upgrade_status.sql
ERROR at line 1:
ORA-01873: the leading precision of the interval is too small
ORA-06512: at line 8

On debugging further , found out that it was due to limit imposed on “INTERVAL DAY TO SECOND” datatype. By default precision is 2 i.e. interval day to second is equivalent to interval day(2) to second(2). As per Oracle this is not bug and you should use internal datatype DSINTERVAL_UNCONSTRAINED

I couldn’t get any hit on web for this as this error will only be encountered if the AWR retention is set to 100 or more days. This is easily reproducible

--Set retention to >100 days
execute dbms_workload_repository.modify_snapshot_settings (interval => 60,retention =>145440)

--execute following code

declare
sap_awr_small      BOOLEAN  := FALSE;
sap_awr_retention_days NUMBER;
sap_awr_retention  INTERVAL DAY TO SECOND;
 BEGIN
       sap_awr_small := FALSE;

       SELECT RETENTION INTO sap_awr_retention from DBA_HIST_WR_CONTROL WHERE  dbid = (select dbid from v$database);
       -- dbms_output.put_line(sap_awr_retention);
       sap_awr_retention_days := extract (day from sap_awr_retention);
       -- dbms_output.put_line(sap_awr_retention_days);
       if (sap_awr_retention_days < 42) THEN
         sap_awr_small := TRUE;
       END IF;
end;
/

ERROR at line 1:
ORA-01873: the leading precision of the interval is too small
ORA-06512: at line 8

Easy workaround is to set retention to lower value say 99 days and revert it post upgrade

execute dbms_workload_repository.modify_snapshot_settings (interval => 60,retention =>142560)

In case you are facing this error in your application code, then you can Refer MyOracle Support article ORA-1873 : When Passing Datetime Interval with Precision > 2 in a Parameter (Doc ID 394260.1) for modifying your code to use custom type.

LGWR terminating instance due to error 338

Recently we came across a issue where our DB crashed with ORA-00338 error .

Errors in file /oracle/diag/rdbms/orcl11g/orc11g/trace/orc11g_lgwr_24118.trc:
ORA-00338: log 2 of thread 1 is more recent than control file
ORA-00312: online log 2 thread 1: '/oracle/oradata/orcl11g/redo02.log'
LGWR (ospid: 24118): terminating the instance due to error 338

DB couldn’t be restarted as it gave same errors while opening. For multiplexed redo log files, it reports error for both log files.

Error Description: (Reference :http://psoug.org/oraerror/ORA-00338.htm)
Log string of thread string is more recent than control file

Error Cause:

The control file change sequence number in the log file is greater than the number in the control file. This implies that the wrong control file is being used. Note that repeatedly causing this error can make it stop happening without correcting the real problem. Every attempt to open the database will advance the control file change sequence number until it is great enough.

Action:
Use the current control file or do backup control file recovery to make the control file current. Be sure to follow all restrictions on doing a backup control file recovery.

Above explanation suggests that there is problem with controlfile. Normally these errors are seen when doing incomplete recovery. To troubleshoot it ,we took dumps of redo log files (On recommendation of Oracle support)

Note: Below logs are from test system which was used to reproduce the issue

SQL> alter system dump logfile '/oracle/oradata/orcl11g/redo01.log' validate;
System altered.
SQL> alter system dump logfile '/oracle/oradata/orcl11g/redo02.log' validate;
ERROR at line 1:
ORA-00339: archived log does not contain any redo
ORA-00334: archived log: '/oracle/oradata/orcl11g/redo02.log'
SQL> alter system dump logfile '/oracle/oradata/orcl11g/redo03.log' validate;
*
ERROR at line 1:
ORA-00339: archived log does not contain any redo
ORA-00334: archived log: '/oracle/oradata/orcl11g/redo03.log'

redo02.log and redo03.log dump failed with errors that it does not contain redo. Since redo01.log dump was successful, we looked at trace.

DUMP OF REDO FROM FILE '/oracle/oradata/orcl11g/redo01.log'
 Opcodes *.*
 RBAs: 0x000000.00000000.0000 thru 0xffffffff.ffffffff.ffff
 SCNs: scn: 0x0000.00000000 thru scn: 0xffff.ffffffff
 Times: creation thru eternity
 VALIDATE ONLY
 FILE HEADER:
 Compatibility Vsn = 186647552=0xb200400
 Db ID=970369526=0x39d6a9f6, Db Name='TESTDB'
 Activation ID=2650290266=0x9df8385a
 Control Seq=5124=0x1404, File size=102400=0x19000
 File Number=1, Blksiz=512, File Type=2 LOG
 descrip:"Thread 0001, Seq# 0000000001, SCN 0x00000016f528-0xffffffffffff"
 thread: 1 nab: 0xffffffff seq: 0x00000001 hws: 0x3 eot: 1 dis: 0
 resetlogs count: 0x33acc28a scn: 0x0000.0016f528 (1504552)
 prev resetlogs count: 0x3377bd37 scn: 0x0000.000e2006 (925702)
 Low scn: 0x0000.0016f528 (1504552) 12/22/2014 06:13:30
 Next scn: 0xffff.ffffffff 01/01/1988 00:00:00
 Enabled scn: 0x0000.0016f528 (1504552) 12/22/2014 06:13:30
 Thread closed scn: 0x0000.0016f528 (1504552) 12/22/2014 06:13:30
 Disk cksum: 0xcec6 Calc cksum: 0xcec6
 Terminal recovery stop scn: 0x0000.00000000
 Terminal recovery 01/01/1988 00:00:00
 Most recent redo scn: 0x0000.00000000
 Largest LWN: 0 blocks
 End-of-redo stream : No
 Unprotected mode
 Miscellaneous flags: 0x800000
 Thread internal enable indicator: thr: 0, seq: 0 scn: 0x0000.00000000
 Zero blocks: 0
 Format ID is 2
 redo log key is 1679de3ad36cdd2684143daaa1635b8
 redo log key flag is 5
 Enabled redo threads: 1
END OF REDO DUMP
----- Redo read statistics for thread 1 -----

If you look at dumpfile it says that sequence is set to 1 and DB name=”TESTDB’ .Our instance name is orc11g and last sequence# was more than 1 (can also be confirmed from v$log). This indicated that our redo logs were overwritten by some other process. DB name in redo log hinted that it was done by reporting clone refresh process. Problem was that redo log volume was cross mounted on a reporting clone which overwrote the redo logs during the refresh process .

As a fix, new volume was provisioned for the clone redo logs and volume export was revoked. Since current redo log also got overwritten, we had to restore last hot backup and perform incomplete recovery till the last ETL start time (This was a datawarehouse db). In OLTP this would have caused data loss.

This issue is easily reproducible. If you try to clone a instance by using same name for redo log and open the database, you will receive the error in your source database. DB files are protected by DBWR so we are protected from risk of other Database opening datafiles and get ORA-01157

ORA-01157: cannot identify/lock data file 3 - see DBWR trace file
ORA-01110: data file 3: '/oracle/oradata/orcl11g/undotbs01.dbf'

portmap: unrecognized service on RHEL6

Quick note for people using NFS for shared storage on RAC database. Till RHEL5 we had to ensure nfs,nfslock and portmap service has to be running.
These services are required otherwise you will get following errors while mounting database

ORA-00210: cannot open the specified control file
ORA-00202: control file: '/u01/oradata/orcl/control01.ctl'
ORA-27086: unable to lock file - already in use

Mostly this could be auto-enabled on boot by using chkconfig command. While working on similar issue today, I found out that this service is not present in RHEL 6

# service portmap status
portmap: unrecognized service

The portmap service was used to map RPC program numbers to IP address port number combinations in earlier versions of Red Hat Enterprise Linux.
As per RHEL6 docs, portmap service has been replaced by rpcbind in Red Hat Enterprise Linux 6 to enable IPv6 support.
So following command will work

# service rpcbind status
rpcbind (pid  1587) is running...

You can read about NFS and associated processes from RHEL6 docs

ORA-12514 due to DNS issue

A User reported ORA-12514 error from JDBC application connecting to RAC database

ORA-12514: TNS:listener does not currently know of service requested in connect descriptor

To troubleshoot it,

a)I checked following things to ensure specified service is running

srvctl status service -d dbname -s serv_name

b) Ensured service is registered with listener

lsnrctl status

Note that if service has domain name like *.us.oracle.com then TNS entry should also have same service name.

Service "orcl.us.oracle.com" has 1 instance(s).
Instance "orcl1", status READY, has 1 handler(s) for this service...

Please note that tnsping only helps to check what tns entry is being used (in case multiple entries are present ) and whether the listener is up. It doesn’t check if database being connected to is up/down.

Tip: In case tnsping utility hangs you can try doing telnet hostname port. If this hangs too, then it could be firewall issue

Coming back to issue. I decided to test if I can connect to database using same TNS entry. Tnsnames.ora has issues if there are extra spaces. I added tns entry in same db host and tested it and I was able to login.
Since user was still not able to connect I decided to use some other machine and check if I can connect to db ( I didnt had access on application host and we had already checked that it was not firewall issue)

Trying from different host gave me dreaded error “ORA-12514” 🙂
I did quick check of VIP address using ping/nslookup (nslookup node1-vip) utility and found that node1-vip ip was same as node2-vip in DNS. Service was running on node1-vip. Modifying the DNS resolved the connection issue.
In case you are wondering why I didn’t get this error earlier.This is because name resolution was being done by /etc/hosts on database host which had correct entries for node1-vip.

Sqlplus HTML reports

I was working on setting up monitoring for application team to monitor AQ and wanted html formatting of table data. On searching I found this link from Ittichai Chammavanijakul  which makes use of css formatting to generate good visual reports.

 

EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO
7369 SMITH CLERK 7902 17-DEC-1980 00:00:00 800 20
7499 ALLEN SALESMAN 7698 20-FEB-1981 00:00:00 1600 300 30
7521 WARD SALESMAN 7698 22-FEB-1981 00:00:00 1250 500 30
7566 JONES MANAGER 7839 02-APR-1981 00:00:00 2975 20
7654 MARTIN SALESMAN 7698 28-SEP-1981 00:00:00 1250 1400 30

I used following code

set markup HTML ON HEAD "<style type='text/css'> -
body { -
font:10pt Arial,Helvetica,sans-serif; -
color:blue; background:white; } -
p { -
font:8pt Arial,sans-serif; -
color:grey; background:white; } -
table,tr,td { -
font:10pt Arial,Helvetica,sans-serif; -
text-align:right; -
color:Black; background:white; -
padding:0px 0px 0px 0px; margin:0px 0px 0px 0px; } -
th { -
font:bold 10pt Arial,Helvetica,sans-serif; -
color:#336699; -
background:#cccc99; -
padding:0px 0px 0px 0px;} -
h1 { -
font:16pt Arial,Helvetica,Geneva,sans-serif; -
color:#336699; -
background-color:White; -
border-bottom:1px solid #cccc99; -
margin-top:0pt; margin-bottom:0pt; padding:0px 0px 0px 0px;} -
h2 { -
font:bold 10pt Arial,Helvetica,Geneva,sans-serif; -
color:#336699; -
background-color:White; -
margin-top:4pt; margin-bottom:0pt;} -
a { -
font:9pt Arial,Helvetica,sans-serif; -
color:#663300; -
background:#ffffff; -
margin-top:0pt; margin-bottom:0pt; vertical-align:top;} -
.threshold-critical { -
font:bold 10pt Arial,Helvetica,sans-serif; -
color:red; } -
.threshold-warning { -
font:bold 10pt Arial,Helvetica,sans-serif; -
color:orange; } -
.threshold-ok { -
font:bold 10pt Arial,Helvetica,sans-serif; -
color:green; } -
</style> -
<title>SQL*Plus Report</title>" -
BODY "" - 
TABLE "border='1' width='90%' align='center'" -
ENTMAP OFF SPOOL ON

So you can execute above code in sqlplus and spool the output to file. This file can be send using uuencode/sendmail function . You can use below code (note content-type is text/html)

_send_email()
{
/usr/lib/sendmail -t << EOF
From: $FROMADDRESS
To: $EMAIL
Subject: $MESSAGE
X-Priority: 1
Content-type: text/html

`cat $LOGFILE |grep -v 'rows selected'`
EOF
}

Call the function _send_email in your script and ensure all variables are correctly set.

uuencode usage is also simple

uuencode emp.html emp.html|mailx [email protected]