
PROCESSED Messages not clearing from Oracle Queue

I was contacted by the Dev team to look into a development database where the Oracle queue size kept on increasing. As per them, messages were not getting cleared: they were running a count on the queue table and the count kept increasing.
To find the exact cause I included msg_state in the query and found that most of the messages sitting in the queue table were in the PROCESSED state:

SQL> select queue, msg_state, count(*) from AQ$JMS_QUEUE_TABLE group by queue, msg_state;

QUEUE                          MSG_STATE          COUNT(*)
------------------------------ ---------------- ----------
AQ$_JMS_QUEUE_TABLE_E          EXPIRED                   3
JMS_QUEUE                      PROCESSED               343
JMS_QUEUE                      READY                     3

Since this was a dev box, I initially tried purging the queue table using dbms_aqadm.purge_queue_table:

DECLARE
  po_t dbms_aqadm.aq$_purge_options_t;
BEGIN
  dbms_aqadm.purge_queue_table('SCOTT.JMS_QUEUE_TABLE',
                               purge_condition => null,
                               purge_options   => po_t);
END;
/
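As a side note, purge_condition lets you restrict the purge to a particular queue or set of messages instead of wiping the whole queue table. A sketch, assuming the qtview alias documented for dbms_aqadm.purge_queue_table:

DECLARE
  po_t dbms_aqadm.aq$_purge_options_t;
BEGIN
  dbms_aqadm.purge_queue_table('SCOTT.JMS_QUEUE_TABLE',
                               purge_condition => 'qtview.queue = ''JMS_QUEUE''',
                               purge_options   => po_t);
END;
/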

The purge removed the EXPIRED and READY messages, but the PROCESSED message count didn't decrease.
Next I checked the queue definition using all_queues and saw a retention value of 3600 seconds.

SQL> SELECT owner, name, retention FROM all_queues WHERE name LIKE '%JMS%';

OWNER      NAME                       RETENTION
---------- -------------------------- ----------
SCOTT      JMS_QUEUE                  3600
SCOTT      AQ$_JMS_QUEUE_TABLE_E      0

When a queue is defined in Oracle, you can specify how long a message is retained in the queue table after it has been dequeued (the retention period, in seconds). Changing the retention to 0 cleared the messages instantly:

exec dbms_aqadm.alter_queue(QUEUE_NAME=>'SCOTT.JMS_QUEUE',RETENTION_TIME=>0)
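You can re-run the earlier state breakdown to confirm the PROCESSED rows are gone:

SQL> select queue, msg_state, count(*) from AQ$JMS_QUEUE_TABLE group by queue, msg_state;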

Some interesting learning for me 🙂

ORA-01873 error running SAP pre-upgrade scripts

While trying to run the pre_upgrade_status.sql script for a DB upgrade (10.2.0.4 to 11.2.0.4), we got the following error:

@pre_upgrade_status.sql
ERROR at line 1:
ORA-01873: the leading precision of the interval is too small
ORA-06512: at line 8

On debugging further, we found that it was due to the limit imposed by the INTERVAL DAY TO SECOND datatype. By default the leading (day) precision is 2, i.e. INTERVAL DAY TO SECOND is equivalent to INTERVAL DAY(2) TO SECOND, so it cannot hold an interval of 100 days or more. As per Oracle this is not a bug, and you should use the internal datatype DSINTERVAL_UNCONSTRAINED.

I couldn't find any hits on the web for this, because the error is only encountered when the AWR retention is set to 100 days or more. It is easily reproducible:

-- Set AWR retention to more than 100 days (the value is in minutes; 145440 minutes = 101 days)
execute dbms_workload_repository.modify_snapshot_settings (interval => 60,retention =>145440)

-- Execute the following code

declare
  sap_awr_small          BOOLEAN := FALSE;
  sap_awr_retention_days NUMBER;
  sap_awr_retention      INTERVAL DAY TO SECOND;
BEGIN
  sap_awr_small := FALSE;

  SELECT RETENTION INTO sap_awr_retention FROM DBA_HIST_WR_CONTROL WHERE dbid = (select dbid from v$database);
  -- dbms_output.put_line(sap_awr_retention);
  sap_awr_retention_days := extract (day from sap_awr_retention);
  -- dbms_output.put_line(sap_awr_retention_days);
  if (sap_awr_retention_days < 42) THEN
    sap_awr_small := TRUE;
  END IF;
end;
/

ERROR at line 1:
ORA-01873: the leading precision of the interval is too small
ORA-06512: at line 8

An easy workaround is to set the retention to a lower value, say 99 days (142560 minutes), and revert it post upgrade:

execute dbms_workload_repository.modify_snapshot_settings (interval => 60,retention =>142560)
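You can confirm the current setting from the same view the SAP script reads:

SQL> select snap_interval, retention from dba_hist_wr_control
     where dbid = (select dbid from v$database);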

In case you are facing this error in your application code, you can refer to the My Oracle Support article "ORA-1873 : When Passing Datetime Interval with Precision > 2 in a Parameter" (Doc ID 394260.1) for modifying your code to use an unconstrained interval type.
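For reference, a minimal sketch of the same retention check using DSINTERVAL_UNCONSTRAINED (variable names follow the SAP snippet above; this is an illustration, not the SAP-supplied fix):

declare
  sap_awr_retention      DSINTERVAL_UNCONSTRAINED;  -- no 2-digit limit on the day field
  sap_awr_retention_days number;
begin
  select retention into sap_awr_retention
    from dba_hist_wr_control
   where dbid = (select dbid from v$database);
  sap_awr_retention_days := extract(day from sap_awr_retention);
  dbms_output.put_line('AWR retention in days: ' || sap_awr_retention_days);
end;
/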

LGWR terminating instance due to error 338

Recently we came across an issue where our DB crashed with an ORA-00338 error.

Errors in file /oracle/diag/rdbms/orcl11g/orc11g/trace/orc11g_lgwr_24118.trc:
ORA-00338: log 2 of thread 1 is more recent than control file
ORA-00312: online log 2 thread 1: '/oracle/oradata/orcl11g/redo02.log'
LGWR (ospid: 24118): terminating the instance due to error 338

The DB couldn't be restarted, as it gave the same errors while opening. For multiplexed redo logs, the error is reported for both members of the group.

Error description (reference: http://psoug.org/oraerror/ORA-00338.htm):
Log string of thread string is more recent than control file

Error Cause:

The control file change sequence number in the log file is greater than the number in the control file. This implies that the wrong control file is being used. Note that repeatedly causing this error can make it stop happening without correcting the real problem. Every attempt to open the database will advance the control file change sequence number until it is great enough.

Action:
Use the current control file or do backup control file recovery to make the control file current. Be sure to follow all restrictions on doing a backup control file recovery.

The above explanation suggests a problem with the controlfile. Normally these errors are seen when doing incomplete recovery. To troubleshoot it, we took dumps of the redo log files (on the recommendation of Oracle Support).

Note: The logs below are from a test system that was used to reproduce the issue.

SQL> alter system dump logfile '/oracle/oradata/orcl11g/redo01.log' validate;
System altered.
SQL> alter system dump logfile '/oracle/oradata/orcl11g/redo02.log' validate;
ERROR at line 1:
ORA-00339: archived log does not contain any redo
ORA-00334: archived log: '/oracle/oradata/orcl11g/redo02.log'
SQL> alter system dump logfile '/oracle/oradata/orcl11g/redo03.log' validate;
*
ERROR at line 1:
ORA-00339: archived log does not contain any redo
ORA-00334: archived log: '/oracle/oradata/orcl11g/redo03.log'

The dumps of redo02.log and redo03.log failed with errors saying they do not contain any redo. Since the redo01.log dump was successful, we looked at its trace file.

DUMP OF REDO FROM FILE '/oracle/oradata/orcl11g/redo01.log'
 Opcodes *.*
 RBAs: 0x000000.00000000.0000 thru 0xffffffff.ffffffff.ffff
 SCNs: scn: 0x0000.00000000 thru scn: 0xffff.ffffffff
 Times: creation thru eternity
 VALIDATE ONLY
 FILE HEADER:
 Compatibility Vsn = 186647552=0xb200400
 Db ID=970369526=0x39d6a9f6, Db Name='TESTDB'
 Activation ID=2650290266=0x9df8385a
 Control Seq=5124=0x1404, File size=102400=0x19000
 File Number=1, Blksiz=512, File Type=2 LOG
 descrip:"Thread 0001, Seq# 0000000001, SCN 0x00000016f528-0xffffffffffff"
 thread: 1 nab: 0xffffffff seq: 0x00000001 hws: 0x3 eot: 1 dis: 0
 resetlogs count: 0x33acc28a scn: 0x0000.0016f528 (1504552)
 prev resetlogs count: 0x3377bd37 scn: 0x0000.000e2006 (925702)
 Low scn: 0x0000.0016f528 (1504552) 12/22/2014 06:13:30
 Next scn: 0xffff.ffffffff 01/01/1988 00:00:00
 Enabled scn: 0x0000.0016f528 (1504552) 12/22/2014 06:13:30
 Thread closed scn: 0x0000.0016f528 (1504552) 12/22/2014 06:13:30
 Disk cksum: 0xcec6 Calc cksum: 0xcec6
 Terminal recovery stop scn: 0x0000.00000000
 Terminal recovery 01/01/1988 00:00:00
 Most recent redo scn: 0x0000.00000000
 Largest LWN: 0 blocks
 End-of-redo stream : No
 Unprotected mode
 Miscellaneous flags: 0x800000
 Thread internal enable indicator: thr: 0, seq: 0 scn: 0x0000.00000000
 Zero blocks: 0
 Format ID is 2
 redo log key is 1679de3ad36cdd2684143daaa1635b8
 redo log key flag is 5
 Enabled redo threads: 1
END OF REDO DUMP
----- Redo read statistics for thread 1 -----

If you look at the dump file, it says the sequence is 1 and the DB name is 'TESTDB', while our instance name is orc11g and the last sequence# was more than 1 (this can also be confirmed from v$log). This indicated that our redo logs had been overwritten by some other process. The DB name in the redo log hinted that this was done by the reporting clone refresh: the redo log volume was cross-mounted on a reporting clone, which overwrote the redo logs during its refresh process.
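With the database mounted, the sequence numbers the controlfile knows about can be compared against the dump:

SQL> select group#, thread#, sequence#, status from v$log;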

As a fix, a new volume was provisioned for the clone's redo logs and the volume export was revoked. Since the current redo log had also been overwritten, we had to restore the last hot backup and perform incomplete recovery up to the last ETL start time (this was a data warehouse DB). In an OLTP database this would have meant data loss.
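For reference, the recovery was roughly along these lines (an RMAN sketch with a hypothetical point in time; adjust to your own backup strategy and controlfile situation):

# with the database started in mount state
run {
  set until time "to_date('2014-12-22 02:00:00','YYYY-MM-DD HH24:MI:SS')";
  restore database;
  recover database;
}
alter database open resetlogs;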

This issue is easily reproducible: if you clone an instance using the same redo log paths and open the clone database, you will receive this error in your source database. Datafiles are protected by DBWR, so we are protected from the risk of another database opening them; such an attempt fails with ORA-01157:

ORA-01157: cannot identify/lock data file 3 - see DBWR trace file
ORA-01110: data file 3: '/oracle/oradata/orcl11g/undotbs01.dbf'

ORA-12514 due to DNS issue

A user reported an ORA-12514 error from a JDBC application connecting to a RAC database:

ORA-12514: TNS:listener does not currently know of service requested in connect descriptor

To troubleshoot it:

a) I checked the following to ensure the specified service is running:

srvctl status service -d dbname -s serv_name

b) Ensured the service is registered with the listener:

lsnrctl status

Note that if the service has a domain name like *.us.oracle.com, then the TNS entry should use the same fully qualified service name.

Service "orcl.us.oracle.com" has 1 instance(s).
Instance "orcl1", status READY, has 1 handler(s) for this service...

Please note that tnsping only helps to check which TNS entry is being used (in case multiple entries are present) and whether the listener is up. It does not check whether the database being connected to is up or down.

Tip: In case the tnsping utility hangs, try telnet hostname port. If that hangs too, it could be a firewall issue.

Coming back to the issue: I decided to test whether I could connect to the database using the same TNS entry (tnsnames.ora can cause problems if there are extra spaces). I added the TNS entry on the DB host itself, tested it, and was able to log in.
Since the user was still not able to connect, I decided to try from another machine and check whether I could connect to the DB (I didn't have access to the application host, and we had already ruled out a firewall issue).

Trying from a different host gave me the dreaded ORA-12514 🙂
I did a quick check of the VIP addresses using ping/nslookup (nslookup node1-vip) and found that in DNS, node1-vip had the same IP as node2-vip. The service was running on node1-vip. Correcting the DNS entry resolved the connection issue.
In case you are wondering why I didn't hit this error earlier: name resolution on the database host was done via /etc/hosts, which had the correct entry for node1-vip.

GoldenGate Setup for Real-Time Data Synchronization

In an earlier post we discussed GoldenGate (GG) concepts and setting up GG without the Data Pump process.
Now I will talk about the Data Pump process and show how to set up GG replication that uses it.

In a GG environment, the extract process writes data to local trail files; the data pump process reads this local trail and sends the data over the network to a remote trail on the destination, where the replicat process reads it and applies the changes to the destination database.
The advantage of using a data pump process is better overall replication performance, as it frees the extract process from having to send data to the destination.

To set this up, download and extract the GG software as discussed in the earlier post (steps 1-7).
Once the GG software is extracted, go to the directory where it was extracted, known as the GG home directory.
In this case it is /u01/app/GGS.

1. Configure the Manager process:

GGSCI (gg1.company.com) 1> edit params mgr

It will open the parameter file for manager. Enter the following in the file:

PORT 7809

Save and quit the file.

2. Start the Manager process:

GGSCI (gg1.company.com) 2> start mgr

Manager started.

3. Verify that the Manager process is running:

GGSCI (gg1.company.com) 3> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     RUNNING

4. Now add and configure extract process:

The syntax to be used is:

ADD EXTRACT <extract name>, TRANLOG, BEGIN NOW
GGSCI (gg1.company.com) 2>add extract ext_gg1, tranlog, begin now
Extract Added

GGSCI (gg1.company.com) 1> info *

EXTRACT    EXT_GG1   Initialized   2012-06-10 16:15   Status STOPPED
Checkpoint Lag       00:00:00 (updated 00:01:45 ago)
Log Read Checkpoint  Oracle Redo Logs
2012-06-10 16:15:20  Seqno 0, RBA 0

5. Configure the trail file to which extract will write the records:

Syntax:

ADD EXTTRAIL <extract trail path/two character trail id> ,EXTRACT <extract name>, Megabytes <n>
GGSCI (gg1.company.com) 2> add exttrail /u01/app/GGS/dirdat/lt, extract ext_gg1, megabytes 50
EXTTRAIL added.

6. Create extract parameter file:

GGSCI (gg1.company.com) 1> edit params ext_gg1

extract ext_gg1
userid gggate, password oracle
exttrail /u01/app/GGS/dirdat/lt

7. Configure the Pump process, which will read the local extract trail file and send data to the remote trail:

Syntax:
ADD EXTRACT <pump name>, EXTTRAILSOURCE <extract trail Path/two character trail id>

GGSCI (gg1.company.com) 1> add extract pump_gg1, exttrailsource /u01/app/GGS/dirdat/lt
EXTRACT added.
GGSCI (gg1.company.com) 2> info *

EXTRACT    EXT_GG1   Initialized   2012-06-10 16:15   Status STOPPED
Checkpoint Lag       00:00:00 (updated 00:16:56 ago)
Log Read Checkpoint  Oracle Redo Logs
2012-06-10 16:15:20  Seqno 0, RBA 0

EXTRACT    PUMP_GG1  Initialized   2012-06-10 16:32   Status STOPPED
Checkpoint Lag       00:00:00 (updated 00:00:04 ago)
Log Read Checkpoint  File /u01/app/GGS/dirdat/lt000000
First Record  RBA 0

8. Configure a remote trail file name for the Pump process:

Syntax:
ADD RMTTRAIL <pump trail path/two character trail id> ,EXTRACT <pump name>, Megabytes <n>

GGSCI (gg1.company.com) 4> add rmttrail /u01/app/GGS/dirdat/rt, extract pump_gg1
RMTTRAIL added.

9. Add remote host information to the Pump process:

— Remote host and remote manager port to write the trail

RMTHOST <Remote hostname>, MGRPORT <Target manager port number>

— Remote trail info

RMTTRAIL <extract trail path/two character trail id>

GGSCI (gg1.company.com) 1> edit params pump_gg1

[oracle@gg1 dirprm]$ cat pump_gg1.prm
extract pump_gg1
passthru
userid gggate, password oracle
rmthost gg2.company.com, mgrport 7809
rmttrail /u01/app/GGS/dirdat/rt
[oracle@gg1 dirprm]$ pwd
/u01/app/GGS/dirprm

Target system:
==============
1. On the target database, we need to configure the Replicat process, which will apply the changes
to the target database after reading them from the remote trail file.

Syntax:

ADD REPLICAT <replicat name>, EXTTRAIL <trail file path/two character trail id>

GGSCI (gg2.company.com) 8> add replicat rep_gg2, exttrail /u01/app/GGS/dirdat/rt
ERROR: No checkpoint table specified for ADD REPLICAT.

2. Add checkpoint table.

A checkpoint table is a small table in the target database used by the Replicat
process to maintain checkpoints. A checkpoint records a known position in the trail from which
to start after an expected or unexpected shutdown. By default, a record of these checkpoints is maintained
in a file on disk in the GoldenGate directory. Optionally, the checkpoint record can also be maintained in a
checkpoint table in the target database.

More information about the checkpoint table can be found in MOS note 965698.1.
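Note that ADD CHECKPOINTTABLE needs a database login from GGSCI first, something like the following (using the same gggate user as in the parameter files):

GGSCI (gg2.company.com) 9> dblogin userid gggate, password oracle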

GGSCI (gg2.company.com) 10>  add checkpointtable chktbl

Successfully created checkpoint table CHKTBL.

GGSCI (gg2.company.com) 13>  add replicat rep_gg2, exttrail /u01/app/GGS/dirdat/rt, checkpointtable gggate.chktbl
REPLICAT added.

GGSCI (gg2.company.com) 14> info *

REPLICAT   REP_GG2   Initialized   2012-06-10 16:58   Status STOPPED
Checkpoint Lag       00:00:00 (updated 00:00:12 ago)
Log Read Checkpoint  File /u01/app/GGS/dirdat/rt000000
First Record  RBA 0

Now start all the processes on the source system.

GGSCI (gg1.company.com) 11> start manager

Manager started.
GGSCI (gg1.company.com) 12> start extract ext_gg1

Sending START request to MANAGER ...
EXTRACT EXT_GG1 starting
GGSCI (gg1.company.com) 13> start extract pump_gg1

Sending START request to MANAGER ...
EXTRACT PUMP_GG1 starting
GGSCI (gg1.company.com) 15> info *

EXTRACT    EXT_GG1   Last Started 2012-06-10 17:02   Status RUNNING
Checkpoint Lag       00:00:00 (updated 00:00:00 ago)
Log Read Checkpoint  Oracle Redo Logs
2012-06-10 17:02:20  Seqno 108, RBA 14447104

EXTRACT    PUMP_GG1  Last Started 2012-06-10 17:02   Status RUNNING
Checkpoint Lag       00:00:00 (updated 00:30:18 ago)
Log Read Checkpoint  File /u01/app/GGS/dirdat/lt000000
First Record  RBA 0

[oracle@gg1 GGS]$ ls -ltr /u01/app/GGS/dirdat/lt000000
-rw-rw-rw- 1 oracle oinstall 983 Jun 10 17:02 /u01/app/GGS/dirdat/lt000000

GGSCI (gg1.company.com) 2> exit

Once the processes are started, we will create the sample tables for replication using the Oracle-provided scripts.
These scripts can be found under /u01/app/GGS.

[oracle@gg1 GGS]$ ls -ltr *demo*
-r--r--r-- 1 oracle oinstall 2275 Oct 15  2010 demo_ora_misc.sql
-r--r--r-- 1 oracle oinstall 4015 Oct 15  2010 demo_ora_lob_create.sql
-r--r--r-- 1 oracle oinstall  821 Oct 15  2010 demo_ora_insert.sql
-r--r--r-- 1 oracle oinstall  883 Oct 15  2010 demo_ora_create.sql
-r--r--r-- 1 oracle oinstall 2520 Oct 15  2010 demo_ora_pk_befores_updates.sql
-r--r--r-- 1 oracle oinstall 1227 Oct 15  2010 demo_ora_pk_befores_insert.sql
-r--r--r-- 1 oracle oinstall 1269 Oct 15  2010 demo_ora_pk_befores_create.sql
-r--r--r-- 1 oracle oinstall  967 Oct 15  2010 demo_more_ora_insert.sql
-r--r--r-- 1 oracle oinstall 1217 Oct 15  2010 demo_more_ora_create.sql

We will create the tables for mapping:

Source:
======

SQL> connect scott/tiger
Connected.
SQL> @demo_ora_create.sql
DROP TABLE tcustmer
*
ERROR at line 1:
ORA-00942: table or view does not exist

Table created.

DROP TABLE tcustord
*
ERROR at line 1:
ORA-00942: table or view does not exist

Table created.

SQL>

Target:
=======

SQL> conn scott/tiger
Connected.
SQL> @demo_ora_create.sql
DROP TABLE tcustmer
*
ERROR at line 1:
ORA-00942: table or view does not exist

Table created.

DROP TABLE tcustord
*
ERROR at line 1:
ORA-00942: table or view does not exist

Table created.

SQL> select count(*) from tcustmer;

COUNT(*)
----------
0

SQL>

Once the structure is created, we tell the extract, pump, and replicat processes
to capture/apply the changes made to these tables. For that we need to configure
the respective parameter files, which should look like this:

GGSCI (gg1.company.com) 13> edit params ext_gg1

extract ext_gg1
userid gggate, password oracle
exttrail /u01/app/GGS/dirdat/lt
Table SCOTT.TCUSTMER;
Table SCOTT.TCUSTORD;


GGSCI (gg1.company.com) 14> edit params pump_gg1
extract pump_gg1
passthru
userid gggate, password oracle
rmthost gg2.company.com, mgrport 7809
rmttrail /u01/app/GGS/dirdat/rt
Table SCOTT.TCUSTMER;
Table SCOTT.TCUSTORD;

GGSCI (gg2.company.com) 4> edit params rep_gg2
REPLICAT rep_gg2
ASSUMETARGETDEFS
userid gggate, password oracle
discardfile /u01/app/GGS/discard/rep_gg2_discard.txt, append, megabytes 10
MAP SCOTT.TCUSTMER, Target SCOTT.TCUSTMER;
MAP SCOTT.TCUSTORD, Target SCOTT.TCUSTORD;
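Since the extract and pump were already running before the TABLE clauses were added, they need to be stopped and restarted to pick up the new parameters, and the replicat (not started yet) needs to be started on the target, along these lines:

GGSCI (gg1.company.com) 16> stop extract ext_gg1
GGSCI (gg1.company.com) 17> start extract ext_gg1
GGSCI (gg1.company.com) 18> stop extract pump_gg1
GGSCI (gg1.company.com) 19> start extract pump_gg1

GGSCI (gg2.company.com) 5> start replicat rep_gg2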

Once the parameter files are updated with the table information, we can use the
Oracle-provided script demo_ora_insert.sql to test the replication.

After running the inserts on the source, we can check count(*) on both source and target,
which confirms the changes were applied on the target side.
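For example, a quick check using the demo tables created above:

-- On source
SQL> connect scott/tiger
SQL> @demo_ora_insert.sql
SQL> select count(*) from tcustmer;

-- On target, after giving the changes a few seconds to flow through
SQL> connect scott/tiger
SQL> select count(*) from tcustmer;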

Copying Oracle Scheduler jobs with Arguments

Oracle provides the dbms_scheduler.copy_job procedure to copy Oracle Scheduler jobs, and it works great. The problem is that it cannot copy jobs that have arguments.

The need to copy jobs can arise in the following scenarios:

1) Runaway jobs – a Scheduler job shows as running but has no SID, meaning no OS session is present and the state will not clear. Such jobs have to be recreated to resolve the issue.
2) Jobs that cannot be stopped – many times jobs cannot be stopped with dbms_scheduler.stop_job(<jobname>,true) as the call errors out with an "Internal error".
3) We need a similar job under a different schema or with a different name.

I have written the code below, which prompts for the job owner and job name and copies the job along with its arguments:

declare
  l_owner   varchar2(30) := '&job_owner';
  l_old_job varchar2(30) := '&job_name';
  l_new_job varchar2(30) := 'INTERMEDIATE_CPY';
  cnt       integer;
  -- capture the argument values before the original job is dropped
  cursor c_arg is
    select argument_position, value from dba_scheduler_job_args
     where job_name = l_old_job and owner = l_owner
     order by argument_position;
  type t_arg_tab is table of c_arg%rowtype;
  l_args t_arg_tab;
begin
  open c_arg;
  fetch c_arg bulk collect into l_args;
  close c_arg;
  -- drop any leftover intermediate copy from a previous run
  select count(*) into cnt from dba_scheduler_jobs
   where job_name = l_new_job and owner = l_owner;
  if cnt > 0 then
    dbms_output.put_line('dropping scheduler job INTERMEDIATE_CPY');
    dbms_scheduler.drop_job(l_owner || '.' || l_new_job, true);
  end if;
  -- copy the job to the intermediate name, drop the original and recreate it from the copy
  sys.dbms_scheduler.copy_job(l_owner || '.' || l_old_job, l_owner || '.' || l_new_job);
  sys.dbms_scheduler.drop_job(l_owner || '.' || l_old_job, true);
  sys.dbms_scheduler.copy_job(l_owner || '.' || l_new_job, l_owner || '.' || l_old_job);
  -- re-apply the saved argument values to the recreated job
  for i in 1 .. l_args.count loop
    dbms_output.put_line('Setting ARGUMENT_POSITION ' || l_args(i).argument_position ||
                         ' to value ' || l_args(i).value);
    sys.dbms_scheduler.set_job_argument_value(
      job_name          => l_owner || '.' || l_old_job,
      argument_position => l_args(i).argument_position,
      argument_value    => l_args(i).value);
  end loop;
  -- copied jobs are created disabled, so enable the recreated job and drop the intermediate copy
  sys.dbms_scheduler.enable(l_owner || '.' || l_old_job);
  sys.dbms_scheduler.drop_job(l_owner || '.' || l_new_job, true);
end;
/
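After running the block, you can verify that the arguments survived by querying the same dictionary view (owner and job name below are placeholders):

SQL> select job_name, argument_position, value
       from dba_scheduler_job_args
      where owner = 'SCOTT' and job_name = 'MY_JOB'
      order by argument_position;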

If you don't copy the arguments, the jobs will fail with status STOPPED and the reason "Job slave process was terminated". They can also generate the following ORA-07445 error:

ORA-07445: exception encountered: core dump [kpubsuuc()+197] [SIGSEGV] [Address not mapped to object] [0x000000000] [] []