Recently we came across a issue where our DB crashed with ORA-00338 error .
Errors in file /oracle/diag/rdbms/orcl11g/orc11g/trace/orc11g_lgwr_24118.trc: ORA-00338: log 2 of thread 1 is more recent than control file ORA-00312: online log 2 thread 1: '/oracle/oradata/orcl11g/redo02.log' LGWR (ospid: 24118): terminating the instance due to error 338
DB couldn’t be restarted as it gave same errors while opening. For multiplexed redo log files, it reports error for both log files.
Error Description: (Reference :http://psoug.org/oraerror/ORA-00338.htm)
Log string of thread string is more recent than control file
Error Cause:
The control file change sequence number in the log file is greater than the number in the control file. This implies that the wrong control file is being used. Note that repeatedly causing this error can make it stop happening without correcting the real problem. Every attempt to open the database will advance the control file change sequence number until it is great enough.
Action:
Use the current control file or do backup control file recovery to make the control file current. Be sure to follow all restrictions on doing a backup control file recovery.
Above explanation suggests that there is problem with controlfile. Normally these errors are seen when doing incomplete recovery. To troubleshoot it ,we took dumps of redo log files (On recommendation of Oracle support)
Note: Below logs are from test system which was used to reproduce the issue
SQL> alter system dump logfile '/oracle/oradata/orcl11g/redo01.log' validate; System altered. SQL> alter system dump logfile '/oracle/oradata/orcl11g/redo02.log' validate; ERROR at line 1: ORA-00339: archived log does not contain any redo ORA-00334: archived log: '/oracle/oradata/orcl11g/redo02.log' SQL> alter system dump logfile '/oracle/oradata/orcl11g/redo03.log' validate; * ERROR at line 1: ORA-00339: archived log does not contain any redo ORA-00334: archived log: '/oracle/oradata/orcl11g/redo03.log'
redo02.log and redo03.log dump failed with errors that it does not contain redo. Since redo01.log dump was successful, we looked at trace.
DUMP OF REDO FROM FILE '/oracle/oradata/orcl11g/redo01.log' Opcodes *.* RBAs: 0x000000.00000000.0000 thru 0xffffffff.ffffffff.ffff SCNs: scn: 0x0000.00000000 thru scn: 0xffff.ffffffff Times: creation thru eternity VALIDATE ONLY FILE HEADER: Compatibility Vsn = 186647552=0xb200400 Db ID=970369526=0x39d6a9f6, Db Name='TESTDB' Activation ID=2650290266=0x9df8385a Control Seq=5124=0x1404, File size=102400=0x19000 File Number=1, Blksiz=512, File Type=2 LOG descrip:"Thread 0001, Seq# 0000000001, SCN 0x00000016f528-0xffffffffffff" thread: 1 nab: 0xffffffff seq: 0x00000001 hws: 0x3 eot: 1 dis: 0 resetlogs count: 0x33acc28a scn: 0x0000.0016f528 (1504552) prev resetlogs count: 0x3377bd37 scn: 0x0000.000e2006 (925702) Low scn: 0x0000.0016f528 (1504552) 12/22/2014 06:13:30 Next scn: 0xffff.ffffffff 01/01/1988 00:00:00 Enabled scn: 0x0000.0016f528 (1504552) 12/22/2014 06:13:30 Thread closed scn: 0x0000.0016f528 (1504552) 12/22/2014 06:13:30 Disk cksum: 0xcec6 Calc cksum: 0xcec6 Terminal recovery stop scn: 0x0000.00000000 Terminal recovery 01/01/1988 00:00:00 Most recent redo scn: 0x0000.00000000 Largest LWN: 0 blocks End-of-redo stream : No Unprotected mode Miscellaneous flags: 0x800000 Thread internal enable indicator: thr: 0, seq: 0 scn: 0x0000.00000000 Zero blocks: 0 Format ID is 2 redo log key is 1679de3ad36cdd2684143daaa1635b8 redo log key flag is 5 Enabled redo threads: 1 END OF REDO DUMP ----- Redo read statistics for thread 1 -----
If you look at dumpfile it says that sequence is set to 1 and DB name=”TESTDB’ .Our instance name is orc11g and last sequence# was more than 1 (can also be confirmed from v$log). This indicated that our redo logs were overwritten by some other process. DB name in redo log hinted that it was done by reporting clone refresh process. Problem was that redo log volume was cross mounted on a reporting clone which overwrote the redo logs during the refresh process .
As a fix, new volume was provisioned for the clone redo logs and volume export was revoked. Since current redo log also got overwritten, we had to restore last hot backup and perform incomplete recovery till the last ETL start time (This was a datawarehouse db). In OLTP this would have caused data loss.
This issue is easily reproducible. If you try to clone a instance by using same name for redo log and open the database, you will receive the error in your source database. DB files are protected by DBWR so we are protected from risk of other Database opening datafiles and get ORA-01157
ORA-01157: cannot identify/lock data file 3 - see DBWR trace file ORA-01110: data file 3: '/oracle/oradata/orcl11g/undotbs01.dbf'
Nice blog.
Hi Amit,
It’s an good post buddy.. !!
One question was popping out of my brain, why the ORL’s are placed on nfs mounts, I hope it would be better to replace to local file system if it’s non-ASM. Since, the performance and furture issues will get minimized.
Pavan,
This is 60Tb warehouse so we are using netapp filers for database storage which also provides us facility of snapshot backups.
Regards
Amit