corruption

LGWR terminating instance due to error 338

Recently we came across a issue where our DB crashed with ORA-00338 error .

Errors in file /oracle/diag/rdbms/orcl11g/orc11g/trace/orc11g_lgwr_24118.trc:
ORA-00338: log 2 of thread 1 is more recent than control file
ORA-00312: online log 2 thread 1: '/oracle/oradata/orcl11g/redo02.log'
LGWR (ospid: 24118): terminating the instance due to error 338

DB couldn’t be restarted as it gave same errors while opening. For multiplexed redo log files, it reports error for both log files.

Error Description: (Reference :http://psoug.org/oraerror/ORA-00338.htm)
Log string of thread string is more recent than control file

Error Cause:

The control file change sequence number in the log file is greater than the number in the control file. This implies that the wrong control file is being used. Note that repeatedly causing this error can make it stop happening without correcting the real problem. Every attempt to open the database will advance the control file change sequence number until it is great enough.

Action:
Use the current control file or do backup control file recovery to make the control file current. Be sure to follow all restrictions on doing a backup control file recovery.

Above explanation suggests that there is problem with controlfile. Normally these errors are seen when doing incomplete recovery. To troubleshoot it ,we took dumps of redo log files (On recommendation of Oracle support)

Note: Below logs are from test system which was used to reproduce the issue

SQL> alter system dump logfile '/oracle/oradata/orcl11g/redo01.log' validate;
System altered.
SQL> alter system dump logfile '/oracle/oradata/orcl11g/redo02.log' validate;
ERROR at line 1:
ORA-00339: archived log does not contain any redo
ORA-00334: archived log: '/oracle/oradata/orcl11g/redo02.log'
SQL> alter system dump logfile '/oracle/oradata/orcl11g/redo03.log' validate;
*
ERROR at line 1:
ORA-00339: archived log does not contain any redo
ORA-00334: archived log: '/oracle/oradata/orcl11g/redo03.log'

redo02.log and redo03.log dump failed with errors that it does not contain redo. Since redo01.log dump was successful, we looked at trace.

DUMP OF REDO FROM FILE '/oracle/oradata/orcl11g/redo01.log'
 Opcodes *.*
 RBAs: 0x000000.00000000.0000 thru 0xffffffff.ffffffff.ffff
 SCNs: scn: 0x0000.00000000 thru scn: 0xffff.ffffffff
 Times: creation thru eternity
 VALIDATE ONLY
 FILE HEADER:
 Compatibility Vsn = 186647552=0xb200400
 Db ID=970369526=0x39d6a9f6, Db Name='TESTDB'
 Activation ID=2650290266=0x9df8385a
 Control Seq=5124=0x1404, File size=102400=0x19000
 File Number=1, Blksiz=512, File Type=2 LOG
 descrip:"Thread 0001, Seq# 0000000001, SCN 0x00000016f528-0xffffffffffff"
 thread: 1 nab: 0xffffffff seq: 0x00000001 hws: 0x3 eot: 1 dis: 0
 resetlogs count: 0x33acc28a scn: 0x0000.0016f528 (1504552)
 prev resetlogs count: 0x3377bd37 scn: 0x0000.000e2006 (925702)
 Low scn: 0x0000.0016f528 (1504552) 12/22/2014 06:13:30
 Next scn: 0xffff.ffffffff 01/01/1988 00:00:00
 Enabled scn: 0x0000.0016f528 (1504552) 12/22/2014 06:13:30
 Thread closed scn: 0x0000.0016f528 (1504552) 12/22/2014 06:13:30
 Disk cksum: 0xcec6 Calc cksum: 0xcec6
 Terminal recovery stop scn: 0x0000.00000000
 Terminal recovery 01/01/1988 00:00:00
 Most recent redo scn: 0x0000.00000000
 Largest LWN: 0 blocks
 End-of-redo stream : No
 Unprotected mode
 Miscellaneous flags: 0x800000
 Thread internal enable indicator: thr: 0, seq: 0 scn: 0x0000.00000000
 Zero blocks: 0
 Format ID is 2
 redo log key is 1679de3ad36cdd2684143daaa1635b8
 redo log key flag is 5
 Enabled redo threads: 1
END OF REDO DUMP
----- Redo read statistics for thread 1 -----

If you look at dumpfile it says that sequence is set to 1 and DB name=”TESTDB’ .Our instance name is orc11g and last sequence# was more than 1 (can also be confirmed from v$log). This indicated that our redo logs were overwritten by some other process. DB name in redo log hinted that it was done by reporting clone refresh process. Problem was that redo log volume was cross mounted on a reporting clone which overwrote the redo logs during the refresh process .

As a fix, new volume was provisioned for the clone redo logs and volume export was revoked. Since current redo log also got overwritten, we had to restore last hot backup and perform incomplete recovery till the last ETL start time (This was a datawarehouse db). In OLTP this would have caused data loss.

This issue is easily reproducible. If you try to clone a instance by using same name for redo log and open the database, you will receive the error in your source database. DB files are protected by DBWR so we are protected from risk of other Database opening datafiles and get ORA-01157

ORA-01157: cannot identify/lock data file 3 - see DBWR trace file
ORA-01110: data file 3: '/oracle/oradata/orcl11g/undotbs01.dbf'

11gR2:Oracle Online Patching

Online patching is new feature introduced in 11.1.0.6 and oracle started delivering patches with 11.2.0.2.

This patch allows you to apply patch to live database i.e we do not need to shutdown database. This feature is only available for RDBMS home and can be installed/enabled using opatch.

RDBMS Online Patching Aka Hot Patching [ID 761111.1] explains internals of how this patching works.

I tested it while working on Database corruption issue caused by Bug 10205230 ORA-600 / corruption possible during shutdown in RAC. This bug is present in 11.2.0.2 database which causes corruption when you try to use shutdown immediate/normal/transactional option to shutdown RAC database. As per MOS note 1318986.1, Data corruption occurs around shutdown one or more of the RAC instances

* One of the following ORA-600 asserts:
– ORA-600 [kclchkblk_3]
– ORA-600 [kclwcrs_6]
– ORA-600 [ktubko_1]
– ORA-600 [kcratr_scan_lostwrt]
– ORA-600[3020] on the standby database

Workaround is to perform local check point and do shutdown abort. Patch 10205230 was available for our platform and I downloaded using below command directly to server

read mos_user_id
read -s h_passwd
wget --http-user="$mos_user_id" --http-password="$h_passwd" "https://updates.oracle.com/Orion/Services/download/p10205230_112020_Linux-x86-64.zip?aru=13671866&patch_file=p10205230_112020_Linux-x86-64.zip" -O p10205230_112020_Linux-x86-64.zip

mos_user_id – Metalink aka Mos user id (email id)
h_passwd – Password

To query if patch is online, go to patch directory and run following

 

$opatch query -all online
Invoking OPatch 11.2.0.1.6
Oracle Interim Patch Installer version 11.2.0.1.6
Copyright (c) 2011, Oracle Corporation. All rights reserved.

Oracle Home : /home/oracle/product/11.2
Central Inventory : /home/oracle/oraInventory
 from : /etc/oraInst.loc
OPatch version : 11.2.0.1.6
OUI version : 11.2.0.2.0
Log file location : /home/oracle/product/11.2/cfgtoollogs/opatch/opatch2012-06-20_16-43-21PM.log
--------------------------------------------------------------------------------
 Patch created on 2 Apr 2011, 12:57:14 hrs PST8PDT
 Need to shutdown Oracle instances: false
 Patch is roll-backable: true
 Patch is a "Patchset Update": false
 Patch is a rolling patch: true
 Patch has sql related actions: false
 Patch is an online patch: true
 Patch is a portal patch: false
 Patch is an "auto-enabled" patch: false
List of platforms supported:
 226: Linux x86-64
List of bugs to be fixed:
 10205230: TB_X64: HIT ORA-00600: [KCLWCRS_6]
This patch is a "singleton" patch.
This patch belongs to the "db" product family
List of executables affected:
 ORACLE_HOME/bin/oracle
List of optional components:
 oracle.rdbms: 11.2.0.2.0
List of optional actions:
 Patch the Database instances with Online Patch hpatch/bug10205230.pch
Possible XML representation of the patch:
<ONEOFF REF_ID="10205230" ROLLBACK="T" XML_INV_LOC="oneoffs/10205230/" ACT_INST_VER="11.2.0.2.0" INSTALL_TIME="2012.Jun.20 16:43:21 PDT">
 <DESC></DESC>
 <REF_LIST>
 <REF NAME="oracle.rdbms" VER="11.2.0.2.0" HOME_IDX="0"/>
 </REF_LIST>
 <BUG_LIST>
 <BUG>10205230</BUG>
 </BUG_LIST>
 <FILE_LIST/>
</ONEOFF>
--------------------------------------------------------------------------------
OPatch succeeded.

Patch is an online patch: true” indicates patch can be applied to live db.

To apply the patch, we can use following syntax

Non – RAC => opatch apply online -connectString <SID>:<USERNAME>:<PASSWORD>:<NODE>
RAC => opatch apply online -connectString <SID>:<USERNAME>:<PASSWORD>:<NODE1>,<SID2>:<USERNAME>:<PASSWORD>:<NODE2>,…

Since we are patching RAC database, we will use second syntax

$ opatch apply online -connectString orcl01d1:sys:oracle123:prod03,orcl01d2:sys:oracle123:prod04
Invoking OPatch 11.2.0.1.6
Oracle Interim Patch Installer version 11.2.0.1.6
Copyright (c) 2011, Oracle Corporation. All rights reserved.

Oracle Home : /home/oracle/product/11.2
Central Inventory : /home/oracle/oraInventory
 from : /etc/oraInst.loc
OPatch version : 11.2.0.1.6
OUI version : 11.2.0.2.0
Log file location : /home/oracle/product/11.2/cfgtoollogs/opatch/opatch2012-06-20_16-53-36PM.log

The patch should be applied/rolled back in '-all_nodes' mode only.
Converting the RAC mode to '-all_nodes' mode.
Applying interim patch '10205230' to OH '/home/oracle/product/11.2'
Verifying environment and performing prerequisite checks...
Do you want to proceed? [y|n]
y
User Responded with: Y
All checks passed.
Backing up files...
Patching component oracle.rdbms, 11.2.0.2.0...
The patch will be installed on active database instances.
Installing and enabling the online patch 'bug10205230.pch', on database 'orcl01d1'.

Patching in all-node mode.
Updating nodes 'prod04' 
 Apply-related files are:
 FP = "/home/oracle/product/11.2/.patch_storage/10205230_Apr_2_2011_12_57_14/rac/copy_files.txt"
 DP = "/home/oracle/product/11.2/.patch_storage/10205230_Apr_2_2011_12_57_14/rac/copy_dirs.txt"
 MP = "/home/oracle/product/11.2/.patch_storage/10205230_Apr_2_2011_12_57_14/rac/make_cmds.txt"
 RC = "/home/oracle/product/11.2/.patch_storage/10205230_Apr_2_2011_12_57_14/rac/remote_cmds.txt"
Instantiating the file "/home/oracle/product/11.2/.patch_storage/10205230_Apr_2_2011_12_57_14/rac/copy_files.txt.instantiated" by replacing $ORACLE_HOME in "/home/oracle/product/11.2/.patch_storage/10205230_Apr_2_2011_12_57_14/rac/copy_files.txt" with actual path.
Propagating files to remote nodes...
Instantiating the file "/home/oracle/product/11.2/.patch_storage/10205230_Apr_2_2011_12_57_14/rac/copy_dirs.txt.instantiated" by replacing $ORACLE_HOME in "/home/oracle/product/11.2/.patch_storage/10205230_Apr_2_2011_12_57_14/rac/copy_dirs.txt" with actual path.
Propagating directories to remote nodes...
Installing and enabling the online patch 'bug10205230.pch', on database 'orcl01d2' on node 'prod04'.
Patch 10205230 successfully applied
Log file location: /home/oracle/product/11.2/cfgtoollogs/opatch/opatch2012-06-20_16-53-36PM.log
OPatch succeeded.

You can verify that patch is applied and enabled

SQL> oradebug patch list
Patch File Name State
================ =========
bug10205230.pch ENABLED

If you have multiple ORACLE databases running out of same RDBMS Home, you can enable patch for other database using enableonlinepatch option

$ opatch util enableonlinepatch -connectString orcl02d1:sys:oracle123:prod03,orcl02d2:sys:oracle123:prod04 -id 10205230

In case of single instance home, below syntax is used

$ opatch util enableonlinepatch -connectString orcl:sys:oracle123 -id 10205230

Alert log is updated with following entries during patch application

Patch bug10205230.pch Installed - Update #1
Patch bug10205230.pch Enabled - Update #2
Thu Jun 21 02:13:17 2012
Online patch bug10205230.pch has been installed
Online patch bug10205230.pch has been enabled

Online patching is very important feature introduced in 11.2.0.2 as this allows you to apply patches without any downtime. Note that if you apply patch using normal opatch apply, it will not be enabled. You can confirm this by oradebug patch list command.

Physical Corruption: ORA-1578 part 3

Recently we encountered a block corruption issue in 10g database on Linux x86 64 bit and using ASM for storing database files.
Saurabh had earlier written articles on ora-1578.

http://askdba.org/weblog/2008/04/physical-corruption-ora-1578-part-1/
http://askdba.org/weblog/2008/05/physical-corruption-ora-1578-part-2/

It was first time I was working on a corruption issue, so I followed above notes and found them quite useful. Good thing about ora-1578 error message is that it gives the problematic block and datafile number

ORA-01578: ORACLE data block corrupted (file # 699, block # 188654)
ORA-01110: data file 699:
'+DG_IDX/james/datafile/scott_emp_detail.381.705692005'

Querying dba_extents revealed that it was a Fact table and one partition was giving error.Since we had rman backup for this database we directly went for recovery using blockrecover command

Recovery Manager: Release 10.2.0.4.0 - Production on Sat Aug 7 01:48:51 2010

Copyright (c) 1982, 2007, Oracle.  All rights reserved.

connected to target database: james (DBID=1812247572)

RMAN> run {blockrecover datafile 699 block 188654;}

Starting blockrecover at 07-AUG-10
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=2231 instance=james4 devtype=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: sid=2190 instance=james4 devtype=DISK
allocated channel: ORA_DISK_3
channel ORA_DISK_3: sid=2142 instance=james4 devtype=DISK
allocated channel: ORA_DISK_4
channel ORA_DISK_4: sid=2186 instance=james4 devtype=DISK
allocated channel: ORA_DISK_5
channel ORA_DISK_5: sid=2170 instance=james4 devtype=DISK
allocated channel: ORA_DISK_6
channel ORA_DISK_6: sid=2203 instance=james4 devtype=DISK
allocated channel: ORA_DISK_7
channel ORA_DISK_7: sid=2240 instance=james4 devtype=DISK
allocated channel: ORA_DISK_8
channel ORA_DISK_8: sid=2195 instance=james4 devtype=DISK

channel ORA_DISK_2: restoring block(s)
channel ORA_DISK_2: specifying block(s) to restore from backup set
restoring blocks of datafile 00699
channel ORA_DISK_2: reading from backup piece /RMNGD_RMAN1/backup/james/29lk4ift_1_1

user interrupt received
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of blockrecover command at 08/07/2010 02:52:27
RMAN-03099: job cancelled at user request

We had to cancel the block recovery as even after 1 hour it could not locate the block from backup piece. Checking backup piece found that this backup piece was of 184G

ls -lh /RMNGD_RMAN1/backup/james/29lk4ift_1_1
-rw-r-----  1 oracle dba 184G Jul 31 18:17 /RMNGD_RMAN1/backup/james/29lk4ift_1_1

We had stopped the data load for this table but delaying further was not possible as it was impacting application. Finally we  decided to use dbms_repair table to allow DML to mark the block as corrupt .
This table was FACT table which could have been loaded again (lucky for us 🙂 ). Before marking it corrupt, we took a dump of block

ALTER SYSTEM DUMP DATAFILE 699 block 188654;
BEGIN
DBMS_REPAIR.ADMIN_TABLES (
     TABLE_NAME => 'REPAIR_TABLE',
     TABLE_TYPE => dbms_repair.repair_table,
     ACTION     => dbms_repair.create_action,
     TABLESPACE => 'USERS');
END;
/

SET SERVEROUTPUT ON
DECLARE num_corrupt INT;
BEGIN
num_corrupt := 0;
DBMS_REPAIR.CHECK_OBJECT (
     SCHEMA_NAME => 'SCOTT',
     OBJECT_NAME => 'EMP_DETAIL_FACT',
     PARTITION_NAME => 'SYS_P1140013',
     REPAIR_TABLE_NAME => 'REPAIR_TABLE',
     CORRUPT_COUNT =>  num_corrupt);
DBMS_OUTPUT.PUT_LINE('number corrupt: ' || TO_CHAR (num_corrupt));
END;
/

SELECT OBJECT_NAME, BLOCK_ID, CORRUPT_TYPE, MARKED_CORRUPT,CORRUPT_DESCRIPTION, REPAIR_DESCRIPTION FROM REPAIR_TABLE;

  SET SERVEROUTPUT ON
  DECLARE num_fix INT;
  BEGIN
  num_fix := 0;
  DBMS_REPAIR.FIX_CORRUPT_BLOCKS (
      SCHEMA_NAME => 'SCOTT',
      OBJECT_NAME => 'EMP_DETAIL_FACT',
     PARTITION_NAME => 'SYS_P1140013',
      OBJECT_TYPE => dbms_repair.table_object,
      REPAIR_TABLE_NAME => 'REPAIR_TABLE',
      FIX_COUNT=> num_fix);
  DBMS_OUTPUT.PUT_LINE('num fix: ' || TO_CHAR(num_fix));
  END;
  /

  execute DBMS_REPAIR.SKIP_CORRUPT_BLOCKS('SCOTT','EMP_DETAIL_FACT')

Refer 10g doc to get details on DBMS_REPAIR.

This allowed the application to load data again. Later we worked with application team and created a new segment and renamed the old one to *_bkp and then dropped it.
Next day we got a alert informing errors with our rman backup

ORA-19566: exceeded limit of 0 corrupt blocks for file +DG_IDX/james/datafile/scott_emp_detail.381.705692005
continuing other job steps, job failed will not be re-run
channel ORA_DISK_8: finished piece 1 at AUG-10-2010 01:56:45
piece handle=/RMNGD_RMAN3/backup/james/i2lkuv0q_1_1 tag=james_INC_20100810 comment=NONE
channel ORA_DISK_8: backup set complete, elapsed time: 00:26:43
channel ORA_DISK_1: finished piece 1 at AUG-10-2010 02:13:01
piece handle=/RMNGD_RMAN1/backup/james/hrlkuv0i_1_1 tag=james_INC_20100810 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:43:07
channel ORA_DISK_6: finished piece 1 at AUG-10-2010 02:21:26
piece handle=/RMNGD_RMAN2/backup/james/i0lkuv0n_1_1 tag=james_INC_20100810 comment=NONE
channel ORA_DISK_6: backup set complete, elapsed time: 00:51:27
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================

Backup job failed again as the table was still lying in recyclebin.

03:16:47 SQL> select * from dba_recyclebin where original_name='EMP_DETAIL_FACT_BKP';

SCOTT                       BIN$jXvLFhsgCOXgQA1M6jspCQ==03:16:47 SQL> select * from dba_recyclebin where original_name=\'EMP_DETAIL_FACT_BKP\';

SCOTT                       BIN$jXvLFhsgCOXgQA1M6jspCQ==03:16:47 SQL> select * from dba_recyclebin where original_name='EMP_DETAIL_FACT_BKP';

SCOTT                       BIN$jXvLFhsgCOXgQA1M6jspCQ==03:16:47 SQL> select * from dba_recyclebin where original_name='EMP_DETAIL_FACT_BKP';

SCOTT                       BIN$jXvLFhsgCOXgQA1M6jspCQ==$0 EMP_DETAIL_FACT_BKP
DROP                            Table Partition
scott_emp_detail                                                                                                      2009-11-09:18:41:14
2010-08-10:09:40:39                                                    9.9476E+12                                                                                                                                            NO        NO               4465594     4465594   4465594
      1088

SCOTT                       BIN$jXvLFhsgCOXgQA1M6jspCQ==$0 EMP_DETAIL_FACT_BKP
DROP                            Table Partition
scott_emp_detail                                                                                                      2009-11-09:18:41:14
2010-08-10:09:40:39                                                    9.9476E+12
                                                                                                                                               NO        NO               4465594     4465594   4465594
     12896 EMP_DETAIL_FACT_BKP
DROP                            Table Partition
scott_emp_detail                                                                                                      2009-11-09:18:41:14
2010-08-10:09:40:39                                                    9.9476E+12                                                                                                                                            NO        NO               4465594     4465594   4465594
      1088

SCOTT                       BIN$jXvLFhsgCOXgQA1M6jspCQ==03:16:47 SQL> select * from dba_recyclebin where original_name='EMP_DETAIL_FACT_BKP';

SCOTT                       BIN$jXvLFhsgCOXgQA1M6jspCQ==$0 EMP_DETAIL_FACT_BKP
DROP                            Table Partition
scott_emp_detail                                                                                                      2009-11-09:18:41:14
2010-08-10:09:40:39                                                    9.9476E+12                                                                                                                                            NO        NO               4465594     4465594   4465594
      1088

SCOTT                       BIN$jXvLFhsgCOXgQA1M6jspCQ==$0 EMP_DETAIL_FACT_BKP
DROP                            Table Partition
scott_emp_detail                                                                                                      2009-11-09:18:41:14
2010-08-10:09:40:39                                                    9.9476E+12
                                                                                                                                               NO        NO               4465594     4465594   4465594
     12896 EMP_DETAIL_FACT_BKP
DROP                            Table Partition
scott_emp_detail                                                                                                      2009-11-09:18:41:14
2010-08-10:09:40:39                                                    9.9476E+12
                                                                                                                                               NO        NO               4465594     4465594   4465594
     12896 EMP_DETAIL_FACT_BKP
DROP                            Table Partition
scott_emp_detail                                                                                                      2009-11-09:18:41:14
2010-08-10:09:40:39                                                    9.9476E+12                                                                                                                                            NO        NO               4465594     4465594   4465594
      1088

SCOTT                       BIN$jXvLFhsgCOXgQA1M6jspCQ==03:16:47 SQL> select * from dba_recyclebin where original_name='EMP_DETAIL_FACT_BKP';

SCOTT                       BIN$jXvLFhsgCOXgQA1M6jspCQ==03:16:47 SQL> select * from dba_recyclebin where original_name='EMP_DETAIL_FACT_BKP';

SCOTT                       BIN$jXvLFhsgCOXgQA1M6jspCQ==$0 EMP_DETAIL_FACT_BKP
DROP                            Table Partition
scott_emp_detail                                                                                                      2009-11-09:18:41:14
2010-08-10:09:40:39                                                    9.9476E+12                                                                                                                                            NO        NO               4465594     4465594   4465594
      1088

SCOTT                       BIN$jXvLFhsgCOXgQA1M6jspCQ==$0 EMP_DETAIL_FACT_BKP
DROP                            Table Partition
scott_emp_detail                                                                                                      2009-11-09:18:41:14
2010-08-10:09:40:39                                                    9.9476E+12
                                                                                                                                               NO        NO               4465594     4465594   4465594
     12896 EMP_DETAIL_FACT_BKP
DROP                            Table Partition
scott_emp_detail                                                                                                      2009-11-09:18:41:14
2010-08-10:09:40:39                                                    9.9476E+12                                                                                                                                            NO        NO               4465594     4465594   4465594
      1088

SCOTT                       BIN$jXvLFhsgCOXgQA1M6jspCQ==03:16:47 SQL> select * from dba_recyclebin where original_name='EMP_DETAIL_FACT_BKP';

SCOTT                       BIN$jXvLFhsgCOXgQA1M6jspCQ==$0 EMP_DETAIL_FACT_BKP
DROP                            Table Partition
scott_emp_detail                                                                                                      2009-11-09:18:41:14
2010-08-10:09:40:39                                                    9.9476E+12                                                                                                                                            NO        NO               4465594     4465594   4465594
      1088

SCOTT                       BIN$jXvLFhsgCOXgQA1M6jspCQ==$0 EMP_DETAIL_FACT_BKP
DROP                            Table Partition
scott_emp_detail                                                                                                      2009-11-09:18:41:14
2010-08-10:09:40:39                                                    9.9476E+12
                                                                                                                                               NO        NO               4465594     4465594   4465594
     12896 EMP_DETAIL_FACT_BKP
DROP                            Table Partition
scott_emp_detail                                                                                                      2009-11-09:18:41:14
2010-08-10:09:40:39                                                    9.9476E+12
                                                                                                                                               NO        NO               4465594     4465594   4465594
     12896 EMP_DETAIL_FACT_BKP
DROP                            Table Partition
scott_emp_detail                                                                                                      2009-11-09:18:41:14
2010-08-10:09:40:39                                                    9.9476E+12
                                                                                                                                               NO        NO               4465594     4465594   4465594
     12896 EMP_DETAIL_FACT_BKP
DROP                            Table Partition
scott_emp_detail                                                                                                      2009-11-09:18:41:14
2010-08-10:09:40:39                                                    9.9476E+12                                                                                                                                            NO        NO               4465594     4465594   4465594
      1088

SCOTT                       BIN$jXvLFhsgCOXgQA1M6jspCQ==03:16:47 SQL> select * from dba_recyclebin where original_name=\'EMP_DETAIL_FACT_BKP\';

SCOTT                       BIN$jXvLFhsgCOXgQA1M6jspCQ==03:16:47 SQL> select * from dba_recyclebin where original_name='EMP_DETAIL_FACT_BKP';

SCOTT                       BIN$jXvLFhsgCOXgQA1M6jspCQ==03:16:47 SQL> select * from dba_recyclebin where original_name='EMP_DETAIL_FACT_BKP';

SCOTT                       BIN$jXvLFhsgCOXgQA1M6jspCQ==$0 EMP_DETAIL_FACT_BKP
DROP                            Table Partition
scott_emp_detail                                                                                                      2009-11-09:18:41:14
2010-08-10:09:40:39                                                    9.9476E+12                                                                                                                                            NO        NO               4465594     4465594   4465594
      1088

SCOTT                       BIN$jXvLFhsgCOXgQA1M6jspCQ==$0 EMP_DETAIL_FACT_BKP
DROP                            Table Partition
scott_emp_detail                                                                                                      2009-11-09:18:41:14
2010-08-10:09:40:39                                                    9.9476E+12
                                                                                                                                               NO        NO               4465594     4465594   4465594
     12896 EMP_DETAIL_FACT_BKP
DROP                            Table Partition
scott_emp_detail                                                                                                      2009-11-09:18:41:14
2010-08-10:09:40:39                                                    9.9476E+12                                                                                                                                            NO        NO               4465594     4465594   4465594
      1088

SCOTT                       BIN$jXvLFhsgCOXgQA1M6jspCQ==03:16:47 SQL> select * from dba_recyclebin where original_name='EMP_DETAIL_FACT_BKP';

SCOTT                       BIN$jXvLFhsgCOXgQA1M6jspCQ==$0 EMP_DETAIL_FACT_BKP
DROP                            Table Partition
scott_emp_detail                                                                                                      2009-11-09:18:41:14
2010-08-10:09:40:39                                                    9.9476E+12                                                                                                                                            NO        NO               4465594     4465594   4465594
      1088

SCOTT                       BIN$jXvLFhsgCOXgQA1M6jspCQ==$0 EMP_DETAIL_FACT_BKP
DROP                            Table Partition
scott_emp_detail                                                                                                      2009-11-09:18:41:14
2010-08-10:09:40:39                                                    9.9476E+12
                                                                                                                                               NO        NO               4465594     4465594   4465594
     12896 EMP_DETAIL_FACT_BKP
DROP                            Table Partition
scott_emp_detail                                                                                                      2009-11-09:18:41:14
2010-08-10:09:40:39                                                    9.9476E+12
                                                                                                                                               NO        NO               4465594     4465594   4465594
     12896 EMP_DETAIL_FACT_BKP
DROP                            Table Partition
scott_emp_detail                                                                                                      2009-11-09:18:41:14
2010-08-10:09:40:39                                                    9.9476E+12                                                                                                                                            NO        NO               4465594     4465594   4465594
      1088

SCOTT                       BIN$jXvLFhsgCOXgQA1M6jspCQ==03:16:47 SQL> select * from dba_recyclebin where original_name='EMP_DETAIL_FACT_BKP';

SCOTT                       BIN$jXvLFhsgCOXgQA1M6jspCQ==03:16:47 SQL> select * from dba_recyclebin where original_name='EMP_DETAIL_FACT_BKP';

SCOTT                       BIN$jXvLFhsgCOXgQA1M6jspCQ==$0 EMP_DETAIL_FACT_BKP
DROP                            Table Partition
scott_emp_detail                                                                                                      2009-11-09:18:41:14
2010-08-10:09:40:39                                                    9.9476E+12                                                                                                                                            NO        NO               4465594     4465594   4465594
      1088

SCOTT                       BIN$jXvLFhsgCOXgQA1M6jspCQ==$0 EMP_DETAIL_FACT_BKP
DROP                            Table Partition
scott_emp_detail                                                                                                      2009-11-09:18:41:14
2010-08-10:09:40:39                                                    9.9476E+12
                                                                                                                                               NO        NO               4465594     4465594   4465594
     12896 EMP_DETAIL_FACT_BKP
DROP                            Table Partition
scott_emp_detail                                                                                                      2009-11-09:18:41:14
2010-08-10:09:40:39                                                    9.9476E+12                                                                                                                                            NO        NO               4465594     4465594   4465594
      1088

SCOTT                       BIN$jXvLFhsgCOXgQA1M6jspCQ==03:16:47 SQL> select * from dba_recyclebin where original_name='EMP_DETAIL_FACT_BKP';

SCOTT                       BIN$jXvLFhsgCOXgQA1M6jspCQ==$0 EMP_DETAIL_FACT_BKP
DROP                            Table Partition
scott_emp_detail                                                                                                      2009-11-09:18:41:14
2010-08-10:09:40:39                                                    9.9476E+12                                                                                                                                            NO        NO               4465594     4465594   4465594
      1088

SCOTT                       BIN$jXvLFhsgCOXgQA1M6jspCQ==$0 EMP_DETAIL_FACT_BKP
DROP                            Table Partition
scott_emp_detail                                                                                                      2009-11-09:18:41:14
2010-08-10:09:40:39                                                    9.9476E+12
                                                                                                                                               NO        NO               4465594     4465594   4465594
     12896 EMP_DETAIL_FACT_BKP
DROP                            Table Partition
scott_emp_detail                                                                                                      2009-11-09:18:41:14
2010-08-10:09:40:39                                                    9.9476E+12
                                                                                                                                               NO        NO               4465594     4465594   4465594
     12896 EMP_DETAIL_FACT_BKP
DROP                            Table Partition
scott_emp_detail                                                                                                      2009-11-09:18:41:14
2010-08-10:09:40:39                                                    9.9476E+12
                                                                                                                                               NO        NO               4465594     4465594   4465594
     12896 EMP_DETAIL_FACT_BKP
DROP                            Table Partition
scott_emp_detail                                                                                                      2009-11-09:18:41:14
2010-08-10:09:40:39                                                    9.9476E+12
                                                                                                                                               NO        NO               4465594     4465594   4465594
     12896

So we decided to purge the recyclebin and resized datafile to ensure that we get rid of corrupt block. As a result maximum block for datafile was 117765, whereas we had error for block 188654.

03:19:07 SQL> conn scott

Enter password: 

Connected.

03:19:20 SQL> purge recyclebin;

03:23:39 SQL> alter database datafile '+DG_IDX/james/datafile/scott_emp_detail.381.705692005' resize 3800m;

Database altered.

  1* select max(block_id) from dba_extents where file_id=699

03:24:25 SQL> /

MAX(BLOCK_ID)

-------------

       117765

We also ran rman backup validate and dbverify to ensure that there was no corruption

RMAN> backup validate datafile 699;

Starting backup at 11-AUG-10
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=2171 instance=james4 devtype=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: sid=2157 instance=james4 devtype=DISK
channel ORA_DISK_1: starting full datafile backupset
channel ORA_DISK_1: specifying datafile(s) in backupset
input datafile fno=00699 name=+DG_IDX/james/datafile/scott_emp_detail.381.705692005
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:35
Finished backup at 11-AUG-10

Since this is ASM, dbv needs you to specify system password

dbv file=+DG_IDX/james/datafile/scott_emp_detail.381.705692005 userid=system/oracle123 blocksize=32768

DBVERIFY - Verification starting : FILE = +DG_IDX/james/datafile/scott_emp_detail.381.705692005
DBVERIFY - Verification complete

Total Pages Examined         : 118400
Total Pages Processed (Data) : 110979
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 0
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 7421
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 0
Total Pages Marked Corrupt   : 0
Total Pages Influx           : 0
Highest block SCN            : 0 (0.0)

All this exercise made us re-look at rman backup. We realized that we need to reduce the size of RMAN backup piece as having smaller piece will help us in faster recovery of data blocks or datafile loss.
We had two options

1) Specify value of maxpiecesize so as to limit size of backup piece.
2) Specify filesperset parameter to limit the number of files in backup piece.

Both things meant we had more backup pieces but definitely it would result in improved recovery times. We modified the backup script to include filesperset parameter

BACKUP AS COMPRESSED BACKUPSET INCREMENTAL LEVEL 1 DATABASE FILESPERSET=1

In our case we could reload the data so we dropped the old table. But in case you can’t do it and you have rman backups failing because of corrupt block issue, you can use following syntax in rman run clause to skip ‘n’ corrupt blocks

set maxcorrupt for datafile 699 to 1;

Recovering from ORA-1578 ORA-8103 (Logical Corruption)

1. For ORA-1578:

There are two ways in which we can extract the data from a corrupted table:

a) Using DBMS_REPAIR.SKIP_CORRUPT_BLOCKS
b) Using Event 10231

a) Connect as sysdba user:

execute DBMS_REPAIR.SKIP_CORRUPT_BLOCKS(‘<schema_name>’,'<table_name>’);

Put the schema name and the table name of the corrupted table.

Then issue the following command to craete a new table from corrupted table:

create table new_table as select * from corrupted_table_name;

Now Drop the corrupted table, rename the newly created table to its original name and
clear the attribute for the new table as:

execute DBMS_REPAIR.SKIP_CORRUPT_BLOCKS

(‘<schema_name>’,'<table_name>’,flags=>dbms_repair.noskip_flag);

b) Set the event 10231 at session level to skip corrupted rows:

ALTER SESSION SET EVENTS ‘10231 TRACE NAME CONTEXT FOREVER, LEVEL 10’;

Then issue the following command to craete a new table from corrupted table:

create table new_table as select * from corrupted_table_name;

Now Drop the corrupted table, rename the newly created table to its original name.

2. For ORA-8103

This is a case of LOGICAL CORRUPTION and cannot be taken care by skipping the corrupted rows.

Follow Metalink Note 422547.1 to salvage data in case of ORA-8103

Physical Corruption: ORA-1578 PART-2

Recovering from physical corruption:

1. Using BMR when RMAN backups are available
When small set of blocks
are corrupted, Block Media Recovery (BMR) can be used as a recovery technique. The advantage of BMR is that it can be performed without taking the datafiles offline but the block undergoing BMR is not accessible to users. RMAN BLOCKRECOVER command is used for BMR.


1)How to identify and recover the corrupted blocks needing media recovery:
To find the corrupted blocks we can use v$database_block_corruption OR v$backup_corruption view
We can use RMAN to populate the he v$backup_corruption view, which will tell us the corruptedblocks are the datafile number on which they resides:

RMAN > backup validate database;

This command will check the datafiles for physical and logical corruption. If the backup validation discovers corrupt blocks,then RMAN updates the V$DATABASE_BLOCK_CORRUPTION and v$backup_corruption views.
Now Run :

SQL> select * from V$backup_corruption;

SQL> select * from V$database_block_corruption;

to find out the corrupted blocks.
It will give you the output like :

FILE# BLOCK BLOCKS CORRUPTION_CHANGE# MAR

15 215 1 0 YES

Now we have the list of corrupted blocks file#=15 and block#=215
We will use the following command to recover corrupted blocks:

RMAN > BLOCKRECOVER DATAFILE 15 BLOCK 215;
RMAN > BLOCKRECOVER CORRUPTION LIST;


The corrupted block will be recovered from last RMAN backup and RMAN will take care of applying archives to make the block current.


2. Using BMR when RMAN backups are not taken ?
If we have a OS level backup of corrupted datafile, it can be cataloged with RMAN to perform BMR.
Find the location of Backup of datafiles which have corruption in it.
Let us suppose the backup of corrupted datafile is at /u01/app/oracle/backup/data01.dbf

Now use the following command to catalog the backup as:
RMAN> catalog datafilecopy ‘/u01/app/oracle/backup/data01.dbf’;


Find the archive log:


RMAN will use the archive logs which are present in V$ARCHIVED_LOG, If the archivelogs are moved to some other location or they are aged out of controlfilethen they need to be cataloged as well:


RMAN> catalog archivelog ‘/u02/app/oracle/backup/archoracle_1_40.dbf’;

Once the datafiles and the required archive logs are cataloged, we can use the BMR as:


RMAN > BLOCKRECOVER datafile 15 block 215;

Restrictions on BLOCKRECOVER

1. The database needs to be in mounted or open stage.

2. Only complete media recovery of block is possible.

3. Only full backups of datafiles containing corrupted blocks can be used to perform block media recovery.

4. If the archivelogs are missing or not accessible, BMR will fail.

5. Header blocks cannot be recovered.

6. Database needs to be ARCHIVELOG mode for performing a BMR.


I will be discussing more on Recovery and data salvaging techniques under Corruption Part-3.

Physical Corruption: ORA-1578 -Part 1

ORA-1578 is indication of physical block corruption. Oracle computes an internal checksum before fetching the block from disk, when it is about to fetch the block it reads the block header and compare the checksum stored in it. If the checksum mismatches we get ora-1578 error.
It indicates that a block is corrupted at
hardware level i.e at OS.

This error will always come whenever the corrupted block is read by Oracle.

Reasons for ora-1578:

1. Bad sectors in the Storage Disk Drive.
2. Formatting of blocks At OS level.( Zeroed out disk blocks).
3. some corruption in underlying hardware which is making block unreadable.

How to collect information for ora-1578:

Error Text:

ORA-01578: ORACLE data block corrupted (file # 14, block # 38976)

The block # 38976 shows the corrupted block number and the file # 14 indicates
the reletive file number 14.

1. Get the absolute file number:

To get the absolute file # we will use the following query:

SQL > select file# from v$datafile where rfile#=14;

FILE#
——-
14

here the absolute file# is also 14.

2. Get the segment name and segment type in which this corrupted block resides:

Use the following query:

SQL > SELECT tablespace_name, segment_type, owner, segment_name FROM
dba_extents WHERE file_id = 14 and 38976 between block_id AND block_id + blocks – 1;

— It will give us the segment type (Table, Index, LOB etc.), name of the segment (Table name
Index name etc.), Owner of the segment and the tablespace name of the segment in which corrupted block resides.

SQL > select FILE#,TS#,STATUS,BLOCKS,NAME from v$datafile where FILE#=14;

— From this we will get the tablespace number and the fine name of the datafile in which the corrupted block resides.

SQL> select TS#,NAME from v$tablespace where TS#=;

— It will give us the Name of the tablespace.

Now we have all the information required to perform an action to overcome ora-1578.

I will be providing more on this about solving ora-1578 soon.