Recently we encountered a block corruption issue in 10g database on Linux x86 64 bit and using ASM for storing database files.
Saurabh had earlier written articles on ora-1578.
http://askdba.org/weblog/2008/04/physical-corruption-ora-1578-part-1/
http://askdba.org/weblog/2008/05/physical-corruption-ora-1578-part-2/
It was first time I was working on a corruption issue, so I followed above notes and found them quite useful. Good thing about ora-1578 error message is that it gives the problematic block and datafile number
ORA-01578: ORACLE data block corrupted (file # 699, block # 188654) ORA-01110: data file 699: '+DG_IDX/james/datafile/scott_emp_detail.381.705692005'
Querying dba_extents revealed that it was a Fact table and one partition was giving error.Since we had rman backup for this database we directly went for recovery using blockrecover command
Recovery Manager: Release 10.2.0.4.0 - Production on Sat Aug 7 01:48:51 2010 Copyright (c) 1982, 2007, Oracle. All rights reserved. connected to target database: james (DBID=1812247572) RMAN> run {blockrecover datafile 699 block 188654;} Starting blockrecover at 07-AUG-10 using target database control file instead of recovery catalog allocated channel: ORA_DISK_1 channel ORA_DISK_1: sid=2231 instance=james4 devtype=DISK allocated channel: ORA_DISK_2 channel ORA_DISK_2: sid=2190 instance=james4 devtype=DISK allocated channel: ORA_DISK_3 channel ORA_DISK_3: sid=2142 instance=james4 devtype=DISK allocated channel: ORA_DISK_4 channel ORA_DISK_4: sid=2186 instance=james4 devtype=DISK allocated channel: ORA_DISK_5 channel ORA_DISK_5: sid=2170 instance=james4 devtype=DISK allocated channel: ORA_DISK_6 channel ORA_DISK_6: sid=2203 instance=james4 devtype=DISK allocated channel: ORA_DISK_7 channel ORA_DISK_7: sid=2240 instance=james4 devtype=DISK allocated channel: ORA_DISK_8 channel ORA_DISK_8: sid=2195 instance=james4 devtype=DISK channel ORA_DISK_2: restoring block(s) channel ORA_DISK_2: specifying block(s) to restore from backup set restoring blocks of datafile 00699 channel ORA_DISK_2: reading from backup piece /RMNGD_RMAN1/backup/james/29lk4ift_1_1 user interrupt received RMAN-00571: =========================================================== RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS =============== RMAN-00571: =========================================================== RMAN-03002: failure of blockrecover command at 08/07/2010 02:52:27 RMAN-03099: job cancelled at user request
We had to cancel the block recovery as even after 1 hour it could not locate the block from backup piece. Checking backup piece found that this backup piece was of 184G
ls -lh /RMNGD_RMAN1/backup/james/29lk4ift_1_1 -rw-r----- 1 oracle dba 184G Jul 31 18:17 /RMNGD_RMAN1/backup/james/29lk4ift_1_1
We had stopped the data load for this table but delaying further was not possible as it was impacting application. Finally we decided to use dbms_repair table to allow DML to mark the block as corrupt .
This table was FACT table which could have been loaded again (lucky for us 🙂 ). Before marking it corrupt, we took a dump of block
ALTER SYSTEM DUMP DATAFILE 699 block 188654; BEGIN DBMS_REPAIR.ADMIN_TABLES ( TABLE_NAME => 'REPAIR_TABLE', TABLE_TYPE => dbms_repair.repair_table, ACTION => dbms_repair.create_action, TABLESPACE => 'USERS'); END; / SET SERVEROUTPUT ON DECLARE num_corrupt INT; BEGIN num_corrupt := 0; DBMS_REPAIR.CHECK_OBJECT ( SCHEMA_NAME => 'SCOTT', OBJECT_NAME => 'EMP_DETAIL_FACT', PARTITION_NAME => 'SYS_P1140013', REPAIR_TABLE_NAME => 'REPAIR_TABLE', CORRUPT_COUNT => num_corrupt); DBMS_OUTPUT.PUT_LINE('number corrupt: ' || TO_CHAR (num_corrupt)); END; / SELECT OBJECT_NAME, BLOCK_ID, CORRUPT_TYPE, MARKED_CORRUPT,CORRUPT_DESCRIPTION, REPAIR_DESCRIPTION FROM REPAIR_TABLE; SET SERVEROUTPUT ON DECLARE num_fix INT; BEGIN num_fix := 0; DBMS_REPAIR.FIX_CORRUPT_BLOCKS ( SCHEMA_NAME => 'SCOTT', OBJECT_NAME => 'EMP_DETAIL_FACT', PARTITION_NAME => 'SYS_P1140013', OBJECT_TYPE => dbms_repair.table_object, REPAIR_TABLE_NAME => 'REPAIR_TABLE', FIX_COUNT=> num_fix); DBMS_OUTPUT.PUT_LINE('num fix: ' || TO_CHAR(num_fix)); END; / execute DBMS_REPAIR.SKIP_CORRUPT_BLOCKS('SCOTT','EMP_DETAIL_FACT')
Refer 10g doc to get details on DBMS_REPAIR.
This allowed the application to load data again. Later we worked with application team and created a new segment and renamed the old one to *_bkp and then dropped it.
Next day we got a alert informing errors with our rman backup
ORA-19566: exceeded limit of 0 corrupt blocks for file +DG_IDX/james/datafile/scott_emp_detail.381.705692005 continuing other job steps, job failed will not be re-run channel ORA_DISK_8: finished piece 1 at AUG-10-2010 01:56:45 piece handle=/RMNGD_RMAN3/backup/james/i2lkuv0q_1_1 tag=james_INC_20100810 comment=NONE channel ORA_DISK_8: backup set complete, elapsed time: 00:26:43 channel ORA_DISK_1: finished piece 1 at AUG-10-2010 02:13:01 piece handle=/RMNGD_RMAN1/backup/james/hrlkuv0i_1_1 tag=james_INC_20100810 comment=NONE channel ORA_DISK_1: backup set complete, elapsed time: 00:43:07 channel ORA_DISK_6: finished piece 1 at AUG-10-2010 02:21:26 piece handle=/RMNGD_RMAN2/backup/james/i0lkuv0n_1_1 tag=james_INC_20100810 comment=NONE channel ORA_DISK_6: backup set complete, elapsed time: 00:51:27 RMAN-00571: =========================================================== RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS =============== RMAN-00571: ===========================================================
Backup job failed again as the table was still lying in recyclebin.
03:16:47 SQL> select * from dba_recyclebin where original_name='EMP_DETAIL_FACT_BKP'; SCOTT BIN$jXvLFhsgCOXgQA1M6jspCQ==03:16:47 SQL> select * from dba_recyclebin where original_name=\'EMP_DETAIL_FACT_BKP\'; SCOTT BIN$jXvLFhsgCOXgQA1M6jspCQ==03:16:47 SQL> select * from dba_recyclebin where original_name='EMP_DETAIL_FACT_BKP'; SCOTT BIN$jXvLFhsgCOXgQA1M6jspCQ==03:16:47 SQL> select * from dba_recyclebin where original_name='EMP_DETAIL_FACT_BKP'; SCOTT BIN$jXvLFhsgCOXgQA1M6jspCQ==$0 EMP_DETAIL_FACT_BKP DROP Table Partition scott_emp_detail 2009-11-09:18:41:14 2010-08-10:09:40:39 9.9476E+12 NO NO 4465594 4465594 4465594 1088 SCOTT BIN$jXvLFhsgCOXgQA1M6jspCQ==$0 EMP_DETAIL_FACT_BKP DROP Table Partition scott_emp_detail 2009-11-09:18:41:14 2010-08-10:09:40:39 9.9476E+12 NO NO 4465594 4465594 4465594 12896 EMP_DETAIL_FACT_BKP DROP Table Partition scott_emp_detail 2009-11-09:18:41:14 2010-08-10:09:40:39 9.9476E+12 NO NO 4465594 4465594 4465594 1088 SCOTT BIN$jXvLFhsgCOXgQA1M6jspCQ==03:16:47 SQL> select * from dba_recyclebin where original_name='EMP_DETAIL_FACT_BKP'; SCOTT BIN$jXvLFhsgCOXgQA1M6jspCQ==$0 EMP_DETAIL_FACT_BKP DROP Table Partition scott_emp_detail 2009-11-09:18:41:14 2010-08-10:09:40:39 9.9476E+12 NO NO 4465594 4465594 4465594 1088 SCOTT BIN$jXvLFhsgCOXgQA1M6jspCQ==$0 EMP_DETAIL_FACT_BKP DROP Table Partition scott_emp_detail 2009-11-09:18:41:14 2010-08-10:09:40:39 9.9476E+12 NO NO 4465594 4465594 4465594 12896 EMP_DETAIL_FACT_BKP DROP Table Partition scott_emp_detail 2009-11-09:18:41:14 2010-08-10:09:40:39 9.9476E+12 NO NO 4465594 4465594 4465594 12896 EMP_DETAIL_FACT_BKP DROP Table Partition scott_emp_detail 2009-11-09:18:41:14 2010-08-10:09:40:39 9.9476E+12 NO NO 4465594 4465594 4465594 1088 SCOTT BIN$jXvLFhsgCOXgQA1M6jspCQ==03:16:47 SQL> select * from dba_recyclebin where original_name='EMP_DETAIL_FACT_BKP'; SCOTT BIN$jXvLFhsgCOXgQA1M6jspCQ==03:16:47 SQL> select * from dba_recyclebin where original_name='EMP_DETAIL_FACT_BKP'; SCOTT BIN$jXvLFhsgCOXgQA1M6jspCQ==$0 EMP_DETAIL_FACT_BKP DROP Table Partition scott_emp_detail 2009-11-09:18:41:14 2010-08-10:09:40:39 9.9476E+12 NO NO 4465594 4465594 4465594 1088 SCOTT BIN$jXvLFhsgCOXgQA1M6jspCQ==$0 EMP_DETAIL_FACT_BKP DROP Table Partition scott_emp_detail 2009-11-09:18:41:14 2010-08-10:09:40:39 9.9476E+12 NO NO 4465594 4465594 4465594 12896 EMP_DETAIL_FACT_BKP DROP Table Partition scott_emp_detail 2009-11-09:18:41:14 2010-08-10:09:40:39 9.9476E+12 NO NO 4465594 4465594 4465594 1088 SCOTT BIN$jXvLFhsgCOXgQA1M6jspCQ==03:16:47 SQL> select * from dba_recyclebin where original_name='EMP_DETAIL_FACT_BKP'; SCOTT BIN$jXvLFhsgCOXgQA1M6jspCQ==$0 EMP_DETAIL_FACT_BKP DROP Table Partition scott_emp_detail 2009-11-09:18:41:14 2010-08-10:09:40:39 9.9476E+12 NO NO 4465594 4465594 4465594 1088 SCOTT BIN$jXvLFhsgCOXgQA1M6jspCQ==$0 EMP_DETAIL_FACT_BKP DROP Table Partition scott_emp_detail 2009-11-09:18:41:14 2010-08-10:09:40:39 9.9476E+12 NO NO 4465594 4465594 4465594 12896 EMP_DETAIL_FACT_BKP DROP Table Partition scott_emp_detail 2009-11-09:18:41:14 2010-08-10:09:40:39 9.9476E+12 NO NO 4465594 4465594 4465594 12896 EMP_DETAIL_FACT_BKP DROP Table Partition scott_emp_detail 2009-11-09:18:41:14 2010-08-10:09:40:39 9.9476E+12 NO NO 4465594 4465594 4465594 12896 EMP_DETAIL_FACT_BKP DROP Table Partition scott_emp_detail 2009-11-09:18:41:14 2010-08-10:09:40:39 9.9476E+12 NO NO 4465594 4465594 4465594 1088 SCOTT BIN$jXvLFhsgCOXgQA1M6jspCQ==03:16:47 SQL> select * from dba_recyclebin where original_name=\'EMP_DETAIL_FACT_BKP\'; SCOTT BIN$jXvLFhsgCOXgQA1M6jspCQ==03:16:47 SQL> select * from dba_recyclebin where original_name='EMP_DETAIL_FACT_BKP'; SCOTT BIN$jXvLFhsgCOXgQA1M6jspCQ==03:16:47 SQL> select * from dba_recyclebin where original_name='EMP_DETAIL_FACT_BKP'; SCOTT BIN$jXvLFhsgCOXgQA1M6jspCQ==$0 EMP_DETAIL_FACT_BKP DROP Table Partition scott_emp_detail 2009-11-09:18:41:14 2010-08-10:09:40:39 9.9476E+12 NO NO 4465594 4465594 4465594 1088 SCOTT BIN$jXvLFhsgCOXgQA1M6jspCQ==$0 EMP_DETAIL_FACT_BKP DROP Table Partition scott_emp_detail 2009-11-09:18:41:14 2010-08-10:09:40:39 9.9476E+12 NO NO 4465594 4465594 4465594 12896 EMP_DETAIL_FACT_BKP DROP Table Partition scott_emp_detail 2009-11-09:18:41:14 2010-08-10:09:40:39 9.9476E+12 NO NO 4465594 4465594 4465594 1088 SCOTT BIN$jXvLFhsgCOXgQA1M6jspCQ==03:16:47 SQL> select * from dba_recyclebin where original_name='EMP_DETAIL_FACT_BKP'; SCOTT BIN$jXvLFhsgCOXgQA1M6jspCQ==$0 EMP_DETAIL_FACT_BKP DROP Table Partition scott_emp_detail 2009-11-09:18:41:14 2010-08-10:09:40:39 9.9476E+12 NO NO 4465594 4465594 4465594 1088 SCOTT BIN$jXvLFhsgCOXgQA1M6jspCQ==$0 EMP_DETAIL_FACT_BKP DROP Table Partition scott_emp_detail 2009-11-09:18:41:14 2010-08-10:09:40:39 9.9476E+12 NO NO 4465594 4465594 4465594 12896 EMP_DETAIL_FACT_BKP DROP Table Partition scott_emp_detail 2009-11-09:18:41:14 2010-08-10:09:40:39 9.9476E+12 NO NO 4465594 4465594 4465594 12896 EMP_DETAIL_FACT_BKP DROP Table Partition scott_emp_detail 2009-11-09:18:41:14 2010-08-10:09:40:39 9.9476E+12 NO NO 4465594 4465594 4465594 1088 SCOTT BIN$jXvLFhsgCOXgQA1M6jspCQ==03:16:47 SQL> select * from dba_recyclebin where original_name='EMP_DETAIL_FACT_BKP'; SCOTT BIN$jXvLFhsgCOXgQA1M6jspCQ==03:16:47 SQL> select * from dba_recyclebin where original_name='EMP_DETAIL_FACT_BKP'; SCOTT BIN$jXvLFhsgCOXgQA1M6jspCQ==$0 EMP_DETAIL_FACT_BKP DROP Table Partition scott_emp_detail 2009-11-09:18:41:14 2010-08-10:09:40:39 9.9476E+12 NO NO 4465594 4465594 4465594 1088 SCOTT BIN$jXvLFhsgCOXgQA1M6jspCQ==$0 EMP_DETAIL_FACT_BKP DROP Table Partition scott_emp_detail 2009-11-09:18:41:14 2010-08-10:09:40:39 9.9476E+12 NO NO 4465594 4465594 4465594 12896 EMP_DETAIL_FACT_BKP DROP Table Partition scott_emp_detail 2009-11-09:18:41:14 2010-08-10:09:40:39 9.9476E+12 NO NO 4465594 4465594 4465594 1088 SCOTT BIN$jXvLFhsgCOXgQA1M6jspCQ==03:16:47 SQL> select * from dba_recyclebin where original_name='EMP_DETAIL_FACT_BKP'; SCOTT BIN$jXvLFhsgCOXgQA1M6jspCQ==$0 EMP_DETAIL_FACT_BKP DROP Table Partition scott_emp_detail 2009-11-09:18:41:14 2010-08-10:09:40:39 9.9476E+12 NO NO 4465594 4465594 4465594 1088 SCOTT BIN$jXvLFhsgCOXgQA1M6jspCQ==$0 EMP_DETAIL_FACT_BKP DROP Table Partition scott_emp_detail 2009-11-09:18:41:14 2010-08-10:09:40:39 9.9476E+12 NO NO 4465594 4465594 4465594 12896 EMP_DETAIL_FACT_BKP DROP Table Partition scott_emp_detail 2009-11-09:18:41:14 2010-08-10:09:40:39 9.9476E+12 NO NO 4465594 4465594 4465594 12896 EMP_DETAIL_FACT_BKP DROP Table Partition scott_emp_detail 2009-11-09:18:41:14 2010-08-10:09:40:39 9.9476E+12 NO NO 4465594 4465594 4465594 12896 EMP_DETAIL_FACT_BKP DROP Table Partition scott_emp_detail 2009-11-09:18:41:14 2010-08-10:09:40:39 9.9476E+12 NO NO 4465594 4465594 4465594 12896
So we decided to purge the recyclebin and resized datafile to ensure that we get rid of corrupt block. As a result maximum block for datafile was 117765, whereas we had error for block 188654.
03:19:07 SQL> conn scott Enter password: Connected. 03:19:20 SQL> purge recyclebin; 03:23:39 SQL> alter database datafile '+DG_IDX/james/datafile/scott_emp_detail.381.705692005' resize 3800m; Database altered. 1* select max(block_id) from dba_extents where file_id=699 03:24:25 SQL> / MAX(BLOCK_ID) ------------- 117765
We also ran rman backup validate and dbverify to ensure that there was no corruption
RMAN> backup validate datafile 699; Starting backup at 11-AUG-10 using target database control file instead of recovery catalog allocated channel: ORA_DISK_1 channel ORA_DISK_1: sid=2171 instance=james4 devtype=DISK allocated channel: ORA_DISK_2 channel ORA_DISK_2: sid=2157 instance=james4 devtype=DISK channel ORA_DISK_1: starting full datafile backupset channel ORA_DISK_1: specifying datafile(s) in backupset input datafile fno=00699 name=+DG_IDX/james/datafile/scott_emp_detail.381.705692005 channel ORA_DISK_1: backup set complete, elapsed time: 00:00:35 Finished backup at 11-AUG-10
Since this is ASM, dbv needs you to specify system password
dbv file=+DG_IDX/james/datafile/scott_emp_detail.381.705692005 userid=system/oracle123 blocksize=32768 DBVERIFY - Verification starting : FILE = +DG_IDX/james/datafile/scott_emp_detail.381.705692005 DBVERIFY - Verification complete Total Pages Examined : 118400 Total Pages Processed (Data) : 110979 Total Pages Failing (Data) : 0 Total Pages Processed (Index): 0 Total Pages Failing (Index): 0 Total Pages Processed (Other): 7421 Total Pages Processed (Seg) : 0 Total Pages Failing (Seg) : 0 Total Pages Empty : 0 Total Pages Marked Corrupt : 0 Total Pages Influx : 0 Highest block SCN : 0 (0.0)
All this exercise made us re-look at rman backup. We realized that we need to reduce the size of RMAN backup piece as having smaller piece will help us in faster recovery of data blocks or datafile loss.
We had two options
1) Specify value of maxpiecesize so as to limit size of backup piece.
2) Specify filesperset parameter to limit the number of files in backup piece.
Both things meant we had more backup pieces but definitely it would result in improved recovery times. We modified the backup script to include filesperset parameter
BACKUP AS COMPRESSED BACKUPSET INCREMENTAL LEVEL 1 DATABASE FILESPERSET=1
In our case we could reload the data so we dropped the old table. But in case you can’t do it and you have rman backups failing because of corrupt block issue, you can use following syntax in rman run clause to skip ‘n’ corrupt blocks
set maxcorrupt for datafile 699 to 1;
Recent Comments