坏块 oracle

Posted by wukaiqiang; tagged with none

Oracle - dg坏块修复(二)
一、概述
本文是坏块修复(一)的续篇,这篇文章将介绍如何在dg环境中模拟坏块,以及出现坏块该如何修复。实验分为以下几个步骤。

  1. 主库表出现坏块
  2. dg库表出现坏块

二、环境准备
本实验是在oracle 11G,主库 + ADG环境下进行

  1. 准备相关表
    create tablespace tbs01 datafile '/u01/app/oracle/oradata/orcltest/tbs01.dbf' size 100m;
    create table scott.t01 tablespace tbs01 as select * from dba_objects where rownum<=100;

select object_id, rowid, dbms_rowid.rowid_relative_fno(rowid) file_id, dbms_rowid.rowid_block_number(rowid) block_id from scott.t01;

复制代码
OBJECT_ID ROWID FILE_ID BLOCK_ID


    20 AAAVpfAAGAAAACDAAA          6        131
    46 AAAVpfAAGAAAACDAAB          6        131
    28 AAAVpfAAGAAAACDAAC          6        131
    15 AAAVpfAAGAAAACDAAD          6        131
    ...
    99 AAAVphAAGAAAACEAAJ          6        132
   100 AAAVphAAGAAAACEAAK          6        132
   101 AAAVphAAGAAAACEAAL          6        132

复制代码

  1. 全库备份
    RMAN> backup database; // 全库备份

RMAN> list backup; // 查看备份

复制代码

List of Backup Sets

BS Key Type LV Size Device Type Elapsed Time Completion Time


19 Full 1.08G DISK 00:01:59 12-MAR-20

    BP Key: 19   Status: AVAILABLE  Compressed: NO  Tag: TAG20200312T150629
    Piece Name: /home/oracle/backupdir/ORCLTEST_2750922031_40_1_20200312_1034867190.bkp

List of Datafiles in backup set 19
File LV Type Ckp SCN Ckp Time Name
---- -- ---- ---------- --------- ----
1 Full 1148218 12-MAR-20 /u01/app/oracle/oradata/orcltest/system01.dbf
2 Full 1148218 12-MAR-20 /u01/app/oracle/oradata/orcltest/sysaux01.dbf
3 Full 1148218 12-MAR-20 /u01/app/oracle/oradata/orcltest/undotbs01.dbf
4 Full 1148218 12-MAR-20 /u01/app/oracle/oradata/orcltest/users01.dbf
5 Full 1148218 12-MAR-20 /u01/app/oracle/oradata/orcltest/example01.dbf
6 Full 1148218 12-MAR-20 /u01/app/oracle/oradata/orcltest/tbs01.dbf
复制代码

三、主库表出现坏块

  1. 模拟坏块
    RMAN> blockrecover datafile 6 block 131 clear; // 将131数据块清空,即相当于产生了坏块

SQL> select * from scott.t01; // 对表进行查询,正常查询
看过我上一篇文章的就会知道,照道理这里应该会报错,但是实际并没有

查看alert日志

  1. 检测坏块
    RMAN> backup check logical validate datafile 6;

复制代码

List of Datafiles

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN


6 OK 0 11187 12800 1255050
File Name: /u01/app/oracle/oradata/orcltest/tbs01.dbf
Block Type Blocks Failing Blocks Processed
---------- -------------- ----------------
Data 0 1248
Index 0 195
Other 0 170
复制代码
并没有发现任何坏块,结合前面alert日志,可以看到主库的坏块已经被自动修复了,这个其实是ADG的功能(自动修复主库的坏块)。

四、dg库表出现坏块

  1. 模拟坏块
    RMAN> blockrecover datafile 6 block 131 clear; // 将131数据块清空,即相当于产生了坏块

RMAN> backup check logical validate datafile 6; // 使用rman检测坏块

复制代码

List of Datafiles

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN


6 FAILED 0 11187 12800 1574361
File Name: /u01/app/oracle/oradata/orcltestdg/tbs01.dbf
Block Type Blocks Failing Blocks Processed
---------- -------------- ----------------
Data 1 1249
Index 0 194
Other 0 170

validate found one or more corrupt blocks
复制代码

SQL> select * from scott.t01; // 对表进行查询,直至报错,时间可能稍微有点久

select * from scott.t01

                *

ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 6, block # 131)
ORA-01110: data file 6: '/u01/app/oracle/oradata/orcltestdg/tbs01.dbf'

  1. 修复坏块,使用备份文件修复
    主库
    scp ORCLTEST_2750922031_40_1_20200312_1034867190.bkp 10.40.16.121:~ // 主库拷贝备份到dg库中

dg库
RMAN> catalog start with '/home/oracle/'; // 注册备份到dg库中
SQL> alter database recover managed standby database cancel; // 先关闭dg的同步
SQL> shutdown immediate // 停库
rm -rf /u01/app/oracle/oradata/orcltestdg/tbs01.dbf // 删除有坏块的数据文件
SQL> startup mount // 启库到mount状态
RMAN> restore datafile 6; // 还原数据文件
SQL> alter database recover managed standby database disconnect from session; // 打开mrp进程恢复数据库
SQL> alter database recover managed standby database cancel; // 恢复数据库一段时间后关闭mrp进程
SQL> alter database open; // 启库到open状态
SQL> alter database recover managed standby database using current logfile disconnect from session; // 打开mrp进程

SQL> select * from scott.t01; // 查询表正常

RMAN> backup check logical validate datafile 6; // 使用rman检测坏块,已经没了坏块

复制代码

List of Datafiles

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN


6 OK 0 1 12801 1574361
File Name: /u01/app/oracle/oradata/orcltestdg/tbs01.dbf
Block Type Blocks Failing Blocks Processed
---------- -------------- ----------------
Data 0 1249
Index 0 194
Other 0 11356
复制代码

  1. 补充说明
    最开始我想利用blockrecover的方式去修复坏块,结果不行,报错信息如下:
    RMAN> blockrecover corruption list;

复制代码
Starting recover at 16-MAR-20
using channel ORA_DISK_1

channel ORA_DISK_1: restoring block(s)
channel ORA_DISK_1: specifying block(s) to restore from backup set
restoring blocks of datafile 00006
channel ORA_DISK_1: reading from backup piece /home/oracle/ORCLTEST_2750922031_40_1_20200312_1034867190.bkp
channel ORA_DISK_1: piece handle=/home/oracle/ORCLTEST_2750922031_40_1_20200312_1034867190.bkp tag=TAG20200312T150629
channel ORA_DISK_1: restored block(s) from backup piece 1
channel ORA_DISK_1: block restore complete, elapsed time: 00:00:01

starting media recovery
media recovery failed
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 03/16/2020 14:23:00
ORA-00283: recovery session canceled due to errors
ORA-01122: database file 6 failed verification check
ORA-01110: data file 6: '/u01/app/oracle/oradata/orcltestdg/tbs01.dbf'
ORA-01207: file is more recent than control file - old control file
复制代码

五、总结

  1. 主库的坏块,可以通过adg自动修复,不需要人工干预。
  2. dg的坏块需要通过修复数据文件的方式去处理,不能使用blockrecover,如果大家有更好的方法,欢迎留言。
  3. 逻辑坏块和物理坏块:
    a. 逻辑坏块一般是oracle系统bug造成,指的是块内的数据逻辑存在问题,比如说索引块的索引值没有按从小到大排列(非官方解释)。逻辑坏块一般会伴随着ora-600和ora-1578。检测逻辑坏块使用RMAN> backup check logical validate...
    b. 物理坏块一般是由于底层os/disk系统错误/损坏,导致数据块被修改。常见的物理坏块有块头和块尾信息不一致,checksum值无效,数据块信息全部为0等情况,并且可能伴随错误ORA-1578和ORA-1110
  4. 数据库坏块的相关参数 db_block_checksum:默认是typical,不需要更改。该参数控制写入数据文件的时候是否将checksum值写入块头,当第二次读取该块时,重新计算checksum值,并与块头的checksum值进行比较,如果两次checksum值不一样,则标记该块为坏块。用于快速发现物理坏块。
    db_block_checking:默认是fasle,可以将其改为true。该参数控制块发生任何变化的时候进行逻辑上的完整性和正确性检查。用于阻止内存逻辑坏块和数据逻辑坏块。但会增加1%-10%的性能消耗。
  5. 坏块优秀文章分享
    物理坏块 https://blogs.oracle.com/database4cn/oraclecorruption-,该文章介绍了如果没有备份如何通过dbms_repair这个包去将坏块跳过,并把表备份出来。

rman>blockrecover datafile 3 block 2,150,152;

RMAN> blockrecover corruption list;

Starting recover at 14-MAY-23
using channel ORA_DISK_1

starting media recovery
media recovery complete, elapsed time: 00:00:00

Finished recover at 14-MAY-23

RMAN>