
Oracle - DG Bad Block Repair (Part 2)
1. Overview
This article is a follow-up to Bad Block Repair (Part 1). It shows how to simulate corrupt blocks in a Data Guard (DG) environment and how to repair them once they appear. The experiment covers the following scenarios:

  1. A corrupt block in a primary-database table
  2. A corrupt block in a standby (DG) table

2. Environment
The experiment runs on Oracle 11g, a primary database plus an Active Data Guard (ADG) standby.

  1. Prepare the test table
    create tablespace tbs01 datafile '/u01/app/oracle/oradata/orcltest/tbs01.dbf' size 100m;
    create table scott.t01 tablespace tbs01 as select * from dba_objects where rownum<=100;

select object_id, rowid, dbms_rowid.rowid_relative_fno(rowid) file_id, dbms_rowid.rowid_block_number(rowid) block_id from scott.t01;

OBJECT_ID ROWID              FILE_ID BLOCK_ID
--------- ------------------ ------- --------
       20 AAAVpfAAGAAAACDAAA       6      131
       46 AAAVpfAAGAAAACDAAB       6      131
       28 AAAVpfAAGAAAACDAAC       6      131
       15 AAAVpfAAGAAAACDAAD       6      131
      ...
       99 AAAVphAAGAAAACEAAJ       6      132
      100 AAAVphAAGAAAACEAAK       6      132
      101 AAAVphAAGAAAACEAAL       6      132
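The FILE_ID and BLOCK_ID columns above can also be read straight off the extended ROWID string itself: an 18-character ROWID encodes the data object number (6 characters), relative file number (3), block number (6) and row number (3) in Oracle's base-64 alphabet. A minimal Python sketch of that decoding, checked against the ROWIDs in the query output (the field layout follows the documented extended ROWID format):

```python
# Decode the file# and block# fields of an Oracle extended ROWID.
# Oracle's base-64 alphabet: A-Z, a-z, 0-9, +, /
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def b64(s):
    """Interpret a run of ROWID characters as a base-64 number."""
    n = 0
    for ch in s:
        n = n * 64 + ALPHABET.index(ch)
    return n

def decode_rowid(rowid):
    """Split an 18-character extended ROWID into its four fields."""
    assert len(rowid) == 18
    return {
        "data_object#": b64(rowid[0:6]),
        "file#": b64(rowid[6:9]),    # relative file number
        "block#": b64(rowid[9:15]),
        "row#": b64(rowid[15:18]),
    }

print(decode_rowid("AAAVpfAAGAAAACDAAA"))  # file# 6, block# 131, as in the query above
```

This matches the output of dbms_rowid.rowid_relative_fno / rowid_block_number for the sample rows: file 6, blocks 131 and 132.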

  2. Full database backup
    RMAN> backup database; // full database backup

RMAN> list backup; // list the backups


List of Backup Sets

BS Key  Type LV Size   Device Type Elapsed Time Completion Time
------- ---- -- ------ ----------- ------------ ---------------
19      Full    1.08G  DISK        00:01:59     12-MAR-20

    BP Key: 19   Status: AVAILABLE  Compressed: NO  Tag: TAG20200312T150629
    Piece Name: /home/oracle/backupdir/ORCLTEST_2750922031_40_1_20200312_1034867190.bkp

List of Datafiles in backup set 19
File LV Type Ckp SCN Ckp Time Name
---- -- ---- ---------- --------- ----
1 Full 1148218 12-MAR-20 /u01/app/oracle/oradata/orcltest/system01.dbf
2 Full 1148218 12-MAR-20 /u01/app/oracle/oradata/orcltest/sysaux01.dbf
3 Full 1148218 12-MAR-20 /u01/app/oracle/oradata/orcltest/undotbs01.dbf
4 Full 1148218 12-MAR-20 /u01/app/oracle/oradata/orcltest/users01.dbf
5 Full 1148218 12-MAR-20 /u01/app/oracle/oradata/orcltest/example01.dbf
6 Full 1148218 12-MAR-20 /u01/app/oracle/oradata/orcltest/tbs01.dbf

3. Corrupt Block in a Primary Table

  1. Simulate the bad block
    RMAN> blockrecover datafile 6 block 131 clear; // clear block 131, which effectively corrupts it

SQL> select * from scott.t01; // query the table: it succeeds
If you read the previous article, you would expect an error here, but none actually occurs.

Check the alert log.

  2. Detect the bad block
    RMAN> backup check logical validate datafile 6;


List of Datafiles

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- --------
6    OK     0              11187        12800           1255050
File Name: /u01/app/oracle/oradata/orcltest/tbs01.dbf
Block Type Blocks Failing Blocks Processed
---------- -------------- ----------------
Data 0 1248
Index 0 195
Other 0 170
No corrupt blocks were found. Combined with the alert log entries above, this shows the primary's corrupt block has already been repaired automatically; this is an ADG feature (automatic block media recovery for the primary).

4. Corrupt Block in a Standby (DG) Table

  1. Simulate the bad block
    RMAN> blockrecover datafile 6 block 131 clear; // clear block 131, which effectively corrupts it

RMAN> backup check logical validate datafile 6; // detect bad blocks with RMAN


List of Datafiles

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- --------
6    FAILED 0              11187        12800           1574361
File Name: /u01/app/oracle/oradata/orcltestdg/tbs01.dbf
Block Type Blocks Failing Blocks Processed
---------- -------------- ----------------
Data 1 1249
Index 0 194
Other 0 170

validate found one or more corrupt blocks

SQL> select * from scott.t01; // query the table until it fails; this can take a little while

select * from scott.t01

                *

ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 6, block # 131)
ORA-01110: data file 6: '/u01/app/oracle/oradata/orcltestdg/tbs01.dbf'

  2. Repair the bad block using the backup
    On the primary:
    scp ORCLTEST_2750922031_40_1_20200312_1034867190.bkp 10.40.16.121:~ // copy the backup piece from the primary to the standby

On the standby:
RMAN> catalog start with '/home/oracle/'; // register the backup with the standby
SQL> alter database recover managed standby database cancel; // first stop redo apply on the standby
SQL> shutdown immediate // stop the database
rm -rf /u01/app/oracle/oradata/orcltestdg/tbs01.dbf // remove the corrupt data file
SQL> startup mount // start the database to mount
RMAN> restore datafile 6; // restore the data file
SQL> alter database recover managed standby database disconnect from session; // start the MRP process to recover the database
SQL> alter database recover managed standby database cancel; // after recovering for a while, stop the MRP process
SQL> alter database open; // open the database
SQL> alter database recover managed standby database using current logfile disconnect from session; // restart the MRP process

SQL> select * from scott.t01; // the query now works

RMAN> backup check logical validate datafile 6; // re-check with RMAN: the corruption is gone


List of Datafiles

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- --------
6    OK     0              1            12801           1574361
File Name: /u01/app/oracle/oradata/orcltestdg/tbs01.dbf
Block Type Blocks Failing Blocks Processed
---------- -------------- ----------------
Data 0 1249
Index 0 194
Other 0 11356

  3. Additional notes
    My first attempt was to repair the corruption with blockrecover, which failed with the errors below:
    RMAN> blockrecover corruption list;

Starting recover at 16-MAR-20
using channel ORA_DISK_1

channel ORA_DISK_1: restoring block(s)
channel ORA_DISK_1: specifying block(s) to restore from backup set
restoring blocks of datafile 00006
channel ORA_DISK_1: reading from backup piece /home/oracle/ORCLTEST_2750922031_40_1_20200312_1034867190.bkp
channel ORA_DISK_1: piece handle=/home/oracle/ORCLTEST_2750922031_40_1_20200312_1034867190.bkp tag=TAG20200312T150629
channel ORA_DISK_1: restored block(s) from backup piece 1
channel ORA_DISK_1: block restore complete, elapsed time: 00:00:01

starting media recovery
media recovery failed
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 03/16/2020 14:23:00
ORA-00283: recovery session canceled due to errors
ORA-01122: database file 6 failed verification check
ORA-01110: data file 6: '/u01/app/oracle/oradata/orcltestdg/tbs01.dbf'
ORA-01207: file is more recent than control file - old control file

5. Summary

  1. Corrupt blocks on the primary can be repaired automatically by ADG; no manual intervention is needed.
  2. Corrupt blocks on the standby must be fixed by restoring the data file; blockrecover does not work there. If you know a better method, please leave a comment.
  3. Logical vs. physical corruption:
    a. Logical corruption is usually caused by Oracle software bugs: the data inside the block is logically inconsistent, for example index entries in an index block not stored in ascending order (unofficial explanation). It is typically accompanied by ORA-600 and ORA-1578. Detect it with RMAN> backup check logical validate ...
    b. Physical corruption is usually caused by errors or damage in the underlying OS/disk layer that modify the data block. Common symptoms are a mismatch between block header and tail, an invalid checksum, or an all-zero block, often accompanied by ORA-1578 and ORA-1110.
  4. Related corruption parameters: db_block_checksum: defaults to TYPICAL and normally does not need changing. It controls whether a checksum is written into the block header when the block is written to the data file; when the block is read again, the checksum is recomputed and compared with the stored value, and a mismatch marks the block as corrupt. Used to quickly detect physical corruption.
    db_block_checking: defaults to FALSE and can be set to TRUE. It runs logical consistency and correctness checks whenever a block changes, which helps prevent in-memory and on-disk logical corruption, at a cost of roughly 1%-10% extra overhead.
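The write/read checksum cycle that db_block_checksum performs can be illustrated with a toy block in Python. This is only a sketch of the idea: zlib.crc32 stands in for Oracle's internal (non-public) checksum algorithm, and the 4-byte header slot is an assumption for illustration:

```python
import zlib

BLOCK_SIZE = 8192  # a typical Oracle block size
HEADER = 4         # reserve 4 bytes of the "block header" for the checksum

def write_block(payload: bytes) -> bytearray:
    """Build a block: the checksum of the body is stored in the header at write time."""
    block = bytearray(BLOCK_SIZE)
    body = payload.ljust(BLOCK_SIZE - HEADER, b"\x00")
    block[HEADER:] = body
    block[:HEADER] = zlib.crc32(body).to_bytes(4, "big")
    return block

def is_corrupt(block: bytearray) -> bool:
    """Recompute and compare on read, the way db_block_checksum=TYPICAL does."""
    stored = int.from_bytes(block[:HEADER], "big")
    return stored != zlib.crc32(bytes(block[HEADER:]))

blk = write_block(b"scott.t01 row data")
print(is_corrupt(blk))   # False: clean block
blk[5000] ^= 0xFF        # flip one byte, simulating disk-level damage
print(is_corrupt(blk))   # True: physical corruption detected on the next read
```

Any single-byte change in the body flips the CRC, which is why a checksum mismatch is a reliable signal of physical (not logical) corruption.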
  5. Recommended reading on corrupt blocks:
    Physical corruption: https://blogs.oracle.com/database4cn/oraclecorruption- , which explains how, when no backup exists, the dbms_repair package can be used to skip the corrupt blocks and copy the table out.

RMAN> blockrecover datafile 3 block 2,150,152;

RMAN> blockrecover corruption list;

Starting recover at 14-MAY-23
using channel ORA_DISK_1

starting media recovery
media recovery complete, elapsed time: 00:00:00

Finished recover at 14-MAY-23

RMAN>

(1) Try deleting the redo log files

Delete the two files ib_logfile0 and ib_logfile1 under the MySQL data directory.

If that still does not solve the problem, then:

(2) Change the InnoDB forced-recovery level

Set innodb_force_recovery=6 in my.cnf, then start MySQL; it should now come up normally.

Then back up the databases you need with Navicat;

stop MySQL;

delete the ibdata files and related files, then start again with the default innodb_force_recovery=0;

restore the backed-up databases.

1. cd /var/lib && rm -rf mysql/* && systemctl start mysqld

2. The initial password is in the file /var/log/mysqld.log.

3. Run: grep 'temporary password' /var/log/mysqld.log to get the password directly.

(Note: the password is everything after the colon!)

alter user 'root'@'localhost' identified by '<new password>';

use mysql;

update user set host='%' where user='root' limit 1;

flush privileges;

Enabling crash-safe replication on a MySQL replica:

variable 1 varchar2;
exec :1:='HGV2305200017879'
variable 2 varchar2;
exec :2:='5569093'
variable 3 varchar2;
exec :3:='FQC_3D'

select * from BYDMESAMF.MES2_BYD_SFC_KEY t where t.sfc = :1 and t.name = :2 and t.value = :3

select * from BYDMESAMF.MES2_BYD_SFC_KEY t where t.sfc = 'HGV2305200017879' and t.name = '5569093' and t.value = 'FQC_3D?'

DG primary/standby synchronization abnormal: the alert log reports ORA-01274 and the MRP process terminates unexpectedly.
Problem symptoms
Details
Problem analysis
Root cause
Solution
Extension
Standby standby_file_management parameter set to AUTO
Standby standby_file_management parameter set to MANUAL
Standby standby_file_management parameter set to AUTO
Conclusion
Problem symptoms
The DG primary/standby synchronization state is abnormal. The alert log reports ORA-01274 and the MRP process has terminated unexpectedly.

Details
SQL> col client_pid for a10
SQL> SELECT inst_id, thread#, process, pid, status, client_process, client_pid,
2 sequence#, block#, active_agents, known_agents FROM gv$managed_standby ORDER BY thread#, pid;

INST_ID  THREAD# PROCESS           PID STATUS       CLIENT_P CLIENT_PID  SEQUENCE#     BLOCK# ACTIVE_AGENTS KNOWN_AGENTS
------- -------- ------- ------------- ------------ -------- ---------- ---------- ---------- ------------- ------------
   1        0 ARCH             2702 CONNECTED    ARCH     2702               0          0             0            0
   1        0 ARCH             2706 CONNECTED    ARCH     2706               0          0             0            0
   1        0 ARCH             2708 CONNECTED    ARCH     2708               0          0             0            0
   1        0 RFS              2840 IDLE         UNKNOWN  2830               0          0             0            0
   1        0 RFS              2858 IDLE         UNKNOWN  2824               0          0             0            0
   1        0 RFS              2869 IDLE         ARCH     2828               0          0             0            0
   1        1 ARCH             2704 CLOSING      ARCH     2704              62          1             0            0
   1        1 RFS              2860 IDLE         LGWR     2832              63       1439             0            0

8 rows selected.

Log apply status:

SQL> SELECT al.thrd "Thread", almax "Last Seq Received", lhmax "Last Seq Applied"
  2  FROM (select thread# thrd, MAX(sequence#) almax
3 FROM v$archived_log
4 WHERE resetlogs_change#=(SELECT resetlogs_change# FROM v$database) GROUP BY thread#) al,
5 (SELECT thread# thrd, MAX(sequence#) lhmax
6 FROM v$log_history
7 WHERE resetlogs_change#=(SELECT resetlogs_change# FROM v$database) GROUP BY thread#) lh
8 WHERE al.thrd = lh.thrd;

Thread Last Seq Received Last Seq Applied
------ ----------------- ----------------
     1                65               62

1 row selected.

Check the synchronization status:

SQL> SELECT * FROM v$dataguard_stats WHERE name LIKE '%lag%';

NAME          VALUE        UNIT                         TIME_COMPUTED       DATUM_TIME
------------- ------------ ---------------------------- ------------------- -------------------
transport lag +00 00:00:00 day(2) to second(0) interval 05/09/2020 16:15:43 05/09/2020 16:15:43
apply lag     +00 00:05:49 day(2) to second(0) interval 05/09/2020 16:15:43 05/09/2020 16:15:43

2 rows selected.

The apply lag is already almost 6 minutes, so the DG synchronization state is currently abnormal.
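The VALUE column is an INTERVAL DAY(2) TO SECOND(0) literal, so the "almost 6 minutes" can be made explicit by converting it to seconds. A small helper, assuming the '+DD HH:MI:SS' text format shown in the output above:

```python
import re

def lag_seconds(value: str) -> int:
    """Convert a '+DD HH:MM:SS' interval literal (as printed by
    v$dataguard_stats) into a signed number of seconds."""
    m = re.fullmatch(r"([+-])(\d+) (\d+):(\d+):(\d+)", value.strip())
    sign = -1 if m.group(1) == "-" else 1
    days, hours, mins, secs = (int(g) for g in m.groups()[1:])
    return sign * (((days * 24 + hours) * 60 + mins) * 60 + secs)

print(lag_seconds("+00 00:05:49"))  # 349 seconds of apply lag, close to 6 minutes
print(lag_seconds("+00 00:00:00"))  # 0: transport lag is fine
```

A monitoring script could alert whenever the apply-lag value crosses a threshold, instead of eyeballing the interval string.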

Check the primary's alert log:

Sat May 09 16:09:54 2020
alter tablespace zhuo add datafile size 20M
Completed: alter tablespace zhuo add datafile size 20M

Check the standby's alert log:

Sat May 09 16:09:54 2020
File #6 added to control file as 'UNNAMED00006' because
the parameter STANDBY_FILE_MANAGEMENT is set to MANUAL
The file should be manually created to continue.
MRP0: Background Media Recovery terminated with error 1274
Errors in file /u01/app/oracle/diag/rdbms/zhuodg/zhuodg/trace/zhuodg_mrp0_2906.trc:
ORA-01274: cannot add datafile '/u01/app/oracle/oradata/ZHUO/datafile/o1_mf_zhuo_hcdsblpt_.dbf' - file could not be created
Managed Standby Recovery not using Real Time Apply
Recovery interrupted!
Recovered data files to a consistent state at change 789594
MRP0: Background Media Recovery process shutdown (zhuodg)

After the primary successfully added the data file, the standby reported ORA-01274 and the MRP0 process shut down abnormally.

First, restart the MRP process on the standby manually:

SQL> recover managed standby database using current logfile disconnect;
Media recovery complete.

alert log:

Sat May 09 16:11:42 2020
ALTER DATABASE RECOVER managed standby database using current logfile disconnect
Attempt to start background Managed Standby Recovery process (zhuodg)
Sat May 09 16:11:42 2020
MRP0 started with pid=28, OS id=3140
MRP0: Background Managed Standby Recovery process started (zhuodg)
Serial Media Recovery started
Managed Standby Recovery starting Real Time Apply
Sat May 09 16:11:47 2020
Errors in file /u01/app/oracle/diag/rdbms/zhuodg/zhuodg/trace/zhuodg_dbw0_2668.trc:
ORA-01186: file 6 failed verification tests
ORA-01157: cannot identify/lock data file 6 - see DBWR trace file
ORA-01111: name for data file 6 is unknown - rename to correct file
ORA-01110: data file 6: '/u01/app/oracle/product/11.2.0/dbhome_1/dbs/UNNAMED00006'
File 6 not verified due to error ORA-01157
MRP0: Background Media Recovery terminated with error 1111
Errors in file /u01/app/oracle/diag/rdbms/zhuodg/zhuodg/trace/zhuodg_mrp0_3140.trc:
ORA-01111: name for data file 6 is unknown - rename to correct file
ORA-01110: data file 6: '/u01/app/oracle/product/11.2.0/dbhome_1/dbs/UNNAMED00006'
ORA-01157: cannot identify/lock data file 6 - see DBWR trace file
ORA-01111: name for data file 6 is unknown - rename to correct file
ORA-01110: data file 6: '/u01/app/oracle/product/11.2.0/dbhome_1/dbs/UNNAMED00006'
Managed Standby Recovery not using Real Time Apply
MRP0: Background Media Recovery process shutdown (zhuodg)
Completed: ALTER DATABASE RECOVER managed standby database using current logfile disconnect

Again there is a pile of errors, all saying the standby cannot identify or lock this data file.
Query the status of this data file on both primary and standby:
Primary:

 FILE# NAME                                                                                                 STATUS

     1 /u01/app/oracle/oradata/ZHUO/datafile/o1_mf_system_gxd20h14_.dbf                                     SYSTEM
     2 /u01/app/oracle/oradata/ZHUO/datafile/o1_mf_sysaux_gxd20k1y_.dbf                                     ONLINE
     3 /u01/app/oracle/oradata/ZHUO/datafile/o1_mf_undotbs1_gxd20lnp_.dbf                                   ONLINE
     4 /u01/app/oracle/oradata/ZHUO/datafile/o1_mf_users_gxd20pxk_.dbf                                      ONLINE
     5 /u01/app/oracle/oradata/ZHUO/datafile/o1_mf_zhuo_gxdcfr5s_.dbf                                       ONLINE
     6 /u01/app/oracle/oradata/ZHUO/datafile/o1_mf_zhuo_hcdsblpt_.dbf                                       ONLINE  ----<<< newly added data file

6 rows selected.
Standby:

SQL> select file#,name,status from v$datafile;

 FILE# NAME                                                                                                 STATUS

     1 /u01/app/oracle/oradata/zhuodg/ZHUODG/datafile/o1_mf_system_05uvp9bj_.dbf                            SYSTEM
     2 /u01/app/oracle/oradata/zhuodg/ZHUODG/datafile/o1_mf_sysaux_06uvp9bj_.dbf                            ONLINE
     3 /u01/app/oracle/oradata/zhuodg/ZHUODG/datafile/o1_mf_undotbs1_07uvp9bj_.dbf                          ONLINE
     4 /u01/app/oracle/oradata/zhuodg/ZHUODG/datafile/o1_mf_users_08uvp9c4_.dbf                             ONLINE
     5 /u01/app/oracle/oradata/zhuodg/ZHUODG/datafile/o1_mf_zhuo_04uvp9bj_.dbf                              ONLINE
     6 /u01/app/oracle/product/11.2.0/dbhome_1/dbs/UNNAMED00006                                             RECOVER ----<<< the synced file's name and status are both wrong

Problem analysis
Per MOS note: Background Media Recovery terminated with ORA-1274 after adding a Datafile (Doc ID 739618.1)
1) Check the parameter

SQL> show parameter standby;

NAME                     TYPE   VALUE
------------------------ ------ ----------
standby_archive_dest     string ?/dbs/arch
standby_file_management  string MANUAL

2) Recreate the data file

SQL> alter database create datafile '/u01/app/oracle/product/11.2.0/dbhome_1/dbs/UNNAMED00006' as new;

Database altered.

SQL> select file#,name,status from v$datafile;

 FILE# NAME                                                                                                 STATUS

     1 /u01/app/oracle/oradata/zhuodg/ZHUODG/datafile/o1_mf_system_05uvp9bj_.dbf                            SYSTEM
     2 /u01/app/oracle/oradata/zhuodg/ZHUODG/datafile/o1_mf_sysaux_06uvp9bj_.dbf                            ONLINE
     3 /u01/app/oracle/oradata/zhuodg/ZHUODG/datafile/o1_mf_undotbs1_07uvp9bj_.dbf                          ONLINE
     4 /u01/app/oracle/oradata/zhuodg/ZHUODG/datafile/o1_mf_users_08uvp9c4_.dbf                             ONLINE
     5 /u01/app/oracle/oradata/zhuodg/ZHUODG/datafile/o1_mf_zhuo_04uvp9bj_.dbf                              ONLINE
     6 /u01/app/oracle/oradata/zhuodg/ZHUODG/datafile/o1_mf_zhuo_hcdw395t_.dbf                              RECOVER

6 rows selected.

3) Set the parameter
SQL> alter system set standby_file_management=auto;

System altered.
4) Restart the MRP process
SQL> recover managed standby database using current logfile disconnect;
Media recovery complete.

alert log:

Sat May 09 17:00:11 2020
ALTER DATABASE RECOVER managed standby database using current logfile disconnect
Attempt to start background Managed Standby Recovery process (zhuodg)
Sat May 09 17:00:11 2020
MRP0 started with pid=19, OS id=3386
MRP0: Background Managed Standby Recovery process started (zhuodg)
Serial Media Recovery started
Managed Standby Recovery starting Real Time Apply
Waiting for all non-current ORLs to be archived...
All non-current ORLs have been archived.
Media Recovery Log /u01/app/oracle/oradata/arch/zhuodg/ZHUODG/archivelog/2020_05_09/o1_mf_1_63_hcdsmnt1_.arc
Media Recovery Log /u01/app/oracle/oradata/arch/zhuodg/ZHUODG/archivelog/2020_05_09/o1_mf_1_64_hcdsmx23_.arc
Media Recovery Log /u01/app/oracle/oradata/arch/zhuodg/ZHUODG/archivelog/2020_05_09/o1_mf_1_65_hcdsn2tw_.arc
Media Recovery Waiting for thread 1 sequence 66 (in transit)
Recovery of Online Redo Log: Thread 1 Group 4 Seq 66 Reading mem 0
Mem# 0: /u01/app/oracle/oradata/zhuodg/ZHUODG/onlinelog/o1_mf_4_hcdrt3nx_.log
Completed: ALTER DATABASE RECOVER managed standby database using current logfile disconnect

Log transport is back to normal, and the previously unapplied logs have now been applied.

5) Check the synchronization status

SQL> SELECT * FROM v$dataguard_stats WHERE name LIKE '%lag%';
NAME          VALUE        UNIT                         TIME_COMPUTED       DATUM_TIME
------------- ------------ ---------------------------- ------------------- -------------------
transport lag +00 00:00:00 day(2) to second(0) interval 05/09/2020 17:01:52 05/09/2020 17:01:51
apply lag     +00 00:00:00 day(2) to second(0) interval 05/09/2020 17:01:52 05/09/2020 17:01:51

2 rows selected.

SQL> col client_pid for a10
SQL> SELECT inst_id, thread#, process, pid, status, client_process, client_pid,
2 sequence#, block#, active_agents, known_agents FROM gv$managed_standby ORDER BY thread#, pid;

INST_ID  THREAD# PROCESS           PID STATUS       CLIENT_P CLIENT_PID  SEQUENCE#     BLOCK# ACTIVE_AGENTS KNOWN_AGENTS
------- -------- ------- ------------- ------------ -------- ---------- ---------- ---------- ------------- ------------
   1        0 ARCH             2706 CONNECTED    ARCH     2706               0          0             0            0
   1        0 RFS              2840 IDLE         UNKNOWN  2830               0          0             0            0
   1        0 RFS              2858 IDLE         UNKNOWN  2824               0          0             0            0
   1        0 RFS              2869 IDLE         ARCH     2828               0          0             0            0
   1        1 ARCH             2702 CLOSING      ARCH     2702              64          1             0            0
   1        1 ARCH             2704 CLOSING      ARCH     2704              65          1             0            0
   1        1 ARCH             2708 CLOSING      ARCH     2708              63          1             0            0
   1        1 RFS              2860 IDLE         LGWR     2832              66      10444             0            0
   1        1 MRP0             3386 APPLYING_LOG N/A      N/A               66      10444             0            0

9 rows selected.

Synchronization is back to normal.

Root cause
The standby_file_management parameter on the physical standby was set incorrectly, so when data files were added or tablespaces created on the primary, the changes could not be applied properly on the standby.

Solution:
alter database create datafile '/u01/app/oracle/product/11.2.0/dbhome_1/dbs/UNNAMED00006' as new;
alter system set standby_file_management=auto;
recover managed standby database using current logfile disconnect;

Extension
In most cases, changes on the primary propagate to the physical standby through redo and are applied there without any extra work on the standby side. Depending on the configuration there are exceptions:
some operations do reach the standby but cannot be executed correctly there, the most common being tablespace and log file management operations.

Creating tablespaces or data files
The initialization parameter standby_file_management controls whether the addition of tablespaces or data files on the primary is automatically propagated to the physical standby. It has two values:
1) AUTO: tablespace creation performed on the primary is also performed on the physical standby.
2) MANUAL: if set to MANUAL, or not set at all (MANUAL is the default), the newly created data files must be copied to the physical standby server by hand.

standby_file_management only applies to tablespaces or data files created on the primary. If a data file is copied in from another database (for example via transportable tablespaces), then regardless of the parameter setting,
it must be copied to the standby manually and the physical standby's control file must be recreated.

Standby standby_file_management parameter set to AUTO
SQL> show parameter standby;

NAME                     TYPE   VALUE
------------------------ ------ ----------
standby_archive_dest     string ?/dbs/arch
standby_file_management  string AUTO
SQL>

Create a tablespace on the primary
SQL> create tablespace test datafile size 20M;

Tablespace created.
Check the data file paths
SQL> col tbs for a10
SQL> col files for a100
SQL> select a.ts#,a.name tbs,b.name files from ts$ a,v$datafile b where a.ts#=b.ts#;

TS# TBS        FILES
--- ---------- -----------------------------------------------------------------
 0 SYSTEM     /u01/app/oracle/oradata/ZHUO/datafile/o1_mf_system_gxd20h14_.dbf
 1 SYSAUX     /u01/app/oracle/oradata/ZHUO/datafile/o1_mf_sysaux_gxd20k1y_.dbf
 2 UNDOTBS1   /u01/app/oracle/oradata/ZHUO/datafile/o1_mf_undotbs1_gxd20lnp_.dbf
 4 USERS      /u01/app/oracle/oradata/ZHUO/datafile/o1_mf_users_gxd20pxk_.dbf
 5 ZHUO       /u01/app/oracle/oradata/ZHUO/datafile/o1_mf_zhuo_gxdcfr5s_.dbf
 5 ZHUO       /u01/app/oracle/oradata/ZHUO/datafile/o1_mf_zhuo_hcdsblpt_.dbf
 6 TEST       /u01/app/oracle/oradata/ZHUO/datafile/o1_mf_test_hcf6r72x_.dbf    ---<<< newly created

7 rows selected.
Check the data file paths on the standby:
SQL> col tbs for a10
SQL> col files for a100
SQL> set lines 200

TS# TBS        FILES
--- ---------- -----------------------------------------------------------------
 0 SYSTEM     /u01/app/oracle/oradata/zhuodg/ZHUODG/datafile/o1_mf_system_05uvp9bj_.dbf
 1 SYSAUX     /u01/app/oracle/oradata/zhuodg/ZHUODG/datafile/o1_mf_sysaux_06uvp9bj_.dbf
 2 UNDOTBS1   /u01/app/oracle/oradata/zhuodg/ZHUODG/datafile/o1_mf_undotbs1_07uvp9bj_.dbf
 4 USERS      /u01/app/oracle/oradata/zhuodg/ZHUODG/datafile/o1_mf_users_08uvp9c4_.dbf
 5 ZHUO       /u01/app/oracle/oradata/zhuodg/ZHUODG/datafile/o1_mf_zhuo_04uvp9bj_.dbf
 5 ZHUO       /u01/app/oracle/oradata/zhuodg/ZHUODG/datafile/o1_mf_zhuo_hcdw395t_.dbf
 6 TEST       /u01/app/oracle/oradata/zhuodg/ZHUODG/datafile/o1_mf_test_hcf6r765_.dbf  ---<< shown correctly

7 rows selected.

SQL>

Standby standby_file_management parameter set to MANUAL
Change the parameter on the standby to MANUAL
SQL> alter system set standby_file_management=manual;

System altered.
Add a data file on the primary and check the data file paths:

SQL> alter tablespace test add datafile size 10m;

Tablespace altered.

SQL> select a.ts#,a.name tbs,b.name files from ts$ a,v$datafile b where a.ts#=b.ts#;

TS# TBS        FILES
--- ---------- -----------------------------------------------------------------
 0 SYSTEM     /u01/app/oracle/oradata/ZHUO/datafile/o1_mf_system_gxd20h14_.dbf
 1 SYSAUX     /u01/app/oracle/oradata/ZHUO/datafile/o1_mf_sysaux_gxd20k1y_.dbf
 2 UNDOTBS1   /u01/app/oracle/oradata/ZHUO/datafile/o1_mf_undotbs1_gxd20lnp_.dbf
 4 USERS      /u01/app/oracle/oradata/ZHUO/datafile/o1_mf_users_gxd20pxk_.dbf
 5 ZHUO       /u01/app/oracle/oradata/ZHUO/datafile/o1_mf_zhuo_gxdcfr5s_.dbf
 5 ZHUO       /u01/app/oracle/oradata/ZHUO/datafile/o1_mf_zhuo_hcdsblpt_.dbf
 6 TEST       /u01/app/oracle/oradata/ZHUO/datafile/o1_mf_test_hcf6r72x_.dbf
 6 TEST       /u01/app/oracle/oradata/ZHUO/datafile/o1_mf_test_hcf76186_.dbf

8 rows selected.

Check on the standby:
SQL> select a.ts#,a.name tbs,b.name files from ts$ a,v$datafile b where a.ts#=b.ts#;

TS# TBS        FILES
--- ---------- -----------------------------------------------------------------
 0 SYSTEM     /u01/app/oracle/oradata/zhuodg/ZHUODG/datafile/o1_mf_system_05uvp9bj_.dbf
 1 SYSAUX     /u01/app/oracle/oradata/zhuodg/ZHUODG/datafile/o1_mf_sysaux_06uvp9bj_.dbf
 2 UNDOTBS1   /u01/app/oracle/oradata/zhuodg/ZHUODG/datafile/o1_mf_undotbs1_07uvp9bj_.dbf
 4 USERS      /u01/app/oracle/oradata/zhuodg/ZHUODG/datafile/o1_mf_users_08uvp9c4_.dbf
 5 ZHUO       /u01/app/oracle/oradata/zhuodg/ZHUODG/datafile/o1_mf_zhuo_04uvp9bj_.dbf
 5 ZHUO       /u01/app/oracle/oradata/zhuodg/ZHUODG/datafile/o1_mf_zhuo_hcdw395t_.dbf
 6 TEST       /u01/app/oracle/oradata/zhuodg/ZHUODG/datafile/o1_mf_test_hcf6r765_.dbf
 6 TEST       /u01/app/oracle/product/11.2.0/dbhome_1/dbs/UNNAMED00008         ----<<< abnormal file name containing UNNAMED, placed under $ORACLE_HOME/dbs

8 rows selected.
alert log:

Sat May 09 19:59:03 2020
Successfully added datafile 7 to media recovery
Datafile #7: '/u01/app/oracle/oradata/zhuodg/ZHUODG/datafile/o1_mf_test_hcf6r765_.dbf'
Sat May 09 20:03:32 2020
WARNING: Heavy swapping observed on system in last 5 mins.
pct of memory swapped in [56.13%] pct of memory swapped out [37.44%].
Please make sure there is no memory pressure and the SGA and PGA
are configured correctly. Look at DBRM trace file for more details.
Sat May 09 20:06:09 2020
ALTER SYSTEM SET standby_file_management='MANUAL' SCOPE=BOTH;
Sat May 09 20:06:25 2020
File #8 added to control file as 'UNNAMED00008' because
the parameter STANDBY_FILE_MANAGEMENT is set to MANUAL
The file should be manually created to continue.
MRP0: Background Media Recovery terminated with error 1274
Errors in file /u01/app/oracle/diag/rdbms/zhuodg/zhuodg/trace/zhuodg_mrp0_3386.trc:
ORA-01274: cannot add datafile '/u01/app/oracle/oradata/ZHUO/datafile/o1_mf_test_hcf76186_.dbf' - file could not be created
Managed Standby Recovery not using Real Time Apply
Recovery interrupted!
Recovered data files to a consistent state at change 808919
MRP0: Background Media Recovery process shutdown (zhuodg) ----<<< MRP process stopped abnormally

The symptoms match the case above exactly: the cause is the parameter being set to MANUAL.
The fix is the same procedure described earlier and is not repeated here.

Standby standby_file_management parameter set to AUTO
Drop the tablespace on the primary
SQL> drop tablespace test including contents and datafiles;

Tablespace dropped.

SQL> select a.ts#,a.name tbs,b.name files from ts$ a,v$datafile b where a.ts#=b.ts#;

TS# TBS        FILES
--- ---------- -----------------------------------------------------------------
 0 SYSTEM     /u01/app/oracle/oradata/ZHUO/datafile/o1_mf_system_gxd20h14_.dbf
 1 SYSAUX     /u01/app/oracle/oradata/ZHUO/datafile/o1_mf_sysaux_gxd20k1y_.dbf
 2 UNDOTBS1   /u01/app/oracle/oradata/ZHUO/datafile/o1_mf_undotbs1_gxd20lnp_.dbf
 4 USERS      /u01/app/oracle/oradata/ZHUO/datafile/o1_mf_users_gxd20pxk_.dbf
 5 ZHUO       /u01/app/oracle/oradata/ZHUO/datafile/o1_mf_zhuo_gxdcfr5s_.dbf
 5 ZHUO       /u01/app/oracle/oradata/ZHUO/datafile/o1_mf_zhuo_hcdsblpt_.dbf

6 rows selected.
Check on the standby:
SQL> select a.ts#,a.name tbs,b.name files from ts$ a,v$datafile b where a.ts#=b.ts#;

TS# TBS        FILES
--- ---------- -----------------------------------------------------------------
 0 SYSTEM     /u01/app/oracle/oradata/zhuodg/ZHUODG/datafile/o1_mf_system_05uvp9bj_.dbf
 1 SYSAUX     /u01/app/oracle/oradata/zhuodg/ZHUODG/datafile/o1_mf_sysaux_06uvp9bj_.dbf
 2 UNDOTBS1   /u01/app/oracle/oradata/zhuodg/ZHUODG/datafile/o1_mf_undotbs1_07uvp9bj_.dbf
 4 USERS      /u01/app/oracle/oradata/zhuodg/ZHUODG/datafile/o1_mf_users_08uvp9c4_.dbf
 5 ZHUO       /u01/app/oracle/oradata/zhuodg/ZHUODG/datafile/o1_mf_zhuo_04uvp9bj_.dbf
 5 ZHUO       /u01/app/oracle/oradata/zhuodg/ZHUODG/datafile/o1_mf_zhuo_hcdw395t_.dbf

6 rows selected.

The drop completed normally and was synced to the standby.

Conclusion
In summary, when the initialization parameter standby_file_management is MANUAL, adding or dropping tablespaces and data files must be handled manually on the standby.
When it is AUTO, these operations require no DBA intervention; the physical standby handles them automatically.

Quickly Adding an MGR Node with the MySQL 8.0 Clone Plugin
Contents
1. Introduction to the MySQL 8.0.17 clone plugin
2. Existing MGR environment
3. Adding an MGR node with clone
3.1 Initialize the new node
3.2 Install the clone and group replication plugins on the new node
3.3 Run the clone on the new node
3.4 Update parameters on the original three nodes
3.5 Start MGR on the new node
4. Summary
1. Introduction to the MySQL 8.0.17 clone plugin
The clone plugin, available since MySQL 8.0.17, allows data to be cloned online, either locally or from a remote MySQL instance, so a replica can now be built without backup tools such as XtraBackup (PXB) or mysqldump. The cloned data is a physical snapshot of the data stored in InnoDB, including schemas, tables, tablespaces, and data dictionary metadata. The result is a fully functional data directory, which means the clone plugin can be used to provision a MySQL server.

The plugin supports two modes of cloning:

Local clone: clones the data of the MySQL instance that initiates the operation into a directory on the same server or node.

Remote clone: by default, a remote clone removes the data in the recipient's data directory and replaces it with the cloned data from the donor. Optionally, you can clone into a different directory on the recipient to avoid removing the existing data.

There is no difference between the data produced by a remote clone and a local clone. The clone plugin also supports replication: besides the data, a clone operation extracts the replication coordinates from the donor and applies them on the recipient, so the plugin can be used to provision group replication or source/replica replication. Provisioning with the clone plugin is much faster and more efficient than replaying a large number of transactions.
The MySQL 8.0 clone plugin thus provides a more efficient way to quickly create MySQL instances and to set up replication and group replication. This article shows how to use it to quickly add a Group Replication (MGR) node.
Documentation: https://dev.mysql.com/doc/refman/8.0/en/clone-plugin.html

2. Existing MGR environment
To build the MGR environment, see: 【DB宝18】 Installing and Using MySQL MGR for High Availability in Docker

An MGR cluster in multi-primary mode already exists:

MySQL [(none)]> SELECT * FROM performance_schema.replication_group_members;
CHANNEL_NAME              MEMBER_ID                            MEMBER_HOST MEMBER_PORT MEMBER_STATE MEMBER_ROLE MEMBER_VERSION
group_replication_applier 611717fe-d785-11ea-9342-0242ac48000f 172.72.0.15 3306        ONLINE       PRIMARY     8.0.20
group_replication_applier 67090f47-d785-11ea-b76c-0242ac480010 172.72.0.16 3306        ONLINE       PRIMARY     8.0.20
group_replication_applier 678cf064-d785-11ea-b8ce-0242ac480011 172.72.0.17 3306        ONLINE       PRIMARY     8.0.20

3 rows in set (0.00 sec)
Docker environment:

[root@docker35 ~]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
9dd3d6b221b0 mysql:8.0.20 "docker-entrypoint.s…" 3 months ago Up 10 minutes 33060/tcp, 0.0.0.0:33067->3306/tcp mysql8020mgr33067
189b5a889665 mysql:8.0.20 "docker-entrypoint.s…" 3 months ago Up 10 minutes 33060/tcp, 0.0.0.0:33066->3306/tcp mysql8020mgr33066
6ce08dd5dc84 mysql:8.0.20 "docker-entrypoint.s…" 3 months ago Up 10 minutes 33060/tcp, 0.0.0.0:33065->3306/tcp mysql8020mgr33065
Add a new node via the clone plugin: 172.72.0.18.

3. Adding an MGR node with clone
3.1 Initialize the new node
mkdir -p /usr/local/mysql/lhrmgr18/conf.d
mkdir -p /usr/local/mysql/lhrmgr18/data

docker run -d --name mysql8020mgr33068 \
-h lhrmgr18 -p 33068:3306 --net=mysql-network --ip 172.72.0.18 \
-v /usr/local/mysql/lhrmgr18/conf.d:/etc/mysql/conf.d -v /usr/local/mysql/lhrmgr18/data:/var/lib/mysql/ \
-e MYSQL_ROOT_PASSWORD=lhr \
-e TZ=Asia/Shanghai \
mysql:8.0.20

cat > /usr/local/mysql/lhrmgr18/conf.d/my.cnf <<"EOF"
[mysqld]
user=mysql
port=3306
character_set_server=utf8mb4
secure_file_priv=''
server-id = 802033068
log-bin =
binlog_format=row
binlog_checksum=NONE
log-slave-updates=1
skip-name-resolve
auto-increment-increment=2
auto-increment-offset=1
gtid-mode=ON
enforce-gtid-consistency=on
default_authentication_plugin=mysql_native_password
max_allowed_packet = 500M
log_slave_updates=on

master_info_repository=TABLE
relay_log_info_repository=TABLE
relay_log=lhrmgr18-relay-bin-ip18

transaction_write_set_extraction=XXHASH64
loose-group_replication_group_name="aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"
loose-group_replication_start_on_boot=OFF
loose-group_replication_local_address= "172.72.0.18:33064"
loose-group_replication_group_seeds= "172.72.0.15:33061,172.72.0.16:33062,172.72.0.17:33063,172.72.0.18:33064"
loose-group_replication_bootstrap_group=OFF
loose-group_replication_ip_whitelist="172.72.0.15,172.72.0.16,172.72.0.17,172.72.0.18"
report_host=172.72.0.18
report_port=3306

EOF

docker restart mysql8020mgr33068

docker ps
mysql -uroot -plhr -h192.168.1.35 -P33065 -e "select @@hostname,@@server_id,@@server_uuid"
mysql -uroot -plhr -h192.168.1.35 -P33066 -e "select @@hostname,@@server_id,@@server_uuid"
mysql -uroot -plhr -h192.168.1.35 -P33067 -e "select @@hostname,@@server_id,@@server_uuid"
mysql -uroot -plhr -h192.168.1.35 -P33068 -e "select @@hostname,@@server_id,@@server_uuid"
mysql -uroot -plhr -h192.168.1.35 -P33065
mysql -uroot -plhr -h192.168.1.35 -P33066
mysql -uroot -plhr -h192.168.1.35 -P33067
mysql -uroot -plhr -h192.168.1.35 -P33068
docker logs -f --tail 10 mysql8020mgr33065
docker logs -f --tail 10 mysql8020mgr33066
docker logs -f --tail 10 mysql8020mgr33067
docker logs -f --tail 10 mysql8020mgr33068

3.2 Install the clone and group replication plugins on the new node
mysql -uroot -plhr -h192.168.1.35 -P33068

-- install the group replication plugin
INSTALL PLUGIN group_replication SONAME 'group_replication.so';

-- install the clone plugin (note: it must also be installed on all three existing MGR nodes)
INSTALL PLUGIN clone SONAME 'mysql_clone.so';
3.3 Run the clone on the new node
-- set the clone source: point clone_valid_donor_list at an MGR node
SET GLOBAL clone_valid_donor_list = '172.72.0.15:3306';

-- start the clone on the new node
CLONE INSTANCE FROM 'root'@'172.72.0.15':3306 IDENTIFIED BY 'lhr';

-- in this docker environment, the container has to be restarted manually:
-- ERROR 3707 (HY000): Restart server failed (mysqld is not managed by supervisor process).
docker restart mysql8020mgr33068

-- check clone progress and status
MySQL [(none)]> SELECT * FROM performance_schema.clone_status \G
*************************** 1. row ***************************
             ID: 1
            PID: 0
          STATE: Completed
     BEGIN_TIME: 2020-11-13 15:03:07.076
       END_TIME: 2020-11-13 15:04:31.224
         SOURCE: 172.72.0.15:3306
    DESTINATION: LOCAL INSTANCE
       ERROR_NO: 0
  ERROR_MESSAGE:
    BINLOG_FILE: lhrmgr15-bin.000005
BINLOG_POSITION: 1235
  GTID_EXECUTED: aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-25
1 row in set (0.01 sec)

MySQL [(none)]> select

->   stage,
->   state,
->   cast(begin_time as DATETIME) as "START TIME",
->   cast(end_time as DATETIME) as "FINISH TIME",
->   lpad(sys.format_time(power(10,12) * (unix_timestamp(end_time) - unix_timestamp(begin_time))), 10, ' ') as DURATION,
->   lpad(concat(format(round(estimate/1024/1024,0), 0), "MB"), 16, ' ') as "Estimate",
->   case when begin_time is NULL then LPAD('%0', 7, ' ')
->   when estimate > 0 then
->   lpad(concat(round(data*100/estimate, 0), "%"), 7, ' ')
->   when end_time is NULL then lpad('0%', 7, ' ')
->   else lpad('100%', 7, ' ')
->   end as "Done(%)"
->   from performance_schema.clone_progress;
STAGE     STATE      START TIME           FINISH TIME          DURATION   Estimate  Done(%)
DROP DATA Completed  2020-11-13 15:03:07  2020-11-13 15:03:08  320.98 ms  0MB       100%
FILE COPY Completed  2020-11-13 15:03:08  2020-11-13 15:03:10  2.12 s     64MB      100%
PAGE COPY Completed  2020-11-13 15:03:10  2020-11-13 15:03:10  160.27 ms  0MB       100%
REDO COPY Completed  2020-11-13 15:03:10  2020-11-13 15:03:10  100.76 ms  0MB       100%
FILE SYNC Completed  2020-11-13 15:03:10  2020-11-13 15:03:13  2.83 s     0MB       100%
RESTART   Completed  2020-11-13 15:03:13  2020-11-13 15:04:30  1.29 m     0MB       100%
RECOVERY  Completed  2020-11-13 15:04:30  2020-11-13 15:04:31  1.19 s     0MB       100%

7 rows in set (0.01 sec)
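The Done(%) expression in the query above is just a CASE over begin_time, end_time, data and estimate; restated as a small Python function for readability (column semantics assumed from performance_schema.clone_progress as used in the query):

```python
def done_pct(begin_time, end_time, data, estimate):
    """Mirror the CASE expression: percent of estimated bytes copied so far."""
    if begin_time is None:
        return "%0"        # stage not started (literal from the SQL above)
    if estimate and estimate > 0:
        return f"{round(data * 100 / estimate)}%"
    if end_time is None:
        return "0%"        # started, but nothing estimated yet
    return "100%"          # finished stage with a zero estimate

# FILE COPY stage from the output above: 64 MB estimated, 64 MB copied
print(done_pct("15:03:08", "15:03:10", 64 << 20, 64 << 20))  # 100%
# a hypothetical stage halfway through
print(done_pct("15:03:08", None, 32 << 20, 64 << 20))        # 50%
```

This is why stages with a 0MB estimate (DROP DATA, RESTART, ...) still show 100% once they finish: the fallback branch fires rather than a division.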
3.4 Update parameters on the original three nodes
set global group_replication_group_seeds='172.72.0.15:33061,172.72.0.16:33062,172.72.0.17:33063,172.72.0.18:33064';
stop group_replication;
set global group_replication_ip_whitelist="172.72.0.15,172.72.0.16,172.72.0.17,172.72.0.18";
start group_replication;
3.5 Start MGR on the new node
-- the existing MGR cluster runs in multi-primary mode, so configure the new node for multi-primary first
set global group_replication_single_primary_mode=OFF;
set global group_replication_enforce_update_everywhere_checks=ON;

-- join group replication
START GROUP_REPLICATION;

-- check the group members and their state
SELECT * FROM performance_schema.replication_group_members;
CHANNEL_NAME              MEMBER_ID                            MEMBER_HOST MEMBER_PORT MEMBER_STATE MEMBER_ROLE MEMBER_VERSION
group_replication_applier 276804ba-257c-11eb-b8ea-0242ac480012 172.72.0.18 3306        ONLINE       PRIMARY     8.0.20
group_replication_applier 611717fe-d785-11ea-9342-0242ac48000f 172.72.0.15 3306        ONLINE       PRIMARY     8.0.20
group_replication_applier 67090f47-d785-11ea-b76c-0242ac480010 172.72.0.16 3306        ONLINE       PRIMARY     8.0.20
group_replication_applier 678cf064-d785-11ea-b8ce-0242ac480011 172.72.0.17 3306        ONLINE       PRIMARY     8.0.20

4 rows in set (0.00 sec)

-- query the databases on the new node

MySQL [(none)]> show databases;
Database
------------------
information_schema
lhrdb
mysql
performance_schema
sys

5 rows in set (0.01 sec)
At this point, the new MGR node has been added successfully via the clone plugin; the procedure is both simple and fast.

4. Summary
Restrictions of the clone feature:

The version must be 8.0.17 or later, and cross-version cloning is not supported: donor and recipient must run exactly the same version. You cannot clone between MySQL 5.7 and MySQL 8.0, nor between 8.0.19 and 8.0.20.

DDL is not permitted during a clone operation; concurrent DML is permitted.

Both machines must run the same operating system, platform, and architecture; for example, Linux to Windows or x64 to x32 is not supported.

Both MySQL instances must have the same innodb_page_size and innodb_data_file_path (ibdata file name).

Only one clone operation may run at a time.

The recipient must set the clone_valid_donor_list variable.

max_allowed_packet must be greater than 2 MB.

The donor's undo tablespace file names must not be duplicated.

The my.cnf file is not cloned.

The binary log is not cloned.

Only the InnoDB engine is supported; data in other storage engines is not cloned. MyISAM and CSV tables in any schema, including the sys schema, are cloned as empty tables.

The clone plugin must be installed on both donor and recipient.

Donor and recipient each need an account with at least the BACKUP_ADMIN/CLONE_ADMIN privilege.

Connecting to the donor instance through MySQL Router is not supported.

By default, the recipient MySQL instance is restarted automatically after the data is cloned. Automatic restart requires a monitoring process on the recipient that can detect the server shutdown; otherwise the clone operation stops with an error and the recipient instance is shut down. The error does not mean the clone failed; it means the recipient's MySQL instance must be restarted manually after the data is cloned.
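The first restriction (identical versions, both at least 8.0.17) is easy to encode as a pre-flight check before attempting CLONE INSTANCE. A sketch, assuming plain 'x.y.z' version strings as returned by SELECT VERSION():

```python
def clone_version_ok(donor: str, recipient: str) -> bool:
    """Clone requires donor and recipient on the exact same version, >= 8.0.17."""
    d = tuple(int(x) for x in donor.split("."))
    r = tuple(int(x) for x in recipient.split("."))
    return d == r and d >= (8, 0, 17)

print(clone_version_ok("8.0.20", "8.0.20"))  # True:  same version, new enough
print(clone_version_ok("8.0.19", "8.0.20"))  # False: versions differ
print(clone_version_ok("8.0.16", "8.0.16"))  # False: below 8.0.17
```

A provisioning script could run this (plus checks on innodb_page_size and innodb_data_file_path) before issuing the clone, turning a mid-clone failure into an early, readable error.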

End of article.