Data Guard (DG) Troubleshooting

Posted by wukaiqiang; tagged with none

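1. In the morning, users reported that the DG standby data did not match the production data
A check showed that the MRP (managed recovery) process had stopped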
select * from v$managed_standby;
2. Check alert.log
Found the error: ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by 'inst 1, osid 469226'
incident details in: /u01/app/....../sid_pr00_258815_i240250.trc
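
A minimal sketch of how the alert log could be scanned for this error from the shell. The function name `count_cf_enqueue_timeouts` is hypothetical, and the alert log path is passed in as an argument because its location depends on the installation.

```shell
# Hypothetical helper: count ORA-00494 lines in the alert log given as $1.
# Prints the number of matching lines (0 if there are none).
count_cf_enqueue_timeouts() {
    grep -c 'ORA-00494' "$1"
}

# Hypothetical helper: list the matching lines with their line numbers,
# so the timestamps just above them can be looked up in the log.
show_cf_enqueue_timeouts() {
    grep -n 'ORA-00494' "$1"
}
```

Comparing where these lines appear in the log against the schedule of other jobs on the host is one way to narrow down what was running when the enqueue was held.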

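Opening the trace file showed loadavg: 16.9, 15.22, 10.59
cat /proc/cpuinfo | grep proc | wc -l returned 4. With only 4 CPUs, the host was clearly overloaded.
Analysis showed that the MRP0 process stopped at the time the backup software started. In addition, the primary database has twice the CPU and memory of the standby. From this we can conclude that excessive CPU load caused the control file (CF) enqueue to be held for too long, which stopped the recovery process.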
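
The overload judgment above compares the load average with the CPU count. A rough sketch of that arithmetic follows; the function names are hypothetical, and reading `/proc/loadavg` and `/proc/cpuinfo` assumes a Linux host.

```shell
# Hypothetical helper: load average divided by CPU count, one decimal place.
# $1 = load average, $2 = number of CPUs
load_per_cpu() {
    awk -v load="$1" -v cpus="$2" 'BEGIN { printf "%.1f\n", load / cpus }'
}

# Hypothetical helper: the same ratio from live values on a Linux host.
current_load_per_cpu() {
    load1=$(cut -d ' ' -f 1 /proc/loadavg)        # 1-minute load average
    cpus=$(grep -c '^processor' /proc/cpuinfo)    # number of logical CPUs
    if [ "$cpus" -gt 0 ]; then
        load_per_cpu "$load1" "$cpus"
    fi
}

# With the values from the trace file: load_per_cpu 16.9 4
```

A ratio well above 1 over a sustained period means runnable processes are queuing for CPU, which fits the figures seen in the trace file.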
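3. Manually start the apply process
alter database recover managed standby database using current logfile disconnect from session;
The command reported that the corresponding archived log could not be found.
Checking v$archived_log showed that the NAME column contained archived log entries recorded by the backup appliance.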
4. In RMAN, manually clean up the backup appliance's archived log entries
delete archivelog like '%xxxxx.dbf%';
5. Restart the apply process
alter database recover managed standby database using current logfile disconnect from session;
6. Check the result
select * from v$managed_standby;
The MRP0 process has started normally
7. Check the log apply status

select thread#, sequence#, 'last applied:' logs, to_char(next_time,'DD-MON-YYYY:HH24:MI:SS') time
from v$archived_log
where sequence# = (select max(sequence#) from v$archived_log where applied='YES')
union
select thread#, sequence#, 'last received:' logs, to_char(next_time,'DD-MON-YYYY:HH24:MI:SS') time
from v$archived_log
where sequence# = (select max(sequence#) from v$archived_log);