ADG的备库mrp进程状态一直显示WAIT_FOR_LOG
问题描述
在搭建完单机主备的adg后,MRP0进程的状态一直显示WAIT_FOR_LOG,这个和预期的APPLYING_LOG不符。另外经过测试发现,主备插入数据,备库只有当手动切换归档日志,备库才可以同步到主库写入的数据。
select process,status ,thread#,sequence# from v$managed_standby;
PROCESS STATUS THREAD# SEQUENCE#
--------- ------------ ---------- ----------
ARCH CONNECTED 0 0
DGRD ALLOCATED 0 0
DGRD ALLOCATED 0 0
ARCH CONNECTED 0 0
ARCH CONNECTED 0 0
ARCH CONNECTED 0 0
RFS IDLE 1 0
RFS IDLE 1 5637
MRP0 WAIT_FOR_LOG 1 5637
RFS IDLE 0 0
RFS IDLE 0 011 rows selected.
另外主备之间使用LGWR async异步传输方式
alter database recover managed standby database using current logfile disconnect from session;
问题排查思路
也通过查找网上的资料,排查了网络、密码文件、pfile、tnsnames.ora文件等等。最后还是通过和同事沟通,通过对adg的原理分析,实时同步应该是把主库的redo log日志实时同步到备库的standby redo logs,目前只是没有应用最新的redolog,猜测是否与redo log配置或者其他因素影响呢。
1、先核实standby redo logs的状态信息,和同事沟通,反馈说正常情况有一个redo的状态应该为active
SQL> select group#,thread#,sequence#,archived,status from v$standby_log;GROUP# THREAD# SEQUENCE# ARC STATUS
---------- ---------- ---------- --- ----------11 1 0 YES UNASSIGNED12 1 0 YES UNASSIGNED13 1 0 YES UNASSIGNED
2、继续排查主库redo文件和备库redo文件大小
查看主库的redo文件
[root@jcyjs4 ORCL]# ls -lh redo*
-rw-r----- 1 oracle dba 1001M Jan 11 14:34 redo04.log
-rw-r----- 1 oracle dba 1001M Jan 11 14:19 redo05.log
-rw-r----- 1 oracle dba 1001M Jan 11 14:32 redo06.log
备库的redo文件大小
[oracle@jcyjs3 standby_redo_logs]$ ls -lh
-rw-r----- 1 oracle dba 201M Jan 11 14:08 redo11.log
-rw-r----- 1 oracle dba 201M Jan 11 14:08 redo12.log
-rw-r----- 1 oracle dba 201M Jan 11 14:08 redo13.log
问题原因
通过上述的核实,猜测可能是主备库的文件大小不一致导致,经过查找资料也验证了这点。
https://docs.oracle.com/en/database/oracle/oracle-database/21/sbydb/oracle-data-guard-redo-transport-services.html#GUID-E6EC6104-3C38-482D-B807-A0E84ECFB937
解决方案
1、首先取消备库的日志应用
SQL> alter database recover managed standby database cancel;
Database altered.
2、删除standby redo logs文件,并在操作系统中手动删除文件
alter database drop standby logfile group 11 ;
alter database drop standby logfile group 12 ;
alter database drop standby logfile group 13 ;
另外当备库的redo logs文件的状态为ACTIVE时,删除redolog日志,报错ORA-00261、ORA-00312,需要执行如下命令
--备库
alter database recover managed standby database cancel;
alter system set standby_file_management='manual';
--主库多次执行alter system switch logfile;
3、重建standby redo logs,大小和主库一致或者大于主库,另外建议备库的redo文件需要比主库多1个,这是因为极端情况情况可能redo不够用
alter database add standby logfile thread 1 group 11('/data3/oracle/oradata/ORCLDG/standby_redo_logs/redo11.log') size 1001M;
alter database add standby logfile thread 1 group 12('/data3/oracle/oradata/ORCLDG/standby_redo_logs/redo12.log') size 1001M;
alter database add standby logfile thread 1 group 13('/data3/oracle/oradata/ORCLDG/standby_redo_logs/redo13.log') size 1001M;
alter database add standby logfile thread 1 group 14('/data3/oracle/oradata/ORCLDG/standby_redo_logs/redo14.log') size 1001M;
查看standby redo logs的状态信息
SQL> select group#,thread#,sequence#,archived,status from v$standby_log;GROUP# THREAD# SEQUENCE# ARC STATUS
---------- ---------- ---------- --- ----------11 1 0 YES UNASSIGNED12 1 0 YES UNASSIGNED13 1 0 YES UNASSIGNED14 1 0 YES UNASSIGNED
4、启动备库的日志应用
SQL> alter database recover managed standby database using current logfile disconnect from session;Database altered.
5、在主库手动切归档
alter system switch logfile;
6、查看standby redo logs的状态信息
SQL> select group#,thread#,sequence#,archived,status from v$standby_log;GROUP# THREAD# SEQUENCE# ARC STATUS
---------- ---------- ---------- --- ----------11 1 5758 YES ACTIVE12 1 0 YES UNASSIGNED13 1 0 YES UNASSIGNED14 1 0 YES UNASSIGNED
7、查看MRP0进程的状态信息,从WAIT_FOR_LOG转为了APPLYING_LOG
SQL> /PROCESS STATUS THREAD# SEQUENCE#
--------- ------------ ---------- ----------
ARCH CLOSING 1 5751
DGRD ALLOCATED 0 0
DGRD ALLOCATED 0 0
ARCH CLOSING 1 5753
ARCH CLOSING 1 5755
ARCH CLOSING 1 5749
RFS IDLE 1 0
RFS IDLE 1 5758
MRP0 APPLYING_LOG 1 5758
RFS IDLE 0 0
RFS IDLE 0 011 rows selected.
经过上述排查处理后adg的MRP进程状态显示为APPLYING_LOG,并且能够实时的同步主库写入的数据