PG14归档失败解决办法archiver failed on wal_lsn

问题描述

昨晚Repmgr+PG14主备主库因wal日志撑爆磁盘,删除主库过期wal文件重做备库后上午进行主备状态巡查,主库向备库发送wal文件正常,但是查主库状态时发现显示有1条归档失败的记录。
postgres: archiver failed on 000000010000006F00000086

  • 主库:

walsender repmgr 172.28.32.23(36122) streaming 72/1BAC3A10" walsender正常
archiver failed on 000000010000006F00000086" 归档失败

  • 备库:

walreceiver streaming 77/9EB6A198" “” “” " walreceiver正常

--查主库数据库状态
[root@pgmaster ~]# systemctl status postgres
● postgres.service - PostgreSQL database server
Loaded: loaded (/usr/lib/systemd/system/postgres.service; enabled; vendor preset: disabled)
Active: active (running) since Thu 2023-10-12 22:04:08 CST; 13h ago
Process: 3710968 ExecStart=/server/data/pgdb/pgsql/bin/pg_ctl start -D $PGDATA (code=exited, status=0/SUCCESS)
Main PID: 3710970 (postgres)
Tasks: 53 (limit: 201967)
Memory: 19.0G
CGroup: /system.slice/postgres.service
├─ 3710970 /server/data/pgdb/pgsql/bin/postgres -D /server/data/pgdb/data
├─ 3710971 "postgres: logger " "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
├─ 3710992 "postgres: checkpointer " "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
├─ 3710993 "postgres: background writer " "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
├─ 3710994 "postgres: walwriter " "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
├─ 3710995 "postgres: archiver failed on 000000010000006F00000086" "" "" "" "" "" "" "" "" ""
├─ 3710996 "postgres: logical replication launcher " "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
├─ 3711001 "postgres: top_portal top_portal 172.28.32.18(41438) idle" "" "" "" "" "" ""
├─ 3711003 "postgres: tj_sjjh dataexchange 172.28.32.28(35406) idle" "" "" "" "" "" "" ""
├─ 3711009 "postgres: repmgr repmgr 172.28.32.22(64096) idle" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
├─ 3711468 "postgres: top_portal top_portal 172.28.32.18(41720) idle" "" "" "" "" "" ""
├─ 3713807 "postgres: top_portal top_portal 172.28.32.20(44492) idle" "" "" "" "" "" ""
├─ 3723017 "postgres: walsender repmgr 172.28.32.23(36122) streaming 72/1BAC3A10"  #wal 发送正常--查备库状态
[root@pgslave ~]# systemctl status postgres
● postgres.service - PostgreSQL database server
Loaded: loaded (/usr/lib/systemd/system/postgres.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2023-10-13 00:12:19 CST; 12h ago
Process: 1931221 ExecStart=/server/data/pgdb/pgsql/bin/pg_ctl start -D $PGDATA (code=exited, status=0/SUCCESS)
Main PID: 1931223 (postgres)
Tasks: 7 (limit: 201967)
Memory: 23.2G
CGroup: /system.slice/postgres.service
├─ 1931223 /server/data/pgdb/pgsql/bin/postgres -D /server/data/pgdb/data
├─ 1931224 "postgres: logger " "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
├─ 1931225 "postgres: startup recovering 00000001000000770000009E" "" "" "" "" "" "" "" "" ""
├─ 1931226 "postgres: checkpointer " "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
├─ 1931227 "postgres: background writer " "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
├─ 1931230 "postgres: walreceiver streaming 77/9EB6A198" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""   #wal接收
└─ 1931430 "postgres: repmgr repmgr 172.28.32.23(22956) idle" "" "" "" "" "" "" "" "" "" "" "" "" "" ""Oct 13 00:12:17 pgslave systemd[1]: Starting PostgreSQL database server...
Oct 13 00:12:17 pgslave pg_ctl[1931221]: waiting for server to start....
Oct 13 00:12:17 pgslave pg_ctl[1931223]: 2023-10-13 00:12:17.497 CST [1931223] LOG:  redirecting log output to logging collector process
Oct 13 00:12:17 pgslave pg_ctl[1931223]: 2023-10-13 00:12:17.497 CST [1931223] HINT:  Future log output will appear in directory "log".
Oct 13 00:12:19 pgslave pg_ctl[1931221]: . done
Oct 13 00:12:19 pgslave pg_ctl[1931221]: server started
Oct 13 00:12:19 pgslave systemd[1]: Started PostgreSQL database server.

问题分析

1.查看数据库日志

在这里插入图片描述

2.查看归档配置参数

参数配置正确,归档目录权限也正确

postgres=# show archive_command;archive_command                      
-----------------------------------------------------------/usr/bin/lz4 -q -z %p /server/data/pgdb/pg_archive/%f.lz4
(1 row)postgres=# show archive_mode;archive_mode 
--------------on
(1 row)--查看归档目录的权限
[postgres@pgmaster ~]$ ls -ld /server/data/pgdb/pg_archive
drwxr-x--- 2 postgres postgres 4214784 Oct 13 13:14 /server/data/pgdb/pg_archive

3.手动切日志

手工归档成功,但是未解决,查看状态依然时卡住归档失败的那条wal记录那里

--手工归档
top_portal=# select pg_switch_wal();pg_switch_wal 
---------------72/51C4CFD8
(1 row)--查主库数据库状态
[root@pgmaster ~]# systemctl status postgres
● postgres.service - PostgreSQL database server
Loaded: loaded (/usr/lib/systemd/system/postgres.service; enabled; vendor preset: disabled)
Active: active (running) since Thu 2023-10-12 22:04:08 CST; 13h ago
Process: 3710968 ExecStart=/server/data/pgdb/pgsql/bin/pg_ctl start -D $PGDATA (code=exited, status=0/SUCCESS)
Main PID: 3710970 (postgres)
Tasks: 53 (limit: 201967)
Memory: 19.0G
CGroup: /system.slice/postgres.service
├─ 3710970 /server/data/pgdb/pgsql/bin/postgres -D /server/data/pgdb/data
├─ 3710971 "postgres: logger " "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
├─ 3710992 "postgres: checkpointer " "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
├─ 3710993 "postgres: background writer " "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
├─ 3710994 "postgres: walwriter " "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
├─ 3710995 "postgres: archiver failed on 000000010000006F00000086" "" "" "" "" "" "" "" "" ""
├─ 3710996 "postgres: logical replication launcher " "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
├─ 3711001 "postgres: top_portal top_portal 172.28.32.18(41438) idle" "" "" "" "" "" ""
├─ 3711003 "postgres: tj_sjjh dataexchange 172.28.32.28(35406) idle" "" "" "" "" "" "" ""
├─ 3711009 "postgres: repmgr repmgr 172.28.32.22(64096) idle" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
├─ 3711468 "postgres: top_portal top_portal 172.28.32.18(41720) idle" "" "" "" "" "" ""
├─ 3713807 "postgres: top_portal top_portal 172.28.32.20(44492) idle" "" "" "" "" "" ""
├─ 3723017 "postgres: walsender repmgr 172.28.32.23(36122) streaming 72/1BAC3A10"  #wal 发送正常--查当前wal_lsn
top_portal=# select pg_current_wal_lsn();pg_current_wal_lsn 
--------------------72/52638F10
(1 row)--查当前wal_lsn对应的wal文件
top_portal=# select pg_walfile_name(pg_current_wal_lsn());pg_walfile_name      
--------------------------000000010000007200000052
(1 row)--查当前最新检查点,最新检查点之前的wal文件均可以删除
[postgres@pgmaster ~]$ pg_controldata $PGDATA
pg_control version number:            1300
Catalog version number:               202107181
Database system identifier:           7268852449124462799
Database cluster state:               in production
pg_control last modified:             Fri 13 Oct 2023 10:07:35 AM CST  
Latest checkpoint location:           71/CDD2FF28
Latest checkpoint's REDO location:    71/CDD28F18
Latest checkpoint's REDO WAL file:    0000000100000071000000CD--查报错中的wal文件
[postgres@pgmaster pg_wal]$ ls -l 000000010000006F00000086
-rw------- 1 postgres postgres 16777216 Oct 12 21:12 000000010000006F00000086
[postgres@pgmaster pg_wal]$ find /server/data/pgdb/pg_archive -name 000000010000006F00000086*
ls: cannot access '000000010000006F00000086': No such file or directory
[postgres@pgmaster pg_wal]$ find /server -name 000000010000006F00000086*
-rw------- 1 postgres postgres 16777216 Oct 12 21:12 000000010000006F00000086

4.检查$PGDATA/pg_wal/archive_status/目录下文件

[postgres@pgmaster ~]$ cd /server/data/pgdb/data/pg_wal/archive_status/
[postgres@pgmaster archive_status]$ ls -l *.ready
ls: cannot access '*.ready': No such file or directory说明不存在需要归档但没归档的文件

该目录下,ready说明是需要归档但是没归档的,done是归档完成了的

解决办法

1.将归档失败的wal文件备份到/home/postgres目录下(生产环境如果磁盘空间允许切记不要rm删除,mv备份到目标位置)
2.手工归档select pg_switch_wal();
3.再次查看主备库状态

--1.将归档失败的wal文件备份到/home/postgres目录下
[postgres@pgmaster pg_wal]$ mv 000000010000006F00000086 /home/postgres/000000010000006F00000086
[postgres@pgmaster pg_wal]$ ls -l /home/postgres/000000010000006F00000086
-rw------- 1 postgres postgres 16777216 Oct 12 21:12 /home/postgres/000000010000006F00000086--2.手工归档
postgres=# select pg_switch_wal();pg_switch_wal 
---------------73/7EF502E0
(1 row)--3.再次查看主库状态显示正常
[root@pgmaster data]# systemctl status postgres
● postgres.service - PostgreSQL database serverLoaded: loaded (/usr/lib/systemd/system/postgres.service; enabled; vendor preset: disabled)Active: active (running) since Thu 2023-10-12 22:04:08 CST; 13h agoProcess: 3710968 ExecStart=/server/data/pgdb/pgsql/bin/pg_ctl start -D $PGDATA (code=exited, status=0/SUCCESS)Main PID: 3710970 (postgres)Tasks: 50 (limit: 201967)Memory: 26.6GCGroup: /system.slice/postgres.service├─ 3710970 /server/data/pgdb/pgsql/bin/postgres -D /server/data/pgdb/data├─ 3710971 "postgres: logger " "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""├─ 3710992 "postgres: checkpointer " "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""├─ 3710993 "postgres: background writer " "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""├─ 3710994 "postgres: walwriter " "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""├─ 3710995 "postgres: archiver archiving 000000010000007100000035" "" "" "" "" "" "" "" "" ""├─ 3710996 "postgres: logical replication launcher " "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""├─ 3711001 "postgres: top_portal top_portal 172.28.32.18(41438) idle" "" "" "" "" "" ""├─ 3711003 "postgres: tj_sjjh dataexchange 172.28.32.28(35406) idle" "" "" "" "" "" "" ""├─ 3711009 "postgres: repmgr repmgr 172.28.32.22(64096) idle" "" "" "" "" "" "" "" "" "" "" "" "" "" ""├─ 3711468 "postgres: top_portal top_portal 172.28.32.18(41720) idle" "" "" "" "" "" ""├─ 3713807 "postgres: top_portal top_portal 172.28.32.20(44492) idle" "" "" "" "" "" ""├─ 3723017 "postgres: walsender repmgr 172.28.32.23(36122) streaming 73/7F000BD0"

补充:若$PGDATA/pg_wal/archive_status/目录下存在大量的*.ready文件

可能的原因分析:如果数据库是突然断电,那么可能arvchive命令没有完全完成,归档目录会存在不完整的文件名称,重启数据库后,会出现归档失败的情况,这个时候,需要去归档目录删除相关归档失败文件,那么归档就会重新归档。
还未遇到该场景的问题,暂未实验。
参考链接

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.rhkb.cn/news/156919.html

如若内容造成侵权/违法违规/事实不符,请联系长河编程网进行投诉反馈email:809451989@qq.com,一经查实,立即删除!

相关文章

Tomcat的安装和配置

一.Tomcat下载:去Tomcat官网地址 在左侧Download中选择你需要下载的版本,这里我选择Tomcat9 根据电脑系统是32位还是64位选择,这里我选择64-bit Windows zip,点击即可下载 下载后直接解压,这里我解压在E盘的computer…

行业追踪,2023-10-12

自动复盘 2023-10-12 凡所有相,皆是虚妄。若见诸相非相,即见如来。 k 线图是最好的老师,每天持续发布板块的rps排名,追踪板块,板块来开仓,板块去清仓,丢弃自以为是的想法,板块去留让…

以单颗CMOS摄像头重构三维场景,维悟光子发布单目红外3D成像模组

维悟光子近期发布全新单目红外3D成像模组,现可提供下游用户进行测试导入。通过结合微纳光学元件编码和人工智能算法解码,维悟光子单目红外3D成像模组采用单颗摄像头,通过单帧拍摄,可同时获取像素级配准的3D点云和红外图像信息,可被应用于机器人、生物识别等广阔领域。 市场…

【RKNN】YOLO V5中pytorch2onnx,pytorch和onnx模型输出不一致,精度降低

在yolo v5训练的模型,转onnx,再转rknn后,测试发现: rknn模型,量化与非量化,相较于pytorch模型,测试精度都有降低onnx模型,相较于pytorch模型,测试精度也有降低&#xff…

缓存设计的创新之旅:架构的灵魂之一

缓存在架构设计中占有重要地位。缓存在提升性能中也扮演重要的角色。常见的有对资源的缓存,比如数据库连接池、http连接池,还有对数据的缓存等。缓存的设计可复杂也可简单,但是需要考虑的点却很多。 缓存对象 设计缓存的时候一定要考虑的是&…

大语言模型之十七-QA-LoRA

由于基座模型通常需要海量的数据和算力内存,这一巨大的成本往往只有巨头公司会投入,所以一些优秀的大语言模型要么是大公司开源的,要么是背后有大公司身影公司开源的,如何从优秀的开源基座模型针对特定场景fine-tune模型具有广大的…

香港专用服务器拥有良好的国际网络连接

香港服务器在多个领域有着广泛的应用。无论是电子商务、金融交易、游戏娱乐还是社交媒体等,香港服务器都能够提供高效稳定的服务。对于跨境电商来说,搭建香港服务器可以更好地满足亚洲用户的购物需求;对于金融机构来说,香港服务器…

当涉及到API接口数据分析时,主要可以从以下几个方面展开

当涉及到API接口数据分析时,主要可以从以下几个方面展开: 请求分析:可以统计每个API接口的请求次数、请求成功率、失败率等基础指标。这些指标可以帮助你了解API接口的使用情况,比如哪个API接口被调用的次数最多,哪个…

c++-list

文章目录 前言一、list介绍及使用1、list介绍2、list使用2.1 list构造函数的使用2.2 list iterator的使用2.3 list capacity的使用2.4 list modifiers的使用2.5 list使用算法库中的find模板生成find方法2.6 list中的sort方法 二、list模拟实现1、查看list源码的大致实现思路2、…

身份证实名核验接口,身份证实名认证,身份证二要素实名认证,身份证实名校验,身份证一致性实名认证

一、接口介绍 验证身份证与姓名是否匹配,查询身份证信息。如校验通过,接口返回生日、性别、地址等信息。广泛应用于信贷、安防、银行、保险等行业及各种身份核查场景。 注意:当请求参数符合“【固定同一个参数,其余参数不同】,”…

基于VScode 使用plantUML 插件设计状态机

本文主要记录本人初次在VScode上使用PlantUML设计 本文只讲述操作的实际方法,假设java已安装成功 。 1. 在VScode下安装如下插件 2. 验证环境是否正常 新建一个文件夹并在目录下面新建文件test.plantuml 其内容如下所示: startuml hello world skinparam Style …

基于小波变换的分形信号r指数求解算法matlab仿真

目录 1.算法运行效果图预览 2.算法运行软件版本 3.部分核心程序 4.算法理论概述 5.算法完整程序工程 1.算法运行效果图预览 2.算法运行软件版本 matlab2022a 3.部分核心程序 ................................................................... %通过功率谱密度曲线…

WebSocket连接异常 Error parsing HTTP request header Connection reset by peer

问题描述 在使用spring的方式集成websocket时,在配置WebSocketConfigurer后 Configuration EnableWebSocket public class WebSocketConfiguration implements WebSocketConfigurer {ResourceServletWebSocketServerHandler servletWebSocketServerHandler;Overri…

linux总结

cat -n filename 查看文件,-n用来给每一行标行号,可以省略 cat /var/log/mysqld.log | grep password 我们可以通过上述指令,查询日志文件内容中包含password的行信息。 more 作用: 以分页的形式显示文件内容 语法: more fileName 操作说明: 回车键 …

Spring Boot 中的 Redis 数据操作配置和使用

Spring Boot 中的 Redis 数据操作配置和使用 Redis(Remote Dictionary Server)是一种高性能的开源内存数据库,用于缓存、消息队列、会话管理和数据存储。在Spring Boot应用程序中,Redis被广泛用于各种用例,包括缓存、…

【教学类-35-04】学号+姓名+班级(中3班)学号字帖(A4竖版2份 竖版长条)

图片展示: 背景需求: 2022年9-2023年1月我去过小3班带班,但是没有在这个班级投放过学具,本周五是我在本学期第一次带中3班,所以提供了一套学号描字帖。先让我把孩子的名字和脸混个眼熟。 之前试过一页两套名字的纸张切割方法有:…

distcc分布式编译

distcc https://gitee.com/bison-fork/distcc.git 下载工具链 mingw,https://www.mingw-w64.org/downloads/#w64devkitperl,https://strawberryperl.com/releases.html免安装zip版本,autoconf等脚本依赖perlautoconf、automake&#xff0c…

只有正规才有机会,CTF/AWD竞赛标准参考书来了

目录 前言 一、内容简介 二、读者对象 三、目录 前言 随着网络安全问题日益凸显,国家对网络安全人才的需求持续增长,其中,网络安全竞赛在国家以及企业的人才培养和选拔中扮演着至关重要的角色。 在数字化时代,企业为了应对日益…

Flutter:open_file打开本地文件报错问题

相关插件及版本: open_file: ^3.2.1 问题: 项目中一直用的这个插件,突然发现在安卓高版本不能正常使用,报权限问题permissionDenied,断点调试提示相关权限是MANAGE_EXTERNAL_STORAGE,申请权限之后还是不行&…

springboot集成kafka

1、引入依赖 <dependency><groupId>org.springframework.kafka</groupId><artifactId>spring-kafka</artifactId><version>2.8.6</version></dependency> 2、配置 server:port: 9099 spring:kafka:bootstrap-servers: 192.1…