1. Pull the image and run it
1. Configure a Docker registry mirror
[root@localhost ~]# vim /etc/docker/daemon.json
{"registry-mirrors": ["https://dfaad.mirror.aliyuncs.com"]
}
[root@localhost ~]# systemctl daemon-reload
[root@localhost ~]# systemctl restart docker
2. Pull the Alertmanager image and run it
[root@localhost ~]# docker run -d --name test -p 9093:9093 prom/alertmanager
3. Create a directory and copy the configuration file to the host
[root@localhost ~]# mkdir /alertmanager
[root@localhost ~]# docker cp test:/etc/alertmanager/alertmanager.yml /alertmanager/
Successfully copied 2.05kB to /alertmanager/
[root@localhost ~]# cd /alertmanager/
[root@localhost alertmanager]# cp alertmanager.yml alertmanager.yml.bak
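Alertmanager listens on port 9093 by default. Once the container has started, open http://<host IP>:9093 in a browser to see the built-in UI. It shows no alerts yet, because no alerting rules have been configured to trigger any.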
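Before opening the UI, it can help to confirm from the shell that the container is answering. The following is a minimal sketch that queries Alertmanager's /-/healthy endpoint with curl. The ALERTMANAGER_HOST variable and its localhost default are assumptions made for illustration; substitute the address of the Docker host.

```shell
#!/usr/bin/env bash
# Assumption: Alertmanager is published on port 9093 of this host,
# as in the "docker run ... -p 9093:9093" command above.
ALERTMANAGER_HOST="${ALERTMANAGER_HOST:-localhost}"
ALERTMANAGER_URL="http://${ALERTMANAGER_HOST}:9093"

# Returns 0 if the health endpoint answers with an HTTP 2xx status,
# non-zero otherwise (connection refused, timeout, curl missing, ...).
alertmanager_healthy() {
  curl -fsS --max-time 5 "${ALERTMANAGER_URL}/-/healthy" >/dev/null 2>&1
}

if alertmanager_healthy; then
  echo "Alertmanager is up at ${ALERTMANAGER_URL}"
else
  echo "Alertmanager is not answering at ${ALERTMANAGER_URL}; check 'docker ps' and 'docker logs test'"
fi
```

If the check fails, the container logs are usually the quickest way to see whether the process exited because of a configuration error.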
2. Configure Alertmanager alerting
[root@localhost alertmanager]# vim alertmanager.yml   # initial file contents
route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'web.hook'
receivers:
  - name: 'web.hook'
    webhook_configs:
      - url: 'http://127.0.0.1:5001/'
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
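What the main sections do:
global: global settings, including resolve_timeout (how long Alertmanager waits before declaring an alert resolved once it is no longer being reported), the SMTP settings, and the API addresses of the various notification channels.
route: the alert dispatch policy. It is a tree, and alerts are matched against it depth-first, from left to right.
receivers: the notification recipients, for example the commonly used email, wechat, slack, and webhook channels.
inhibit_rules: inhibition rules. While an alert matching the source matchers is firing, alerts matching the target matchers are muted.
Modify the file as follows: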
[root@localhost alertmanager]# vim alertmanager.yml
global:
  resolve_timeout: 5m
  smtp_from: 'xxx.com'                      # sender address
  smtp_smarthost: 'smtp.exmail.qq.com:465'
  smtp_auth_username: 'xxx.com'
  smtp_auth_password: 'xxx'                 # must be requested in the webmail UI, see below
  smtp_require_tls: false
  smtp_hello: 'qq.com'
route:
  group_by: ['alertname']
  group_wait: 5s
  group_interval: 5s
  repeat_interval: 5m
  receiver: 'email'
receivers:
  - name: 'email'
    email_configs:
      - to: 'xxx.com'                       # recipient address
        send_resolved: true
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
Restart Alertmanager:
[root@localhost alertmanager]# docker rm -f test
test
[root@localhost alertmanager]# docker run -d --name alertmanager -p 9093:9093 -v /alertmanager/alertmanager.yml:/etc/alertmanager/alertmanager.yml prom/alertmanager
dd03cbca4c9e101333c86ef19f34226755b3eecbbced1dee5163a268997796c4
[root@localhost /]# docker ps
CONTAINER ID   IMAGE                COMMAND                  CREATED          STATUS          PORTS                                       NAMES
dd03cbca4c9e   prom/alertmanager    "/bin/alertmanager -…"   47 seconds ago   Up 47 seconds   0.0.0.0:9093->9093/tcp, :::9093->9093/tcp   alertmanager
9eae5f121ddd   prom/prometheus      "/bin/prometheus --c…"   7 days ago       Up 42 minutes   0.0.0.0:9090->9090/tcp, :::9090->9090/tcp   prometheus
2054c56d6cdc   google/cadvisor      "/usr/bin/cadvisor -…"   3 months ago     Up 40 minutes   0.0.0.0:8080->8080/tcp, :::8080->8080/tcp   cadvisor
c11589f8d3a4   prom/node-exporter   "/bin/node_exporter"     3 months ago     Up 40 minutes                                               reverent_moser
e7181b2d397a   grafana/grafana      "/run.sh"                3 months ago     Up 40 minutes   0.0.0.0:3000->3000/tcp, :::3000->3000/tcp   grafana
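Request a mailbox authorization code to use as the password:
Enter this code as smtp_auth_password in the Alertmanager configuration file above.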
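The route section described above is a tree, which the single-receiver configuration does not really show. The sketch below writes a hypothetical routing tree with nested child routes to a scratch file, so the depth-first matching can be inspected without touching the live configuration. All receiver names and webhook URLs are invented for illustration, and the matchers syntax assumes a reasonably recent Alertmanager release.

```shell
#!/usr/bin/env bash
# Write a hypothetical routing tree to a scratch file (not the live config).
SAMPLE="$(mktemp -d)/alertmanager-routes.yml"

cat > "$SAMPLE" <<'EOF'
route:
  receiver: 'default-hook'            # root: used when no child route matches
  group_by: ['alertname']
  routes:
    - matchers:
        - team = "node"               # first child, tried first
      receiver: 'node-hook'
      routes:
        - matchers:
            - severity = "critical"   # grandchild, tried before falling back to node-hook
          receiver: 'node-critical-hook'
    - matchers:
        - team = "db"                 # second child, tried only if the first did not match
      receiver: 'db-hook'
receivers:
  - name: 'default-hook'
    webhook_configs:
      - url: 'http://127.0.0.1:5001/default'
  - name: 'node-hook'
    webhook_configs:
      - url: 'http://127.0.0.1:5001/node'
  - name: 'node-critical-hook'
    webhook_configs:
      - url: 'http://127.0.0.1:5001/node-critical'
  - name: 'db-hook'
    webhook_configs:
      - url: 'http://127.0.0.1:5001/db'
EOF
echo "sample routing tree written to $SAMPLE"

# Optional: if amtool is installed locally, ask which receiver a label set reaches.
if command -v amtool >/dev/null 2>&1; then
  amtool config routes test --config.file="$SAMPLE" team=node severity=critical
fi
```

With this tree, an alert labelled team=node and severity=critical descends into the first child and then into its own child, so it reaches node-critical-hook. An alert with only team=node stops at node-hook, and anything that matches no child falls back to default-hook.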
3. Add Alertmanager alerting rules to Prometheus
1. Create the alerting rule file
[root@localhost /]# mkdir -p /prometheus/rules
[root@localhost /]# cd /prometheus/rules
[root@localhost rules]# vim node-up.rules
groups:
  - name: node-up
    rules:
      - alert: node-up
        expr: up{job="jumpserver"} == 0   # the job name is the job_name set in prometheus.yml
        for: 15s
        labels:
          severity: 1
          team: node
        annotations:
          summary: "{{ $labels.instance }} has been down for more than 15s!"
2. Edit prometheus.yml to add the rule files and the Alertmanager address and port
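Append the following at the bottom of the file:
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 10.10.80.167:9093
rule_files:
  - "/usr/local/prometheus/rules/*.rules"
Note that rule_files is a path inside the container, so the local rule file has to be mounted at that path in the container. Change the Prometheus start command as follows and restart the service:
[root@localhost prometheus]# docker rm -f prometheus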
[root@localhost prometheus]# docker run -d --name prometheus -p 9090:9090 --restart=always -v /prometheus/prometheus.yml:/etc/prometheus/prometheus.yml -v /prometheus/rules:/usr/local/prometheus/rules prom/prometheus
[root@localhost prometheus]# docker ps
CONTAINER ID   IMAGE                COMMAND                  CREATED          STATUS          PORTS                                       NAMES
53c62707c219   prom/prometheus      "/bin/prometheus --c…"   2 seconds ago    Up 1 second     0.0.0.0:9090->9090/tcp, :::9090->9090/tcp   prometheus
dd03cbca4c9e   prom/alertmanager    "/bin/alertmanager -…"   2 hours ago      Up 2 hours      0.0.0.0:9093->9093/tcp, :::9093->9093/tcp   alertmanager
2054c56d6cdc   google/cadvisor      "/usr/bin/cadvisor -…"   3 months ago     Up 2 hours      0.0.0.0:8080->8080/tcp, :::8080->8080/tcp   cadvisor
c11589f8d3a4   prom/node-exporter   "/bin/node_exporter"     3 months ago     Up 2 hours                                                  reverent_moser
e7181b2d397a   grafana/grafana      "/run.sh"                3 months ago     Up 2 hours      0.0.0.0:3000->3000/tcp, :::3000->3000/tcp   grafana
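Once the container is running, the rule files can also be checked from the shell. The sketch below validates the mounted rule files with promtool inside the container and then lists the loaded rule groups through the Prometheus HTTP API. It assumes the container is named prometheus as above, that the image provides sh and promtool (the stock prom/prometheus image does at the time of writing), and that the API is reachable on localhost; PROMETHEUS_HOST is a variable introduced here for illustration.

```shell
#!/usr/bin/env bash
# Assumption: Prometheus is published on port 9090 of this host,
# as in the "docker run ... -p 9090:9090" command above.
PROMETHEUS_URL="http://${PROMETHEUS_HOST:-localhost}:9090"

# Validate the syntax of every mounted rule file with promtool.
# The glob is quoted so that it expands inside the container, not on the host.
check_rules() {
  docker exec prometheus sh -c 'promtool check rules /usr/local/prometheus/rules/*.rules'
}

# Print the rule groups that Prometheus has actually loaded, as JSON.
list_rules() {
  curl -fsS --max-time 5 "${PROMETHEUS_URL}/api/v1/rules"
}

check_rules || echo "rule check failed, or the 'prometheus' container is not running"
list_rules  || echo "could not reach ${PROMETHEUS_URL}"
```

A rule file that fails the promtool check normally also prevents Prometheus from starting, so running the check before restarting the container can save a round trip.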
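View the alerting rules in the Prometheus web UI:
Test whether alerting works:
The rule above uses the job_name jumpserver, so log in to the machine that belongs to this job and stop the Docker container running there to check whether an alert fires, as follows: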
[root@jumpserver ~]# docker stop 4e5797ec1ed0
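Then check the Prometheus web page, which shows that the container has stopped:
The alert email has also arrived, with the following content:
After the container for this job is started again, an email announcing that the alert has been resolved arrives.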
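The email path can also be exercised without stopping a real container, by posting a hand-made alert straight to Alertmanager's v2 API. This is a minimal sketch: the alert name, the instance value, and the ALERTMANAGER_HOST variable are hypothetical and only serve the test.

```shell
#!/usr/bin/env bash
# Assumption: Alertmanager is reachable on port 9093 of this host.
ALERTMANAGER_URL="http://${ALERTMANAGER_HOST:-localhost}:9093"

# Hypothetical alert; the label values are made up for this test.
PAYLOAD='[
  {
    "labels": {
      "alertname": "manual-test",
      "instance": "example:9100",
      "severity": "warning",
      "team": "node"
    },
    "annotations": {
      "summary": "Manual test alert sent with curl"
    }
  }
]'

# POST the alert list to the v2 alerts endpoint.
send_test_alert() {
  curl -fsS --max-time 5 -X POST \
    -H 'Content-Type: application/json' \
    -d "${PAYLOAD}" \
    "${ALERTMANAGER_URL}/api/v2/alerts"
}

if send_test_alert; then
  echo "test alert accepted by ${ALERTMANAGER_URL}"
else
  echo "could not deliver the test alert to ${ALERTMANAGER_URL}"
fi
```

With the group_wait of 5s configured earlier, the notification should be sent a few seconds after the request is accepted. Because the payload sets no endsAt, Alertmanager treats the alert as resolved once resolve_timeout passes without the alert being sent again, and send_resolved: true then triggers the recovery email.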