Prometheus—监控Supervisor

Author: 博同学

参考:
https://gitee.com/slcnx/generic_architecture/blob/master/playbooks/prometheus/01-deploy-node-exporter.yml

成果图,此处报表基于大佬的模板 稍作修改,加了个Row放在node-exporter里表里了在这里插入图片描述在这里插入图片描述

1、在原有node-exporter 添加启动参数

https://gitee.com/slcnx/generic_architecture/blob/master/playbooks/prometheus/01-deploy-node-exporter.yml

vim /usr/lib/systemd/system/node-exporter.service

ExecStart=/opt/node-exporter --web.listen-address=:9100 --collector.supervisord --collector.supervisord.url=unix:///var/run/supervisor/supervisor.sock

systemctl daemon-reload
systemctl restart node-exporter

2、验证supervisor是否监控

curl -s localhost:9100/metrics | grep super | head

3、告警规则

groups:
- name: supervisor_alerts
  rules:
  - alert: SupervisordTooManyRestarts
    expr: changes(node_supervisord_start_time_seconds[15m])  > 2
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: supervisord too many restarts (实例 {{ $labels.instance }})
      description: "supervisord has restarted more than twice in the last 15 minutes. It might be crashlooping.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

  - alert: supervisordDown
    expr:  node_supervisord_up == 0
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: supervisord task down (实例 {{ $labels.instance }} , 任务 {{ $labels.name}})
      description: "supervisord task is down on {{ $labels.instance }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"


  - alert: supervisordError
    expr: node_supervisord_exit_status !=0
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: supervisord task 失败 (实例 {{ $labels.instance }} , 任务 {{ $labels.name}})
      description: "任务退出码为 {{ $value }}"

4、报表

https://gitee.com/slcnx/generic_architecture#supervisor-%E9%85%8D%E7%BD%AE%E5%8F%8A%E7%9B%91%E6%8E%A7

groups:
- name: supervisor_alerts
  rules:
  - alert: SupervisordTooManyRestarts
    expr: changes(node_supervisord_start_time_seconds[15m])  > 2
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: supervisord too many restarts (实例 {{ $labels.instance }})
      description: "supervisord has restarted more than twice in the last 15 minutes. It might be crashlooping.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

  - alert: supervisordDown
    expr:  node_supervisord_up == 0
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: supervisord task down (实例 {{ $labels.instance }} , 任务 {{ $labels.name}})
      description: "supervisord task is down on {{ $labels.instance }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"


  - alert: supervisordError
    expr: node_supervisord_exit_status !=0
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: supervisord task 失败 (实例 {{ $labels.instance }} , 任务 {{ $labels.name}})
      description: "任务退出码为 {{ $value }}"

5、告警结果

在这里插入图片描述

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值