Prometheus configuration
# Create the directory
mkdir -p /usr/local/soft/docker/prometheus
cd /usr/local/soft/docker/prometheus
# Create the configuration file
vi prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['127.0.0.1:9091']
  - job_name: 'node'
    static_configs:
      - targets: ['ip:9100']
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - ip:9093
rule_files:
  - /etc/prometheus/rule/*.yml
Start Prometheus
docker run -d -p 9091:9091 --name prom -v /usr/local/soft/docker/prometheus/:/etc/prometheus/ prom/prometheus --config.file=/etc/prometheus/prometheus.yml --web.enable-lifecycle --web.listen-address="0.0.0.0:9091"
Open http://ip:9091 in a browser to reach the Prometheus web UI.
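Besides opening the page in a browser, the instance can be checked from a script. The following is a minimal sketch, assuming the container is reachable at the placeholder address `http://ip:9091` used in these notes. It relies on the `/-/healthy` endpoint and on `/-/reload`, which is available here because the container was started with `--web.enable-lifecycle`. The helper names (`lifecycle_url`, `is_healthy`, `reload_config`) are made up for this illustration.

```python
import urllib.request

# Assumption: replace "ip" with the host that runs the "prom" container.
PROMETHEUS_URL = "http://ip:9091"


def lifecycle_url(base_url: str, action: str) -> str:
    """Build a management URL such as <base>/-/healthy or <base>/-/reload."""
    return f"{base_url.rstrip('/')}/-/{action}"


def is_healthy(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if GET /-/healthy answers with HTTP 200."""
    try:
        with urllib.request.urlopen(lifecycle_url(base_url, "healthy"), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError and HTTPError are subclasses of OSError.
        return False


def reload_config(base_url: str, timeout: float = 5.0) -> int:
    """POST /-/reload so Prometheus re-reads prometheus.yml and the rule files."""
    req = urllib.request.Request(lifecycle_url(base_url, "reload"), method="POST")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Calling `reload_config(PROMETHEUS_URL)` after editing prometheus.yml or a rule file should avoid having to restart the container.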
Alertmanager configuration
mkdir -p /usr/local/soft/docker/alertmanager/
cd /usr/local/soft/docker/alertmanager/
vi alertmanager.yml
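# Global configuration: the timeout after which alerts are considered resolved, SMTP settings, API endpoints for the various notification channels, and so on.
global:
  # Time after which an alert is marked resolved if it has not been updated
  resolve_timeout: 5m
# Routing configuration: defines how alerts are dispatched. It is a tree that is matched depth-first, left to right.
route:
  # Labels used to group incoming alerts together.
  # Alerts that share the same values for the labels listed in group_by are merged into a single notification for the receiver.
  group_by: ['alertname']
  # How long to wait before sending the first notification for a new group
  group_wait: 30s
  # How long to wait before sending a notification about new alerts added to a group that has already been notified
  group_interval: 5m
  # How long to wait before re-sending a notification that has already been sent; usually about 3 hours or more (a short value is used here)
  repeat_interval: 30s
  # Receiver name
  receiver: 'web.hook'
# Receiver configuration, e.g. the common email, wechat, slack, and webhook notification channels
receivers:
  # Receiver name
  - name: 'web.hook'
    # webhook URL
    webhook_configs:
      - url: 'http://ip:9111/alertmanager/hook'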
Start Alertmanager
docker run --name alertmanager -d -p 9093:9093 \
-v /usr/local/soft/docker/alertmanager/alertmanager.yml:/etc/alertmanager/alertmanager.yml \
prom/alertmanager:v0.25.0
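Once the container is running, the routing and the receiver can be exercised without waiting for Prometheus by pushing a hand-made alert into Alertmanager. The sketch below assumes the placeholder address `http://ip:9093` and uses the Alertmanager v2 HTTP API (`POST /api/v2/alerts`). The label and annotation values are examples only, and the function names are hypothetical.

```python
import json
import urllib.request
from datetime import datetime, timezone

# Assumption: replace "ip" with the host that runs the alertmanager container.
ALERTMANAGER_URL = "http://ip:9093"


def build_test_alert(alertname: str, instance: str) -> list:
    """Build the JSON body for POST /api/v2/alerts (a list of alert objects)."""
    return [{
        "labels": {"alertname": alertname, "instance": instance, "severity": "1"},
        "annotations": {"content": "hello world"},
        # RFC 3339 timestamp in UTC
        "startsAt": datetime.now(timezone.utc).isoformat(),
    }]


def post_alerts(base_url: str, alerts: list, timeout: float = 5.0) -> int:
    """Send the alerts to Alertmanager and return the HTTP status code."""
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v2/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

For example, `post_alerts(ALERTMANAGER_URL, build_test_alert("manual-test", "ip:9100"))` should make the alert appear in the Alertmanager UI and, after group_wait, reach the configured receiver.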
Create a rule
mkdir -p /usr/local/soft/docker/prometheus/rule
cd /usr/local/soft/docker/prometheus/rule
vi node_up.yml
groups:
  - name: node-up
    rules:
      - alert: node-up
        expr: up{job="node"} == 0
        for: 10s
        labels:
          severity: 1
          team: node
        annotations:
          #summary: "{{ $labels.instance }} has been down for more than 15s"
          #description: hello world
          content: "hello world"
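Before waiting for the alert to fire, it can help to check what the rule expression currently returns. The following sketch, assuming the placeholder address `http://ip:9091`, runs the same expression through the Prometheus HTTP API (`/api/v1/query`) and lists the instances it matches. The function names are illustrative.

```python
import json
import urllib.parse
import urllib.request

# Assumption: replace "ip" with the host that runs the "prom" container.
PROMETHEUS_URL = "http://ip:9091"
# Same expression as in node_up.yml
RULE_EXPR = 'up{job="node"} == 0'


def query_url(base_url: str, expr: str) -> str:
    """Build an instant-query URL with the PromQL expression URL-encoded."""
    return f"{base_url.rstrip('/')}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def down_instances(response: dict) -> list:
    """Extract the instance label of every series in an instant-query response."""
    results = response.get("data", {}).get("result", [])
    return [series["metric"].get("instance", "") for series in results]


def fetch_down_instances(base_url: str, expr: str = RULE_EXPR, timeout: float = 5.0) -> list:
    """Run the query against Prometheus and return the matching instances."""
    with urllib.request.urlopen(query_url(base_url, expr), timeout=timeout) as resp:
        return down_instances(json.load(resp))
```

An empty list from `fetch_down_instances(PROMETHEUS_URL)` means every node target is up, so the rule has nothing to alert on yet.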
Simple webhook test
from flask import Flask, request
import json

app = Flask(__name__)

# Alertmanager POSTs each alert group to this path as a JSON body
@app.route('/alertmanager/hook', methods=['POST'])
def webhook():
    data = request.get_json()
    result = json.dumps(data)
    print("Received alert:")
    print(result)
    return 'OK'

if __name__ == '__main__':
    # Listen on all interfaces on port 9111, matching the webhook URL in alertmanager.yml
    app.run(debug=True, host='0.0.0.0', port=9111)
After starting the Python script, stop the node_exporter process and wait for the alert to fire. Based on the configuration, Alertmanager forwards the alert to the webhook set in "- url: 'http://ip:9111/alertmanager/hook'".
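The body that Alertmanager posts to the webhook is a JSON object whose `alerts` list carries each alert's status, labels, and annotations. As a rough sketch of how the handler could turn that into readable lines instead of dumping the raw JSON, the helper below (a hypothetical `summarize_alerts`) formats one line per alert. The sample payload is hand-written to match the node-up rule above and is not captured output.

```python
def summarize_alerts(payload: dict) -> list:
    """Return one human-readable line per alert in a webhook payload."""
    lines = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        lines.append(
            "[{status}] {name} on {instance}: {content}".format(
                status=alert.get("status", "unknown"),
                name=labels.get("alertname", "?"),
                instance=labels.get("instance", "?"),
                content=annotations.get("content", ""),
            )
        )
    return lines


# Hand-written example shaped like an Alertmanager webhook body (assumed values).
sample_payload = {
    "version": "4",
    "status": "firing",
    "receiver": "web.hook",
    "groupLabels": {"alertname": "node-up"},
    "alerts": [
        {
            "status": "firing",
            "labels": {"alertname": "node-up", "instance": "ip:9100",
                       "job": "node", "severity": "1", "team": "node"},
            "annotations": {"content": "hello world"},
        }
    ],
}
```

Inside the Flask handler, `summarize_alerts(request.get_json())` could replace the `json.dumps` call when only a short summary is wanted.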
Send alerts to WeCom (企业微信)
To be added.