基于云上Prometheus对cassandra监控

本文详细介绍了如何通过jmx_exporter在Cassandra中设置监控,配置jmxjavaagent和metrics文件,以及如何在Prometheus和Grafana中集成Cassandra数据,包括预警规则实例。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >

1、概述

针对cassandra的监控,通过jmx_exporter方式暴露数据采集端口,然后通过prometheus定时数据采集,从而实现监控。

2、Agent配置

所有安装cassandra节点都需要进行相同配置;

2.1、下载jmx javaagent

从github下载jmx_prometheus_javaagent-0.16.1.jar安装包,并将
其上传到cassandra集群$CASSANDRA_HOME/lib/目录下。

2.2、配置metrics采集文件

配置文件内容来源于Cassandra dashboard,可以从https://grafana.com/grafana/dashboards 搜索下载5408编号。分别进入cassandra集群节点 conf/ 目录,增加如下配置文件。

vi cassandra-prometheus-jmx.yml
#并将下面内容拷贝到文件内容
lowercaseOutputName: true
lowercaseOutputLabelNames: true
whitelistObjectNames: [
"org.apache.cassandra.metrics:type=ColumnFamily,name=RangeLatency,*",
"org.apache.cassandra.metrics:type=ColumnFamily,name=LiveSSTableCount,*",
"org.apache.cassandra.metrics:type=ColumnFamily,name=SSTablesPerReadHistogram,*",
"org.apache.cassandra.metrics:type=ColumnFamily,name=SpeculativeRetries,*",
"org.apache.cassandra.metrics:type=ColumnFamily,name=MemtableOnHeapSize,*",
"org.apache.cassandra.metrics:type=ColumnFamily,name=MemtableSwitchCount,*",
"org.apache.cassandra.metrics:type=ColumnFamily,name=MemtableLiveDataSize,*",
"org.apache.cassandra.metrics:type=ColumnFamily,name=MemtableColumnsCount,*",
"org.apache.cassandra.metrics:type=ColumnFamily,name=MemtableOffHeapSize,*",
"org.apache.cassandra.metrics:type=ColumnFamily,name=BloomFilterFalsePositives,*",
"org.apache.cassandra.metrics:type=ColumnFamily,name=BloomFilterFalseRatio,*",
"org.apache.cassandra.metrics:type=ColumnFamily,name=BloomFilterDiskSpaceUsed,*",
"org.apache.cassandra.metrics:type=ColumnFamily,name=BloomFilterOffHeapMemoryUsed,*",
"org.apache.cassandra.metrics:type=ColumnFamily,name=SnapshotsSize,*",
"org.apache.cassandra.metrics:type=ColumnFamily,name=TotalDiskSpaceUsed,*",
"org.apache.cassandra.metrics:type=CQL,name=RegularStatementsExecuted,*",
"org.apache.cassandra.metrics:type=CQL,name=PreparedStatementsExecuted,*",
"org.apache.cassandra.metrics:type=Compaction,name=PendingTasks,*",
"org.apache.cassandra.metrics:type=Compaction,name=CompletedTasks,*",
"org.apache.cassandra.metrics:type=Compaction,name=BytesCompacted,*",
"org.apache.cassandra.metrics:type=Compaction,name=TotalCompactionsCompleted,*",
"org.apache.cassandra.metrics:type=ClientRequest,name=Latency,*",
"org.apache.cassandra.metrics:type=ClientRequest,name=Unavailables,*",
"org.apache.cassandra.metrics:type=ClientRequest,name=Timeouts,*",
"org.apache.cassandra.metrics:type=Storage,name=Exceptions,*",
"org.apache.cassandra.metrics:type=Storage,name=TotalHints,*",
"org.apache.cassandra.metrics:type=Storage,name=TotalHintsInProgress,*",
"org.apache.cassandra.metrics:type=Storage,name=Load,*",
"org.apache.cassandra.metrics:type=Connection,name=TotalTimeouts,*",
"org.apache.cassandra.metrics:type=ThreadPools,name=CompletedTasks,*",
"org.apache.cassandra.metrics:type=ThreadPools,name=PendingTasks,*",
"org.apache.cassandra.metrics:type=ThreadPools,name=ActiveTasks,*",
"org.apache.cassandra.metrics:type=ThreadPools,name=TotalBlockedTasks,*",
"org.apache.cassandra.metrics:type=ThreadPools,name=CurrentlyBlockedTasks,*",
"org.apache.cassandra.metrics:type=DroppedMessage,name=Dropped,*",
"org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=HitRate,*",
"org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=Hits,*",
"org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=Requests,*",
"org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=Entries,*",
"org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=Size,*",
#"org.apache.cassandra.metrics:type=Streaming,name=TotalIncomingBytes,*",
#"org.apache.cassandra.metrics:type=Streaming,name=TotalOutgoingBytes,*",
"org.apache.cassandra.metrics:type=Client,name=connectedNativeClients,*",
"org.apache.cassandra.metrics:type=Client,name=connectedThriftClients,*",
"org.apache.cassandra.metrics:type=Table,name=WriteLatency,*",
"org.apache.cassandra.metrics:type=Table,name=ReadLatency,*",
"org.apache.cassandra.net:type=FailureDetector,*",
]
#blacklistObjectNames: ["org.apache.cassandra.metrics:type=ColumnFamily,*"]
rules:
  - pattern: org.apache.cassandra.metrics<type=(Connection|Streaming), scope=(\S*), name=(\S*)><>(Count|Value)
    name: cassandra_$1_$3
    labels:
      address: "$2"
  - pattern: org.apache.cassandra.metrics<type=(ColumnFamily), name=(RangeLatency)><>(Mean)
    name: cassandra_$1_$2_$3
  - pattern: org.apache.cassandra.net<type=(FailureDetector)><>(DownEndpointCount)
    name: cassandra_$1_$2
  - pattern: org.apache.cassandra.metrics<type=(Keyspace), keyspace=(\S*), name=(\S*)><>(Count|Mean|95thPercentile)
    name: cassandra_$1_$3_$4
    labels:
      "$1": "$2"
  - pattern: org.apache.cassandra.metrics<type=(Table), keyspace=(\S*), scope=(\S*), name=(\S*)><>(Count|Mean|95thPercentile)
    name: cassandra_$1_$4_$5
    labels:
      "keyspace": "$2"
      "table": "$3"
  - pattern: org.apache.cassandra.metrics<type=(ClientRequest), scope=(\S*), name=(\S*)><>(Count|Mean|95thPercentile)
    name: cassandra_$1_$3_$4
    labels:
      "type": "$2"
  - pattern: org.apache.cassandra.metrics<type=(\S*)(?:, ((?!scope)\S*)=(\S*))?(?:, scope=(\S*))?,
      name=(\S*)><>(Count|Value)
    name: cassandra_$1_$5
    labels:
      "$1": "$4"
      "$2": "$3"

2.3、配置cassandra-env.sh

进入cassandra各个节点的 config/ 目录,修改 conf/cassandra-env.sh文件内容,修改内容如下:

#7070为数据采集端口给prometheus
JVM_OPTS="$JVM_OPTS -javaagent:$CASSANDRA_HOME/lib/jamm-0.3.0.jar -javaagent:$CASSANDRA_HOME/lib/jmx_prometheus_javaagent-0.16.0.jar=7070:${CASSANDRA_HOME}/conf/cassandra-prometheus-jmx.yml"

在这里插入图片描述

3、Prometheus配置

3.1、Prometheus配置

修改prometheus组件的prometheus.yml配置,加入cassandra监控job:

#名称定义,必须以cassandra_开头
- job_name: 'cassandra_yd01'  
  scrape_interval: 60s
  static_configs:
  - targets: ['92.168.0.1:7070','92.168.0.2:7070','92.168.0.3:7070'] 

3.2、Prometheus启动验证

参考文章kubernetes配置:

4、Grafana配置

4.1、导入仪表盘模板

从 https://grafana.com/dashboards/5408下载仪表盘,导入到grafana,再结合自身业务修改最终仪表盘,有时候每个仪表盘都得手动Edit选择数据源“prometheus"。
在这里插入图片描述
需要注意下,grafana的cassandra metric dashboard的json有一些不正确的地方,需要人为修改下。

4.2、预警指标

常规预警预警规则
内存预警内存使用达到阈值【>80%】时,进行预警
GC耗时预警当GC耗时达到阈值【>0.5s】时,进行预警
GC次数预警当每秒GC次数达到阈值【>5】时,进行预警
评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

琉璃梦境

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值