1. Environment

RHEL 6.5
Burrow version 1.1
Kafka version 2.10

Burrow is under active development by LinkedIn's Data Infrastructure Streaming SRE team. It is written in Go, released under the Apache License, and hosted on GitHub (linkedin/Burrow).
It monitors consumer status by collecting information on the consumer groups in a cluster and computing a single status for each group, telling us whether the group is healthy, falling behind, slowing down, or has stopped working. It does this without requiring thresholds on the group's progress, although you can still retrieve the message lag as a number.
2. Installing Burrow
Burrow automatically monitors all consumers and every partition they consume. It does this by consuming the special internal Kafka topic that stores consumer offsets. Burrow then exposes this information as a centralized service, separate from any individual consumer. Consumer status is determined by evaluating the consumer's behavior over a sliding window.
This information is broken down into a status per partition, which is then rolled up into a single status for the consumer. The status can be OK, WARNING (the consumer is working but falling behind), or ERROR (the consumer has stopped consuming or is offline). The status can be fetched with a simple HTTP request to Burrow, or Burrow can check it periodically and push notifications out via email or to a separate HTTP endpoint (for example, a monitoring or alerting system).
By monitoring how far consumers lag behind, Burrow monitors the health of the consuming applications, and it can watch multiple Kafka clusters at the same time. The HTTP endpoints that report information about the Kafka clusters and consumers are separate from the lag status, and are useful for Kafka cluster management applications in environments where a Java Kafka client cannot be run.
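For example, once Burrow is running (it listens on port 8000 in the configuration used below), the computed status of a group can be fetched with a plain HTTP request; elasticsearch2 here is simply the group that exists in this environment:

curl -s http://localhost:8000/v3/kafka/local/consumer/elasticsearch2/status | jq .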
Burrow is developed in Go, and v1.1 has been released.
A Docker image is also provided (a sketch follows after this paragraph).
You can install it either by building from source or by using the official binary packages. The binary package approach is recommended here.
Burrow stores no local state; it is a CPU-intensive and network-I/O-intensive application.
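If you prefer containers, a minimal sketch might look like the following. It assumes the linkedin/burrow image on Docker Hub and that the image reads its configuration from /etc/burrow; check the image documentation, since the image name and config path may differ:

docker run -d --name burrow -p 8000:8000 \
    -v /etc/burrow:/etc/burrow \
    linkedin/burrow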
[root@IIMAPP02 ~]# cd /opt/
[root@IIMAPP02 opt]# mkdir burrow
[root@IIMAPP02 opt]# cd burrow/
[root@IIMAPP02 burrow]# wget https://github.com/linkedin/Burrow/releases/download/v1.1.0/Burrow_1.1.0_linux_amd64.tar.gz
[root@IIMAPP02 burrow]# tar xvf Burrow_1.1.0_linux_amd64.tar.gz
[root@IIMAPP02 burrow]# ls
burrow burrow.out burrow.pid config NOTICE
Burrow_1.1.0_linux_amd64.tar.gz burrow.out.2020-04-29_10:09:38 CHANGELOG.md LICENSE README.md
[root@IIMAPP02 burrow]# cp burrow /usr/bin/
[root@IIMAPP02 burrow]# mkdir /etc/burrow/
[root@IIMAPP02 burrow]# cp config/* /etc/burrow/
Edit the configuration. The default configuration file is /etc/burrow/burrow.toml. The main items to change are the ZooKeeper addresses, the Kafka broker addresses, and the log and pid file paths.
[root@IIMAPP02 burrow]# cat /etc/burrow/burrow.toml
[general]
pidfile="/var/run/burrow.pid"
stdout-logfile="burrow.out"
access-control-allow-origin="mysite.example.com"
[logging]
filename="/var/log/burrow.log"
level="info"
maxsize=100
maxbackups=30
maxage=10
use-localtime=false
use-compression=true
[zookeeper]
servers=[ "9.1.8.234:2181", "9.1.8.234:2182", "9.1.8.234:2183" ]
timeout=6
root-path="/burrow"
[client-profile.test]
client-id="burrow-test"
kafka-version="0.10.0"
[cluster.local]
class-name="kafka"
servers=[ "9.1.9.49:9092", "9.1.10.85:9092", "9.1.8.246:9092" ]
client-profile="test"
topic-refresh=120
offset-refresh=30
[consumer.local]
class-name="kafka"
cluster="local"
servers=[ "9.1.9.49:9092", "9.1.10.85:9092", "9.1.8.246:9092" ]
client-profile="test"
group-blacklist="^(console-consumer-|python-kafka-consumer-|quick-).*$"
group-whitelist=""
[consumer.local_zk]
class-name="kafka_zk"
cluster="local"
servers=[ "9.1.8.234:2181", "9.1.8.234:2182", "9.1.8.234:2183" ]
zookeeper-path="/kafka-cluster"
zookeeper-timeout=30
group-blacklist="^(console-consumer-|python-kafka-consumer-|quick-).*$"
group-whitelist=""
[httpserver.default]
address=":8000"
[storage.default]
class-name="inmemory"
workers=20
intervals=15
expire-group=604800
min-distance=1
#[notifier.default]
#class-name="http"
#url-open="http://someservice.example.com:1467/v1/event"
#interval=60
#timeout=5
#keepalive=30
#extras={ api_key="REDACTED", app="burrow", tier="STG", fabric="mydc" }
#template-open="conf/default-http-post.tmpl"
#template-close="conf/default-http-delete.tmpl"
#method-close="DELETE"
#send-close=true
#threshold=1
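Before wiring Burrow into an init script, it is worth a quick sanity check that the configuration parses and the daemon starts; run it in the foreground (the same -config-dir flag the init script uses below) and stop it with Ctrl-C:

[root@IIMAPP02 burrow]# burrow -config-dir /etc/burrow/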
Startup script (SysV init, matching the RHEL 6 environment):
[root@IIMAPP02 burrow]# cat /etc/init.d/burrow_script
#!/bin/bash
#
# Comments to support chkconfig
# chkconfig: - 98 02
# description: Burrow is a Kafka consumer lag checking tool by LinkedIn, Inc.
#
# Source function library.
. /etc/init.d/functions
### Default variables
prog_name="burrow"
prog_path="/usr/bin/${prog_name}"
pidfile="/var/run/${prog_name}.pid"
options="-config-dir /etc/burrow/"
# Check if requirements are met
[ -x "${prog_path}" ] || exit 1
RETVAL=0
start(){
    echo -n $"Starting $prog_name: "
    PID=$(pidofproc -p $pidfile $prog_name)
    # Start a new process only if one is not already running
    if [ -z "$PID" ]; then
        $prog_path $options > /dev/null 2>&1 &
        [ ! -e $pidfile ] && sleep 1
    fi
    [ -z "$PID" ] && PID=$(pidof ${prog_path})
    # Success only if the pidfile exists and the process is alive
    if [ -f $pidfile ] && [ -n "$PID" ] && [ -d "/proc/$PID" ]; then
        RETVAL=0
        echo_success
        [ $RETVAL -eq 0 ] && touch /var/lock/subsys/$prog_name
    else
        RETVAL=1
        echo_failure
    fi
    echo
    return $RETVAL
}
stop(){
    echo -n $"Shutting down $prog_name: "
    killproc -p ${pidfile} $prog_name
    RETVAL=$?
    echo
    [ $RETVAL -eq 0 ] && rm -f /var/lock/subsys/$prog_name
    return $RETVAL
}
restart() {
    stop
    start
}
case "$1" in
start)
start
;;
stop)
stop
;;
restart)
restart
;;
status)
status $prog_path
RETVAL=$?
;;
*)
echo $"Usage: $0 {start|stop|restart|status}"
RETVAL=1
esac
exit $RETVAL
[root@IIMAPP02 burrow]# chmod +x /etc/init.d/burrow_script
Start it and enable it at boot:
[root@IIMAPP02 burrow]# /etc/init.d/burrow_script start
Starting burrow: [OK]
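To have it start at boot (the script already carries the chkconfig header), register it with chkconfig, using the standard RHEL 6 commands:

[root@IIMAPP02 burrow]# chkconfig --add burrow_script
[root@IIMAPP02 burrow]# chkconfig burrow_script on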
Burrow is now installed. Test whether data can be retrieved:
[root@IIMAPP02 zabbix_agentd.d]# curl -s http://9.1.9.49:8000/v3/kafka/local/consumer|jq .
{
  "request": {
    "host": "IIMAPP02",
    "url": "/v3/kafka/local/consumer"
  },
  "consumers": [
    "elasticsearch2"
  ],
  "message": "consumer list returned",
  "error": false
}
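The Zabbix script below is built on the per-group endpoints of the same v3 API; for example, the lag detail for the group found above can be inspected with (output elided):

[root@IIMAPP02 zabbix_agentd.d]# curl -s http://9.1.9.49:8000/v3/kafka/local/consumer/elasticsearch2/lag | jq .status.partitions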
3. Monitoring Kafka consumption with Zabbix
1. Add a custom Zabbix monitoring item (UserParameter)
[root@IIMAPP02 zabbix_agentd.d]# cat userparameter_kafkaconsumer.conf
UserParameter=kafkaconsumer[*],/etc/zabbix/zabbix_agentd.d/kafka_consumers.sh $1 $2 $3 $4 $5
[root@IIMAPP02 zabbix_agentd.d]# pwd
/etc/zabbix/zabbix_agentd.d
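With the flexible UserParameter above, each Zabbix item key simply expands to a call to the script; for instance, the key kafkaconsumer[consumer_status,elasticsearch2] runs the following on the agent:

/etc/zabbix/zabbix_agentd.d/kafka_consumers.sh consumer_status elasticsearch2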
Test the script's automatic discovery of consumers and topics:
[root@IIMAPP02 zabbix_agentd.d]# ./kafka_consumers.sh discovery
{"data":[{
"{#CONSUMER}": "elasticsearch2",
"{#PARTITION}": 0,
"{#TOPIC}": "tivoli"
}]}
Contents of the kafka_consumers.sh script:
[root@IIMAPP02 zabbix_agentd.d]# cat kafka_consumers.sh
#!/bin/bash
# Base URL of Burrow's v3 API for consumer groups in the "local" cluster
# (bash is required: the script uses arrays below)
host1=`hostname -i`
curl1="curl -s http://$host1:8000/v3/kafka/local/consumer"
case $1 in
    discovery)
        # List all consumer groups, then emit Zabbix low-level discovery JSON
        # with one entry per (consumer, topic, partition)
        res=(`$curl1 | jq ".consumers[]" | tr -d '"'`)
        for element in ${res[@]}
        do
            res1=`$curl1/$element/lag | jq 'if .status.group == "'${element}'" then .status.partitions[] | {"{#CONSUMER}":"'${element}'","{#PARTITION}":.partition,"{#TOPIC}":.topic} else 1 end'| jq -s add`
            if [[ "$res1" != "null" ]]; then
                ress_all=$ress_all,$res1
            fi
        done
        # Strip the leading comma and wrap in the LLD envelope
        ress_all=${ress_all:1}
        echo "{\"data\":[$ress_all]}"
        ;;
    consumer_status)
        # Map Burrow's textual group status to a numeric value for Zabbix
        res=`$curl1/$2/status | jq ".status.status" | tr -d '"'`
        case $res in
            NOTFOUND)
                echo 0
                ;;
            OK)
                echo 1
                ;;
            WARN)
                echo 2
                ;;
            ERR)
                echo 3
                ;;
            STOP)
                echo 4
                ;;
            STALL)
                echo 5
                ;;
            *)
                echo $res
        esac
        ;;
    consumer_tottallag)
        # Total lag across all partitions of the group
        # (key spelling kept as-is to match the Zabbix template)
        res=`$curl1/$2/status | jq .status.totallag`
        echo $res
        ;;
    consumer_offsets)
        # Offset at the start or end of Burrow's sliding window
        # for a given topic ($5) and partition ($4)
        case $3 in
            start)
                res=`$curl1/$2/lag | jq '.status.partitions[]' | jq 'if .topic == "'$5'" then . else null end' | grep -v null | jq 'if .partition == '$4' then .start.offset else null end' | grep -v null`
                ;;
            end)
                res=`$curl1/$2/lag | jq '.status.partitions[]' | jq 'if .topic == "'$5'" then . else null end' | grep -v null | jq 'if .partition == '$4' then .end.offset else null end' | grep -v null`
                ;;
        esac
        echo $res
        ;;
    consumer_lag)
        # Lag at the start or end of Burrow's sliding window
        # for a given topic ($5) and partition ($4)
        case $3 in
            start)
                res=`$curl1/$2/lag | jq '.status.partitions[]' | jq 'if .topic == "'$5'" then . else null end' | grep -v null | jq 'if .partition == '$4' then .start.lag else null end' | grep -v null`
                ;;
            end)
                res=`$curl1/$2/lag | jq '.status.partitions[]' | jq 'if .topic == "'$5'" then . else null end' | grep -v null | jq 'if .partition == '$4' then .end.lag else null end' | grep -v null`
                ;;
        esac
        echo $res
        ;;
    consumer_offsets_status)
        # Per-partition status, mapped to the same numeric values as consumer_status
        res=`$curl1/$2/lag | jq '.status.partitions[]' | jq 'if .topic == "'$4'" then . else null end' | grep -v null | jq 'if .partition == '$3' then .status else null end' | grep -v null | tr -d '"'`
        case $res in
            NOTFOUND)
                echo 0
                ;;
            OK)
                echo 1
                ;;
            WARN)
                echo 2
                ;;
            ERR)
                echo 3
                ;;
            STOP)
                echo 4
                ;;
            STALL)
                echo 5
                ;;
            *)
                echo $res
                ;;
        esac
        ;;
    *)
        echo "Try
./kafka_consumers.sh discovery - discover consumers and partitions
./kafka_consumers.sh consumer_status {#CONSUMER}
./kafka_consumers.sh consumer_tottallag {#CONSUMER}
./kafka_consumers.sh consumer_offsets {#CONSUMER} start|end {#PARTITION} {#TOPIC}
./kafka_consumers.sh consumer_lag {#CONSUMER} start|end {#PARTITION} {#TOPIC}
./kafka_consumers.sh consumer_offsets_status {#CONSUMER} {#PARTITION} {#TOPIC}
"
        ;;
esac
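A quick local check of the numeric mapping; if the elasticsearch2 group is healthy, this prints 1 (per the case table above):

[root@IIMAPP02 zabbix_agentd.d]# ./kafka_consumers.sh consumer_status elasticsearch2
1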
Import the template zbx_templates_kafkaconsumers.xml and the value-map file zbx_valuemaps_kafkaconsumers.xml (both available in the kafka-monitoring repository listed in the references).
Test from the Zabbix server:
[root@yxsjfxapp02 ~]# zabbix_get -I 9.1.8.198 -s 9.1.9.49 -p 10050 -k "kafkaconsumer[discovery]"
{"data":[{
"{#CONSUMER}": "elasticsearch2",
"{#PARTITION}": 0,
"{#TOPIC}": "tivoli"
}]}
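Other item keys can be tested the same way, for example the total lag of the group (the returned value depends on the live state of the consumer):

[root@yxsjfxapp02 ~]# zabbix_get -I 9.1.8.198 -s 9.1.9.49 -p 10050 -k "kafkaconsumer[consumer_tottallag,elasticsearch2]"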
Then link the Kafka template to the hosts you want to monitor.
References:
https://github.com/helli0n/kafka-monitoring
https://blog.51cto.com/professor/2119071
Source: oschina
Link: https://my.oschina.net/kcw/blog/4258005