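Health checks are an essential part of routine service monitoring. The usual approach is to watch either the service's port or its process; in most cases monitoring one of the two is enough.
blackbox_exporter is one of the exporters officially provided by the Prometheus project. It collects monitoring data by probing targets over HTTP, DNS, TCP, and ICMP.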
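blackbox_exporter use cases
HTTP probes
  Define request header information
  Check the HTTP status / HTTP response headers / HTTP body content
TCP probes
  Monitor the port status of business components
  Define and monitor application-layer protocols
ICMP probes
  Host liveness checks
POST probes
  Interface connectivity
SSL certificate expiry time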
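For the SSL certificate expiry use case, the HTTP probe exposes the probe_ssl_earliest_cert_expiry metric for TLS targets, and its value is a Unix timestamp in seconds. The following is a minimal sketch of turning that value into days remaining. The function name days_until_expiry and the sample timestamp are illustrative assumptions, not part of the exporter.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 86400


def days_until_expiry(expiry_ts: float, now_ts: float) -> float:
    """Days left before the certificate expires (negative if already expired)."""
    return (expiry_ts - now_ts) / SECONDS_PER_DAY


if __name__ == "__main__":
    now = datetime.now(timezone.utc).timestamp()
    # Hypothetical metric value: a certificate that expires 30 days from now.
    sample_expiry = now + 30 * SECONDS_PER_DAY
    print(f"{days_until_expiry(sample_expiry, now):.1f} days left")
```

The same arithmetic could back an alert that warns some number of days before a certificate runs out.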
1 Install blackbox_exporter
cd /usr/local
wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.16.0/blackbox_exporter-0.16.0.linux-amd64.tar.gz
tar xf blackbox_exporter-0.16.0.linux-amd64.tar.gz
mv blackbox_exporter-0.16.0.linux-amd64 blackbox_exporter
2 Register blackbox_exporter as a service
$ cat /etc/systemd/system/blackbox_exporter.service
[Service]
Restart=on-failure
WorkingDirectory=/usr/local/blackbox_exporter/
ExecStart=/usr/local/blackbox_exporter/blackbox_exporter --config.file=/usr/local/blackbox_exporter/blackbox.yml
[Install]
WantedBy=multi-user.target
3 Start the service
sudo systemctl daemon-reload
sudo systemctl start blackbox_exporter
sudo systemctl status blackbox_exporter
sudo systemctl enable blackbox_exporter
The service listens on port 9115 by default.
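Once the exporter is running, a probe can be triggered by hand through its /probe endpoint, which takes module and target query parameters. Below is a small sketch that builds such a URL. The helper name probe_url is hypothetical, and the addresses are the example ones used in this post.

```python
from urllib.parse import urlencode


def probe_url(exporter: str, module: str, target: str) -> str:
    """Build the URL that asks the exporter to probe `target` with `module`."""
    query = urlencode({"module": module, "target": target})
    return f"http://{exporter}/probe?{query}"


if __name__ == "__main__":
    # Example addresses from this post; replace them with your own.
    print(probe_url("10.0.2.100:9115", "tcp_connect", "10.0.2.101:80"))
```

Fetching the printed URL (for example with curl) from a host that can reach the exporter returns the probe metrics; a probe_success value of 1 means the probe succeeded.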
4 Example: monitoring a service port
cat /usr/local/prometheus/prometheus.yml
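  - job_name: 'check_nginx_port_status'
    metrics_path: /probe
    params:
      module: [tcp_connect]
    static_configs:
      - targets: ['10.0.2.100:80']
        labels:
          instance: 'port_status'
          app: 'tcp_80'
    relabel_configs:
      # Pass the original target to the exporter as the ?target= parameter
      - source_labels: [__address__]
        target_label: __param_target
      # Use the probed target as the instance label (this overrides the static instance label above)
      - source_labels: [__param_target]
        target_label: instance
      # Scrape blackbox_exporter itself instead of the target
      - target_label: __address__
        replacement: 10.0.2.100:9115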
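The three relabel steps in the job above are easy to misread, so here is a simplified model of what they do to one target's labels. It only mimics the default replace action as used in this config; the function name relabel is hypothetical and this is not how Prometheus implements relabeling internally.

```python
def relabel(labels: dict, exporter_addr: str) -> dict:
    """Apply the three relabel steps of the job above to one target's labels."""
    out = dict(labels)
    # Step 1: copy __address__ into __param_target (sent as ?target=...).
    out["__param_target"] = out["__address__"]
    # Step 2: copy __param_target into instance, replacing any static value.
    out["instance"] = out["__param_target"]
    # Step 3: point __address__ at the exporter, which performs the probe.
    out["__address__"] = exporter_addr
    return out


if __name__ == "__main__":
    before = {"__address__": "10.0.2.100:80", "instance": "port_status", "app": "tcp_80"}
    for key, value in relabel(before, "10.0.2.100:9115").items():
        print(f"{key} = {value}")
```

After these steps Prometheus scrapes the exporter at 10.0.2.100:9115, while the resulting series still carry the probed target in the instance label.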
5 Example: monitoring a domain
cat /usr/local/prometheus/prometheus.yml
  - job_name: 'check_domain_status'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets: ['https://test.com']
        labels:
          instance: 'domain_status'
          app: 'web_domain'
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 10.0.2.100:9115
6 Ping check
cat /usr/local/prometheus/prometheus.yml
  # Job names must be unique within prometheus.yml, so this job does not reuse 'check_domain_status'
  - job_name: 'check_node_status'
    metrics_path: /probe
    params:
      module: [icmp]
    static_configs:
      - targets: ['10.0.2.100']
        labels:
          instance: 'node_status'
          app: 'node'
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - target_label: __address__
        replacement: 10.0.2.100:9115
7 Today's focus: testing port checks
Configure Prometheus
cat /usr/local/prometheus/prometheus.yml
  - job_name: 'check_nginx_port_status'
    metrics_path: /probe
    params:
      module: [tcp_connect]
    file_sd_configs:
      - files: ['/usr/local/prometheus/conf.d/check80.json']
        refresh_interval: 60s
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 10.0.2.100:9115
Targets are registered dynamically through file-based service discovery:
$ cat /usr/local/prometheus/conf.d/check80.json
[
  {
    "targets": ["10.0.2.101:80"],
    "labels": {
      "hostname": "front-01",
      "app": "tcp_80"
    }
  },
  {
    "targets": ["10.0.2.102:80"],
    "labels": {
      "hostname": "front-02",
      "app": "tcp_80"
    }
  }
]
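When the host list grows, a target file like this one can be generated rather than edited by hand. The sketch below rebuilds the same structure from a hostname-to-IP mapping. The function name build_file_sd and the mapping are assumptions made for illustration.

```python
import json


def build_file_sd(hosts: dict, port: int = 80, app: str = "tcp_80") -> list:
    """Return one file_sd target group per host, labelled with its hostname."""
    return [
        {"targets": [f"{ip}:{port}"], "labels": {"hostname": name, "app": app}}
        for name, ip in hosts.items()
    ]


if __name__ == "__main__":
    # Hypothetical inventory; the values match the example hosts in this post.
    hosts = {"front-01": "10.0.2.101", "front-02": "10.0.2.102"}
    print(json.dumps(build_file_sd(hosts), indent=2))
```

Prometheus picks up changes to file_sd files on its own, so rewriting the file does not require a restart.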
Configure alerting rules
$ cat /usr/local/prometheus/rules/nginxservice_rules.yml
groups:
  - name: nginxservices
    rules:
      - alert: 80_port is down
        expr: probe_success{job=~"check_nginx_port_status"} == 0
        for: 1m
        labels:
          severity: 3
        annotations:
          summary: "Current value: {{ $value }}"
          console: 'Host {{ $labels.hostname }}: the nginx service is down!'
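The for: 1m clause means a single failed probe does not fire the alert straight away: the rule is first pending and only fires once the expression has stayed true for a full minute. The sketch below is a simplified model of that behaviour, assuming one sample per rule evaluation. The function name alert_state is hypothetical and this is not Prometheus code.

```python
def alert_state(samples, hold_seconds=60):
    """Return 'inactive', 'pending' or 'firing' after the last sample.

    samples: (unix_timestamp, probe_success) pairs in time order.
    """
    active_since = None
    state = "inactive"
    for ts, value in samples:
        if value == 0:
            # The expression probe_success == 0 is true at this evaluation.
            if active_since is None:
                active_since = ts
            state = "firing" if ts - active_since >= hold_seconds else "pending"
        else:
            # A successful probe resets the timer.
            active_since = None
            state = "inactive"
    return state


if __name__ == "__main__":
    print(alert_state([(0, 0), (30, 0), (60, 0)]))
    print(alert_state([(0, 0), (30, 1), (60, 0)]))
```

In the first sample series the probe fails for a full minute; in the second, a success in between resets the timer, so the rule is only pending again at the end.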
Restart Prometheus
systemctl restart prometheus
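At this point the ports and servers to probe are configured, along with the alerting rules.
Grafana also has a matching dashboard template: ID 9965.
Source: oschina
Link: https://my.oschina.net/54188zz/blog/3147626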