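Preface: In a monitoring system, if the data pipeline is its skeleton, then the alert notification service is its soul. Every monitoring service exists to notify people promptly: to reduce manual status checks, catch problems early, avoid unnecessary large-scale outages, save money for companies and governments, and keep systems safe. Detecting a problem matters, but getting that problem in front of a person quickly matters even more. That is today's topic: the alert notification service.
An open-source project: PrometheusAlert. This project integrates with many third-party services to deliver alerts by phone call, SMS, and other channels. We will be using it, so let's deploy it first.
GitHub location
For deployment, follow the "Deployment" section of the project. Note that its configuration file must sit under the binary's current directory and must be named conf/app.conf, otherwise it will not be read. The reason is that the project uses the beego framework, which reads its configuration from that location by default. If no prebuilt binary matches your platform, you can compile it yourself.
GOPATH=xxxx/monitor_alert CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -o xxx/monitor_alert/bin/PrometheusAlertLinuxAmd64 xxx/monitor_alert/src/PrometheusAlert/PrometheusAlert.go
GOPATH=xxxx/monitor_alert CGO_ENABLED=0 GOOS=linux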
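The configuration-location requirement described above can be checked before starting the service. The following is a minimal sketch, not part of PrometheusAlert itself: the helper name `find_app_conf` and the directory passed to it are hypothetical, and it only tests for `conf/app.conf` relative to the directory you give it.

```python
from pathlib import Path


def find_app_conf(work_dir):
    """Return the path to conf/app.conf under work_dir, or None if it is missing.

    Hypothetical pre-flight helper: it mirrors the layout described in the text
    (a conf/app.conf file next to the binary) and does nothing else.
    """
    conf = Path(work_dir) / "conf" / "app.conf"
    return conf if conf.is_file() else None


if __name__ == "__main__":
    # "." stands for the directory the binary is started from (an assumption).
    found = find_app_conf(".")
    print("config found:" if found else "conf/app.conf is missing", found or "")
```

Running such a check from the directory that holds the binary makes a missing or misnamed configuration file visible before the service starts.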


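Detailed Answers to 17 Core Questions on "Practices for Improving the Performance of a Trillion-Scale MongoDB Cluster by Tens of Times"
Note: for background, please first read the article shared on oschina, "Trillion-Scale MongoDB Cluster: Tens-of-Times Performance Improvement and Multi-Datacenter Active-Active Disaster Recovery in Practice".
This article follows talks given at the 2020 Shenzhen QCon Global Software Development Conference ("Modern Data Architecture" track), the dbaplus session "Trillion-Scale MongoDB Cluster Performance Optimization in Practice", and the mongodb 2020 year-end event, all of which were well received. It collects the 17 questions that mongodb users asked most often after those talks and answers each one in detail.
Recap of the talk content:
MongoDB adoption at OPPO Internet: how a database on the verge of being phased out gradually became the company's mainstream database
Current misconceptions about MongoDB in China (data loss, insecurity, difficulty of maintenance)
MongoDB cross-datacenter active-active solution: a triple win on cost, performance, and consistency
MongoDB threading model bottlenecks and how to optimize them
Parallel migration: improving MongoDB kernel scale-out migration speed by several times to tens of times
Improving by several times the performance of a MongoDB cluster with million-level concurrent reads/writes and hundreds of billions of records
Improving by tens of times the performance of a MongoDB cluster holding trillions of records
80% disk savings: migrating a service interface's hundreds of billions of records to MongoDB, and how nearly a hundred SSD servers were saved
About the author: former technical expert at DiDi, currently head of the mongodb document database at OPPO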

Configure basic_auth for Prometheus Target

Question: One of the targets in static_configs in my prometheus.yml config file is secured with basic authentication. As a result, a "Connection refused" error is always displayed against that target on the Prometheus Targets page. I have researched how to set up Prometheus to provide the security credentials when scraping that particular target but couldn't find a solution. What I found in the docs was how to set it up in the scrape_config section. This won't work for me

Prometheus alert manager doesn't send alert k8s

Question: I'm using prometheus operator 0.3.4 and alert manager 0.20 and it doesn't work, i.e. I see that the alert is fired (in the Prometheus UI, on the Alerts tab) but I didn't get any alert by email. Looking at the logs I see the following. Any idea? Please see the warn in bold; maybe this is the reason, but I'm not sure how to fix it... This is the helm chart of prometheus operator which I use: level=info ts=2019-12-23T15:42:28.039Z caller

Understanding histogram_quantile based on rate in Prometheus

Question: According to the Prometheus documentation, to get the 95th percentile from a histogram metric I can use the following query: histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le)) Source: Since each bucket of a histogram is a counter, we can calculate the rate of each bucket as: the per-second average rate of increase of the time series in the range vector. See:
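The "per-second average rate of increase" quoted above can be illustrated with a small sketch. This is a simplified model, not Prometheus's actual `rate()` implementation: the function name `simple_rate` and the sample values are made up for illustration, and the extrapolation to the window boundaries that Prometheus performs is deliberately left out.

```python
def simple_rate(samples):
    """Per-second average rate of increase of a counter over a window.

    samples: list of (timestamp_seconds, value) pairs ordered by time.
    Simplified sketch: counter resets are handled, extrapolation is not.
    """
    if len(samples) < 2:
        return None  # a rate needs at least two samples
    increase = 0.0
    prev = samples[0][1]
    for _, value in samples[1:]:
        if value < prev:
            # The counter dropped, so treat it as a reset to zero:
            # everything counted since the reset is the new value itself.
            increase += value
        else:
            increase += value - prev
        prev = value
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None
    return increase / elapsed


# Hypothetical samples of one bucket counter, scraped every 60 seconds.
steady = [(0, 100), (60, 160), (120, 220)]
with_reset = [(0, 100), (60, 10), (120, 40)]
```

Applied to each `le` bucket separately, this yields one rate per bucket, which is the input that `histogram_quantile` then works on.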

How to get the quantile of rate in prometheus

Question: I am looking at this article
# TYPE prometheus_http_request_duration_seconds histogram
prometheus_http_request_duration_seconds_bucket{handler="/",le="0.1"} 25547
prometheus_http_request_duration_seconds_bucket{handler="/",le="0.2"} 26688
prometheus_http_request_duration_seconds_bucket{handler="/",le="0.4"} 27760
prometheus_http_request_duration_seconds_bucket{handler="/",le="1"} 28641
prometheus_http_request_duration_seconds_bucket{handler="/",le="3"} 28782
I am confused about why histogram
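To make the bucket samples quoted above concrete, here is a rough sketch of how a quantile can be estimated from cumulative buckets by linear interpolation inside the bucket that contains the target rank. It follows the general approach of `histogram_quantile` but is a simplified model, not Prometheus's code. Two assumptions are added for illustration: a `+Inf` bucket with the same count as the `le="3"` bucket (the excerpt is cut off before it), and the use of raw cumulative counts where a real query would normally pass per-bucket rates.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from (upper_bound, cumulative_count) pairs.

    Simplified sketch: the highest bucket must have upper bound +Inf, and
    values are assumed to be spread uniformly inside each bucket.
    """
    if q < 0:
        return -math.inf
    if q > 1:
        return math.inf
    buckets = sorted(buckets)
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        return math.nan
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # Index of the first bucket whose cumulative count reaches the rank.
    idx = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    if idx == len(buckets) - 1:
        # The rank falls into the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]
    upper, count = buckets[idx]
    if idx == 0:
        if upper <= 0:
            return upper
        lower, below = 0.0, 0.0  # the lowest bucket is assumed to start at 0
    else:
        lower, below = buckets[idx - 1]
    in_bucket = count - below
    if in_bucket == 0:
        return math.nan
    return lower + (upper - lower) * (rank - below) / in_bucket


# Counts from the excerpt; the +Inf entry is an assumption, not source data.
buckets = [(0.1, 25547), (0.2, 26688), (0.4, 27760), (1, 28641), (3, 28782),
           (math.inf, 28782)]
p95 = histogram_quantile(0.95, buckets)
```

With these numbers the 95th-percentile rank lands in the `le="0.4"` bucket, so the estimate lies between 0.2 and 0.4 seconds. Feeding rates instead of raw counts changes only the inputs, not the interpolation.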

A First Look at Open Service Mesh (OSM)

Microsoft recently open-sourced a new project named Open Service Mesh [1] and plans to donate it to the CNCF [2]. Overview: "Open Service Mesh (OSM) is a lightweight, extensible, Cloud Native service mesh that allows users to uniformly manage, secure, and get out-of-the-box observability features for highly dynamic microservice environments." OSM runs an Envoy-based control plane on Kubernetes and can be configured using the SMI APIs. It works by injecting an Envoy proxy as a sidecar. The control plane continuously configures the proxies so that policies and routing rules stay up to date. The proxies mainly enforce access control rules, handle routing, and collect metrics. (This is essentially the same as the Service Mesh solutions we commonly see today.) Notable features: based on an implementation of the Service Mesh Interface (SMI), mainly including

Installing Alertmanager (k8s alerting)

1. Download Alertmanager
wget
# Extract
tar xf alertmanager-0.16.0-alpha.0.linux-amd64.tar.gz
mv alertmanager-0.16.0-alpha.0.linux-amd64 /usr/local/alertmanager
# Create the data directory
mkdir -p /data/alertmanager
# Create the user
useradd prometheus
chown -R prometheus:prometheus /usr/local/alertmanager /data/alertmanager/
# Add the systemd service
vim /usr/lib/systemd/system/alertmanager.service
[Unit]
Description=Alertmanager
[Service]
Type=simple
User=prometheus
ExecStart=

Dynamically update prometheus scrape config based on pod labels

Question: I'm trying to enhance my monitoring and want to expand the number of metrics pulled into Prometheus from our Kube estate. We already have a standalone Prom implementation which has a hard-coded config file monitoring some bare metal servers, and hooks into cadvisor for generic Pod metrics. What I would like to do is configure Kube to monitor the apache_exporter metrics from a webserver deployed in the cluster, but also dynamically add a 2nd, 3rd etc. webserver as the instances are scaled up.