Prometheus

Prometheus core concepts: the difference between an instant vector and a range vector, in one diagram

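∥☆過路亽.° submitted on 2021-02-01 10:32:15
1 Background When querying Prometheus, there are usually two approaches: querying the metric samples at a single instant, or querying the metric samples over a time range. If these two approaches are not properly understood, some of PromQL's built-in functions tend to be used incorrectly, or the query results are not what was expected. Both approaches query metric samples, so what is the difference between querying an instant and querying a time range? 2 Metrics and sampling, illustrated The relationship between Prometheus and an Exporter In the previous article, "Prometheus source code analysis: how does a custom Exporter built on the Go client store metrics locally?", we described how an Exporter stores metrics locally. In essence, it keeps the metrics in a local map and waits for the Prometheus server to pull them periodically. 3 Metric sampling from the Prometheus server's point of view Prometheus samples a target's metrics Prometheus periodically pulls from the Exporter's target. For example, at time T1 Prometheus accesses the target and samples: Metric01=Vt1 At time T2 Prometheus accesses the target and samples: Metric01=Vt2 Metric02=Vt2 The example above shows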
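The distinction the excerpt introduces can be sketched with a toy model. This is a minimal illustration, not Prometheus's implementation: the function names `instant_vector` and `range_vector`, the sample data, and the exact boundary handling are assumptions made for the example. The 300-second default mirrors Prometheus's default lookback of five minutes.

```python
from typing import List, Optional, Tuple

# One scraped sample of a single series: (timestamp in seconds, value).
Sample = Tuple[float, float]


def instant_vector(samples: List[Sample], t: float,
                   lookback: float = 300.0) -> Optional[Sample]:
    """Return the most recent sample at or before t, or None if the
    newest one is older than the lookback window (simplified model)."""
    candidates = [s for s in samples if t - lookback < s[0] <= t]
    return max(candidates, key=lambda s: s[0]) if candidates else None


def range_vector(samples: List[Sample], t: float,
                 window: float) -> List[Sample]:
    """Return every sample whose timestamp falls in (t - window, t].
    The boundary choice is an assumption of this sketch."""
    return sorted(s for s in samples if t - window < s[0] <= t)


# Hypothetical scrapes of Metric01, one every 15 seconds.
metric01 = [(0.0, 1.0), (15.0, 2.0), (30.0, 4.0), (45.0, 7.0)]

print(instant_vector(metric01, t=40.0))             # one sample
print(range_vector(metric01, t=40.0, window=30.0))  # a list of samples
```

An instant query yields at most one sample per series, while a range query yields a list of samples per series, which is why functions that need several samples take a range vector as input.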

Five tips for achieving observability in a complex cloud

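China☆狼群 submitted on 2021-01-30 13:43:16
In 2020, the concept of observability in IT operations gained acceptance, and IT leaders are looking for new ways to control the complexity that has grown organically with cloud computing and rapid digitalization. Observability differs from IT monitoring in that it focuses on developing applications with rich instrumentation, so that operators can ask meaningful questions about how the software works in production. The ability to ask new questions lets IT departments understand application behavior from different angles and then optimize and improve it. Another way to think about observability is that it is entirely about the user's perspective, which requires a user-centric mindset and approach. Traditional (black-box) monitoring provides metrics that indicate whether a system is up and running; observability goes further by showing whether the system can actually meet business and user needs. The role of observability Observability ties infrastructure monitoring more closely to business value by addressing problems such as the following: - the server is online and available, but the application it supports has failed - the network is up, but users' transactions may not go through, or the website behaves abnormally - your site works well in one browser but not in another IT organizations need to learn about these problems immediately, before users start complaining or leave your website or application for a better service. Otherwise the consequences are bad for both user retention and employees, and can lead to costly, insecure shadow IT. Either way, a lack of observability leaves your organization prone to low user satisfaction and high support costs. Observability requires a modern approach to monitoring, and it is more successful when developers accept and take part in monitoring activities. Here are some suggestions for strengthening observability: Expand the data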

Cloud native | My observations and thoughts on cloud-native software architecture

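夙愿已清 submitted on 2021-01-30 03:11:53
Author | 易立, senior technical expert at Alibaba Cloud and head of container technology This series: Part 1 - Cloud-native infrastructure (published; click "Read the original" at the end of the article to view) Part 2 - Cloud-native software architecture (this article) Part 3 - Cloud-native application delivery and operations (to be continued) Preface In "Cloud-native infrastructure" we noted that cloud-native computing covers three dimensions: cloud-native infrastructure, software architecture, and the delivery and operations system. This article focuses on software architecture. "Software architecture refers to the fundamental structures of a software system and the discipline of creating such structures and systems." - Wikipedia. As I understand it, the main goal of software architecture is to address the following challenges: Controlling complexity. Because the business is complex, we need better means to help engineering organizations overcome cognitive barriers and to divide work and collaborate more effectively. Divide and conquer and separation of concerns are such means. Coping with uncertainty. The business develops quickly and requirements keep changing. However perfect a software architecture is, adjusting it becomes unavoidable as time passes and teams change. Books such as "Design Patterns" and "Building Microservices" have "decoupling" written between every line; they direct us to separate what is certain from what is uncertain in an architecture, improving its stability and adaptability. Managing systemic risk. Manage the certain and uncertain risks in the system, avoid known pitfalls, and prepare for unknown risks.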

Prometheus get count of up metric 0 for given alert

核能气质少年 submitted on 2021-01-29 06:02:51
Question I have alerts set up on Prometheus where there are different jobs in the alert. I want to find how many times the alert was fired over the last week, given the job name. So there is an alert named "A" and there are multiple jobs "B", "C", "D" under it; I want to know how many times alert "A" was fired for job "B" in the last week. If I use the following expression: sum by(alertname) (changes(ALERTS_FOR_STATE[1w])) it gives me the total alerts fired in the last week, but since there are multiple jobs in that, I am

Missing Confluent Kafka Connect Metrics using Jmx Exporter for Prometheus

北慕城南 submitted on 2021-01-29 05:29:47
Question I am not able to export "type=connector-metrics" metrics for the Confluent Connect service, but other metrics are working fine. I am using the Prometheus exporter Java agent to expose metrics from Confluent Connect as shown below. Confluent Connect Configuration (/usr/bin/connect-distributed) export KAFKA_OPTS='-javaagent:/opt/prometheus/jmx_prometheus_javaagent-0.12.0.jar=8093:/opt/prometheus/kafka-connect.yml' kafka-connect.yml - pattern: kafka.connect<type=connector-metrics, connector=(.+)><>([a-z-

Trying to configure prometheus with alert manager but getting error with rules file

て烟熏妆下的殇ゞ submitted on 2021-01-29 04:52:50
Question In my prometheus.yml, the rules file is called rules.yml and it has this --- groups: - name: example rules: - alert: ServiceDown expr: up == 0 for: 2m labels: severity: critical annotations: summary: cannot connect to {{ $labels.job }} When I run sudo ./promtool check config rules.yml I get the error Checking rules.yml FAILED: parsing YAML file rules.yml: yaml: unmarshal errors: line 2: field groups not found in type config.plain I am not sure what is wrong, as I am following this https:/

Prometheus API returning HTML instead of JSON

﹥>﹥吖頭↗ submitted on 2021-01-28 22:14:49
Question I configured Prometheus with Kubernetes and am trying to execute queries using the API. I followed this document to configure and call the API: https://github.com/prometheus/prometheus/blob/master/docs/querying/api.md I am executing the curl command below: curl -k -X GET "https://127.0.0.1/api/v1/query?query=kubelet_volume_stats_available_bytes" But the output comes back as HTML instead of JSON. Is any additional configuration needed to get JSON output from Prometheus? Answer 1: Per the
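One way to detect the symptom described in the question is to check the response before parsing it. The sketch below is hedged: `build_query_url` and `parse_api_response` are hypothetical helper names, and the code only detects an HTML reply; it does not diagnose why the address returns HTML. It relies on the `/api/v1/query` path from the question and on the JSON envelope of the Prometheus HTTP API, which carries `status` and `data` fields.

```python
import json
from urllib.parse import urlencode


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the /api/v1/query endpoint."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_api_response(content_type: str, body: str) -> dict:
    """Return the 'data' field of a successful API response.

    Raises ValueError when the reply is not JSON (for example an HTML
    page) or when the API reports a non-success status.
    """
    if "application/json" not in content_type.lower():
        raise ValueError(f"expected JSON, got Content-Type {content_type!r}")
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error')}")
    return payload["data"]


url = build_query_url("https://127.0.0.1", "kubelet_volume_stats_available_bytes")
print(url)

# Hypothetical response body in the shape the API uses for an empty result.
sample_body = '{"status": "success", "data": {"resultType": "vector", "result": []}}'
print(parse_api_response("application/json", sample_body))
```

Fetching the URL is left out so the sketch stays runnable offline; any HTTP client can supply the Content-Type header and body.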

When to use sum_over_time vs increase Promql Grafana

白昼怎懂夜的黑 submitted on 2021-01-28 20:27:43
Question I am a little unclear on exactly when to use increase and when to use sum_over_time in order to calculate a periodic collection of data in Grafana. I want to calculate the total percentage of availability of my system. Thanks. Answer 1: The "increase" function calculates how much a counter increased in the specified interval. The "sum_over_time" function calculates the sum of all values in the specified interval. Suppose you have the following data series in the specified interval: 5, 5, 5, 5, 6,
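The difference described in the answer can be illustrated with a small sketch. It uses a hypothetical series rather than the truncated one above, and `simple_increase` is a simplified stand-in: it adds up the rises between consecutive samples and treats a drop as a counter reset, but it does not reproduce the extrapolation that PromQL's `increase` applies at the window boundaries.

```python
from typing import List


def sum_over_time(values: List[float]) -> float:
    """Sum of all sample values in the window."""
    return sum(values)


def simple_increase(values: List[float]) -> float:
    """Total rise of a counter across the window (no extrapolation).

    A drop between two samples is treated as a counter reset, so the
    value after the drop counts as growth from zero.
    """
    total = 0.0
    for prev, cur in zip(values, values[1:]):
        total += cur - prev if cur >= prev else cur
    return total


# Hypothetical counter samples inside one query window.
counter = [10.0, 10.0, 12.0, 15.0]

print(sum_over_time(counter))    # 47.0: adds the raw values together
print(simple_increase(counter))  # 5.0: the counter grew from 10 to 15
```

Summing a counter's raw values rarely means anything, so `sum_over_time` tends to suit gauges, while `increase` suits counters.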

Django: module not found while running in Docker container

一世执手 submitted on 2021-01-28 20:01:05
Question I'm running a Django project in a Docker container, and I want to add a module (specifically, django-prometheus). I ran pip install django-prometheus and docker run -p 9090:9090 prom/prometheus successfully, and I made the necessary alterations to my settings.py and urls.py files, as specified in the README. I then rebuilt the project and restarted it, but it is giving me the error ModuleNotFoundError: No module named 'django_prometheus' (full error report: Traceback (most recent call last):

Prometheus pod unable to call apiserver endpoints

╄→尐↘猪︶ㄣ submitted on 2021-01-28 19:41:09
Question I am trying to set up a monitoring stack (prometheus + alertmanager + node_exporter etc.) via helm install stable/prometheus onto a Raspberry Pi k8s cluster (1 master + 3 worker nodes) which I set up. I managed to get all the required pods running. pi-monitoring-prometheus-alertmanager-767cd8bc65-89hxt 2/2 Running 0 131m 10.17.2.56 kube2 <none> <none> pi-monitoring-prometheus-node-exporter-h86gt 1/1 Running 0 131m 192.168.1.212 kube2 <none> <none> pi-monitoring-prometheus-node-exporter-kg957 1/1