Prometheus is built around returning a time series representation of metrics. In many cases, however, I only care about what the state of a metric is right now.
Given this:
namespace_metricname_count_sum{id="1",status="to-do"}
namespace_metricname_count_sum{id="1",status="in-progress"}
If you want the most recent value, you need to select on something the series have in common; in this case that is id=~".*".
By grouping the results, you can get the last value over a time range:
count(max_over_time(namespace_metricname_count_sum{id=~".*"}[12h])) by (status)
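If your Prometheus is 2.26 or newer, a sketch of the same idea using last_over_time, which returns the most recent sample of each series in the range directly (keeping the grouping from above):
count(last_over_time(namespace_metricname_count_sum{id=~".*"}[12h])) by (status)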
All you need is my_metric, which will by default return the most recent value, as long as it is no more than 5 minutes old.
To get the most recent value of my_metric when it is older than 5m without resorting to hacky PromQL queries, you can modify the query.lookback-delta Prometheus option, which is where this default 5m value is defined.
For example, specifying --query.lookback-delta=1d in your Prometheus launch options and restarting the service will cause the PromQL query my_metric to return the most recent value of my_metric, looking back up to 24 hours.
Metrics outside this "look-back time window" are called stale.
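A minimal sketch of what the launch command could look like with this flag (the config file path is only an assumption):
prometheus --config.file=/etc/prometheus/prometheus.yml --query.lookback-delta=1d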
I had a similar issue with metrics I was getting from AWS via prom/cloudwatch-exporter. It seems AWS takes a while to converge its CloudWatch metrics. It used to be about 10 minutes, but now it's more like 13 minutes. We've been missing issues like low disk space because these metrics utterly fail to make it to Prometheus, so our alerts were useless.
I found "offset" useful here, where I wanted the last metric but it was outside of the 5m cutoff. So by specifying an offset, I can still pick up a value instead of nothing. Example:
aws_ec2_cpuutilization_average offset 15m
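The same trick carries over to alert expressions; a rough sketch, with a purely illustrative threshold, might look like:
aws_ec2_cpuutilization_average offset 15m > 90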