Question
We need to monitor the size of a directory (for example, the data directory of InfluxDB) to set up alerts in Grafana. As mentioned in "How to configure telegraf to send a folder-size to influxDB", there is no built-in plugin for this.
We don't mind using the inputs.exec section of Telegraf. The directories are not huge (low file and directory counts), so deep scanning (for example with du) is fine with us.
One of the directories we need to monitor is /var/lib/influxdb/data.
What would be a simple script to execute, and what are the caveats?
Answer 1:
You could create a simple bash script, metrics-exec_du.sh, with the following content (and make it executable with chmod 755):
#!/usr/bin/env bash
du -bs "${1}" | awk '{print "[ { \"bytes\": "$1", \"dudir\": \""$2"\" } ]";}'
And activate it by putting the following in the Telegraf config file:
[[inputs.exec]]
commands = [ "YOUR_PATH/metrics-exec_du.sh /var/lib/influxdb/data" ]
timeout = "5s"
name_override = "du"
name_suffix = ""
data_format = "json"
tag_keys = [ "dudir" ]
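The awk program in metrics-exec_du.sh splits du's output on any whitespace, so a directory whose path contains a space ends up with a truncated dudir tag, and a path containing a double quote would produce invalid JSON. If that matters, here is a minimal alternative sketch. It assumes GNU coreutils du (-b is a GNU option, and GNU du separates size and path with a tab); the function names du_json and json_escape are hypothetical. It takes the size from the first tab-separated field and escapes the path before printing the same JSON shape:

```shell
#!/usr/bin/env bash
# Hypothetical alternative to metrics-exec_du.sh; assumes GNU coreutils du,
# whose output is "<size><TAB><path>".
set -euo pipefail

# Escape backslashes and double quotes so the path is a valid JSON string
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

du_json() {
  local dir="$1" bytes
  # cut -f1 splits on the tab, so spaces inside the path do not matter
  bytes=$(du -bs -- "$dir" | cut -f1)
  printf '[ { "bytes": %s, "dudir": "%s" } ]\n' "$bytes" "$(json_escape "$dir")"
}

# Telegraf passes the directory as the first argument
if [ "$#" -gt 0 ]; then
  du_json "$1"
fi
```

It can replace metrics-exec_du.sh without changing the [[inputs.exec]] section. Running it by hand against /var/lib/influxdb/data shows the JSON that Telegraf will parse.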
Caveats:
- The du command can put a heavy I/O load on your server, so use it with care.
- The telegraf user must be able to scan the directories. There are several options, but since InfluxDB's directory mask is somewhat unspecified (see: https://github.com/influxdata/influxdb/issues/5171#issuecomment-306419800), we applied a rather crude workaround (examples are for Ubuntu 16.04.2 LTS):
  - Add the telegraf user to the influxdb group, then restart Telegraf so the new group membership takes effect:
    sudo usermod --groups influxdb --append telegraf
  - Put the following in the crontab so it runs, for example, every 10 minutes:
    */10 * * * * chmod -R g+rX /var/lib/influxdb/data > /var/log/influxdb/chmodfix.log 2>&1
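To check that the permission workaround actually works, here is a minimal sketch. It assumes GNU find, whose -readable and -executable tests use the permissions of the user running find; the function name list_unreadable is hypothetical. It lists everything under a directory that the current user cannot read, plus directories it cannot enter:

```shell
#!/usr/bin/env bash
# Hypothetical permission check; assumes GNU find (-readable, -executable).
set -euo pipefail

# Print entries below $1 that the current user cannot read, plus directories
# it cannot traverse; either would make the du total incomplete.
list_unreadable() {
  # find exits non-zero when it cannot descend; errors are expected here
  find "$1" \( ! -readable -o \( -type d ! -executable \) \) -print 2>/dev/null || true
}

if [ "$#" -gt 0 ]; then
  list_unreadable "$1"
fi
```

Run it as the telegraf user, for example with sudo -u telegraf and /var/lib/influxdb/data as the argument. Empty output means du should be able to scan the whole tree.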
Result, configured in Grafana (data source: InfluxDB):
Cheers, TW
Answer 2:
If you need to monitor multiple directories, I have extended the answer by Tw Bert so that you can pass them all on one command line. This saves you from having to add multiple [[inputs.exec]] entries to your telegraf.conf file.
Create the file /etc/telegraf/scripts/disk-usage.sh containing:
#!/bin/bash
echo "["
du -ks "$@" | awk '{if (NR!=1) {printf ",\n"};printf " { \"directory_size_kilobytes\": "$1", \"path\": \""$2"\" }";}'
echo
echo "]"
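In the awk program above, each line is concatenated into printf's format string. A path containing % would therefore be misinterpreted, and splitting on whitespace truncates paths that contain spaces. Here is a minimal variant sketch that keeps the same JSON shape. It assumes GNU coreutils du, with tab-separated output; the function names disk_usage_json and json_escape are hypothetical. It runs du once per argument and passes the path to printf only as an argument:

```shell
#!/usr/bin/env bash
# Hypothetical variant of disk-usage.sh; assumes GNU coreutils du
# ("<size><TAB><path>" output).
set -euo pipefail

# Escape backslashes and double quotes so the path is a valid JSON string
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Print one JSON array with an object per directory argument
disk_usage_json() {
  local dir kb sep=""
  printf '[\n'
  for dir in "$@"; do
    kb=$(du -ks -- "$dir" | cut -f1)
    # The path is a printf argument, never part of the format string
    printf '%s { "directory_size_kilobytes": %s, "path": "%s" }' \
      "$sep" "$kb" "$(json_escape "$dir")"
    sep=$',\n'
  done
  printf '\n]\n'
}

disk_usage_json "$@"
```

The [[inputs.exec]] section does not need to change, because the field and tag names are the same.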
I want to monitor two directories, /mnt/user/appdata/influxdb and /mnt/user/appdata/grafana, so I can do something like this:
# Get disk usage for multiple directories
[[inputs.exec]]
commands = [ "/etc/telegraf/scripts/disk-usage.sh /mnt/user/appdata/influxdb /mnt/user/appdata/grafana" ]
timeout = "5s"
name_override = "du"
name_suffix = ""
data_format = "json"
tag_keys = [ "path" ]
Once you've updated your config, you can test this with:
telegraf --debug --config /etc/telegraf/telegraf.conf --input-filter exec --test
This should show you what Telegraf will push to InfluxDB:
bash-4.3# telegraf --debug --config /etc/telegraf/telegraf.conf --input-filter exec --test
> du,host=SomeHost,path=/mnt/user/appdata/influxdb directory_size_kilobytes=80928 1536297559000000000
> du,host=SomeHost,path=/mnt/user/appdata/grafana directory_size_kilobytes=596 1536297559000000000
Answer 3:
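The solutions already provided look good to me, and highlighting caveats such as read permissions is great. An alternative worth mentioning is to let Telegraf collect the data itself with its disk input, as proposed in "monitor diskspace on influxdb with telegraf". Note that this input reports usage per mounted filesystem, not per directory, so it only matches a directory's size when that directory has a filesystem of its own.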
[[outputs.influxdb]]
urls = ["udp://your_host:8089"]
database = "telegraf_metrics"
## Retention policy to write to. Empty string writes to the default rp.
retention_policy = ""
## Write consistency (clusters only), can be: "any", "one", "quorum", "all"
write_consistency = "any"
## Write timeout (for the InfluxDB client), formatted as a string.
## If not provided, will default to 5s. 0s means no timeout (not recommended).
timeout = "5s"
# Read metrics about disk usage by mount point
[[inputs.disk]]
## By default, telegraf gathers stats for all mountpoints.
## Setting mountpoints will restrict the stats to the specified mountpoints.
# mount_points = ["/"]
## Ignore some mountpoints by filesystem type. For example (dev)tmpfs (usually
## present on /run, /var/run, /dev/shm or /dev).
ignore_fs = ["tmpfs", "devtmpfs"]
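Because inputs.disk works per mount point, it helps to know which mount point holds the directory of interest before filling in mount_points. Here is a minimal sketch. It assumes GNU coreutils df, whose --output option is not POSIX; the function name mount_point_of is hypothetical. It prints a mount_points line for a given path:

```shell
#!/usr/bin/env bash
# Hypothetical helper: find the mount point that inputs.disk would report
# for a directory. Assumes GNU coreutils df (--output is not POSIX).
set -euo pipefail

# df prints a header line ("Mounted on") followed by the mount point
mount_point_of() {
  df --output=target -- "$1" | tail -n 1
}

# Print a line that could be pasted into the [[inputs.disk]] section
if [ "$#" -gt 0 ]; then
  printf 'mount_points = ["%s"]\n' "$(mount_point_of "$1")"
fi
```

For example, run it with /var/lib/influxdb/data as the argument to get the mount_points setting for the filesystem that holds that directory.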
Note: choose the timeout and the collection interval carefully. Hourly readings may well be sufficient and keep the amount of logged data from exhausting storage.
Source: https://stackoverflow.com/questions/44386205/how-to-monitor-the-size-of-a-directory-via-telegraf