Are there any alerting options for scenarios where a Kafka Connect Connector or a Connector task fails or experiences errors?
We have Kafka Connect running, it runs
Since this post was written and answered, Kafka Connect has begun providing its own official metrics. Apache Kafka Connect exposes these metrics over JMX.
If you use the Confluent Kafka Connect Helm Charts (https://github.com/confluentinc/cp-helm-charts/tree/master/charts/cp-kafka-connect), they include a Prometheus metrics exporter.
I monitor and alert on cp_kafka_connect_connect_connector_metrics{status="running"}
which comes from the Prometheus exporter in the Confluent Helm chart, but there are many possible variations on that.
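As a rough illustration of that kind of check, here is a minimal Python sketch that asks Prometheus for the running series and reports which expected connectors are missing. It uses the standard Prometheus instant-query endpoint (/api/v1/query). Everything else is an assumption on my part: the label name "connector", the idea that a positive sample value means "running", and the connector names in the usage note are hypothetical and depend on how your exporter is configured.

```python
import json
import urllib.parse
import urllib.request

# Metric selector taken from the post; the label set depends on the exporter config.
RUNNING_QUERY = 'cp_kafka_connect_connect_connector_metrics{status="running"}'

# Assumption: the exporter puts the connector name under this label.
CONNECTOR_LABEL = "connector"


def build_query_url(prometheus_base_url, query=RUNNING_QUERY):
    """Build a Prometheus instant-query URL (/api/v1/query) for a PromQL expression."""
    params = urllib.parse.urlencode({"query": query})
    return prometheus_base_url.rstrip("/") + "/api/v1/query?" + params


def fetch_running_series(prometheus_base_url):
    """Run the instant query against Prometheus and return the decoded JSON body."""
    url = build_query_url(prometheus_base_url)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def connectors_not_running(query_response, expected_connectors):
    """Return the expected connectors that have no running series, sorted by name.

    Assumption: a series with status="running" and a positive value means
    the connector is currently running.
    """
    if query_response.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")

    running = set()
    for series in query_response["data"]["result"]:
        name = series["metric"].get(CONNECTOR_LABEL)
        _timestamp, value = series["value"]  # instant vectors carry [ts, "value"]
        if name is not None and float(value) > 0:
            running.add(name)

    return sorted(set(expected_connectors) - running)
```

You would call it with something like connectors_not_running(fetch_running_series("http://prometheus:9090"), ["jdbc-sink", "s3-sink"]) and page someone if the list is non-empty. In practice a Prometheus alerting rule on the same expression is probably the more natural place for this; the script is only meant to show the logic.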
Using the official Kafka Connect metrics is generally preferable for any automated monitoring and alerting setup. This option wasn't available when this post was originally written and answered.
FYI, Kafka still doesn't expose lag metrics, so you still need third-party tools to monitor and alert on lag.