Question
I have a 4-node Kafka cluster in production where we use a custom partitioner that takes an id mod 64 to determine the partition. Since last week, the Kafka messages-in rate has been imbalanced on one of our nodes, as can be seen in the attached graph. The pink line shows the message-in rate on the kafka01 node, and the bluish-yellow line shows the message-in rate on the other 3 boxes. I'm using Datadog for monitoring, with the metric kafka.messages_in.rate. Assuming there has been no change in the id distribution, there should have been no change in the distribution of the message-in rate. The steps I've taken to debug the issue are:
- The cluster is balanced, with 16 leaders on each of the 4 nodes.
- ISRs are also balanced across the 4 boxes, with each one having 32 ISRs [replication factor of 2].
- Network in and out on all 4 boxes is almost equal.
I'd appreciate any help, or pointers to areas/metrics I can look into, to debug this anomaly.
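To make the expectation concrete, here is a minimal Python sketch of how an id-mod-64 partitioner plus a balanced leader layout translates into per-broker message counts. The leader assignment (partition p led by broker p mod 4), the broker names other than kafka01, and the id streams are all assumptions for illustration; the real assignment would have to come from the cluster itself. It shows that a single hot id can skew one broker even when leaders are perfectly balanced.

```python
from collections import Counter

NUM_PARTITIONS = 64  # the custom partitioner does id mod 64
NUM_BROKERS = 4


def partition_for(record_id: int) -> int:
    """Mimic the custom partitioner: partition = id mod 64."""
    return record_id % NUM_PARTITIONS


def messages_per_broker(ids, leader_of):
    """Count how many messages land on each partition leader."""
    counts = Counter()
    for record_id in ids:
        counts[leader_of[partition_for(record_id)]] += 1
    return counts


# ASSUMPTION: hypothetical layout, partition p is led by broker (p mod 4),
# which gives 16 leaders per broker as described in the question.
leader_of = {p: f"kafka0{p % NUM_BROKERS + 1}" for p in range(NUM_PARTITIONS)}

# Evenly spread ids: every partition receives the same number of messages.
uniform = messages_per_broker(range(64_000), leader_of)

# Same traffic plus one hypothetical "hot" id (id 0) sending 16,000 extra messages.
skewed = messages_per_broker(list(range(64_000)) + [0] * 16_000, leader_of)

print("uniform:", dict(uniform))
print("skewed: ", dict(skewed))
```

If the observed imbalance resembles the skewed case, the id distribution itself is worth re-checking, for example by counting messages per partition rather than per broker.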
For people searching for this in the future: https://mail-archives.apache.org/mod_mbox/kafka-users/201710.mbox/%3CCALaekbwkSKapqPwsyuAoHGiSnc1+3jF2wF+2FDZbAVx61E+c2w@mail.gmail.com%3E
Answer 1:
A few things to debug:
- Enable trace-level logging on the brokers.
- Compare the logs of a broker receiving more requests with those of one receiving fewer requests, over a short duration that contains ample produce requests to analyze.
- Search for ProducerRequest in the logs; it will show whether partitioning is happening as expected, and also which host the broker is receiving more requests from.
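The log comparison above could be partly automated. Below is a rough Python sketch that counts lines mentioning ProducerRequest, grouped by sending host. The `client=` field, the regex, and the sample lines are placeholders invented for illustration and are not real Kafka log output; the pattern would need to be adapted to whatever your brokers actually write at trace level.

```python
import re
from collections import Counter


def count_produce_requests(lines, host_pattern):
    """Count lines mentioning ProducerRequest, grouped by the captured host."""
    counts = Counter()
    for line in lines:
        if "ProducerRequest" not in line:
            continue  # ignore fetch requests and other log noise
        match = host_pattern.search(line)
        counts[match.group(1) if match else "unknown"] += 1
    return counts


# ASSUMPTION: hypothetical pattern; adjust to the real log line layout.
HOST_PATTERN = re.compile(r"client=(\S+)")

# ASSUMPTION: placeholder lines, NOT actual Kafka broker log output.
sample_lines = [
    "ProducerRequest client=app-host-1",
    "ProducerRequest client=app-host-1",
    "FetchRequest client=app-host-2",
    "ProducerRequest client=app-host-2",
    "ProducerRequest (no client field)",
]

counts = count_produce_requests(sample_lines, HOST_PATTERN)
for host, n in counts.most_common():
    print(host, n)
```

Running the same count over an equal time window on kafka01 and on one of the other brokers would give the side-by-side comparison the answer suggests.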
Source: https://stackoverflow.com/questions/49607708/debugging-imbalanced-kafka-message-in-rate