We are running a 3-broker Kafka 0.10.0.1 cluster. We have a Java app that spawns many consumer threads consuming from different topics. For every topic we have specified differ
Check the size of the `__consumer_offsets` partitions on disk. We faced a similar issue that was caused by compaction errors, which led to very long rebalances.
See https://issues.apache.org/jira/browse/KAFKA-5413 for more details (fixed in Kafka 0.10.2.2 / 0.11).
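If you have shell access to a broker, a small Java program like the one below can do the size check. It is a sketch, not an official tool. It assumes the broker's data directory is `/tmp/kafka-logs` (the value in the sample `server.properties`; pass your real `log.dirs` path as the first argument) and that each partition is stored in a directory named `<topic>-<partition>`. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

// Hypothetical helper: reports the on-disk size of each __consumer_offsets partition.
public class OffsetsTopicSize {

    // Sum the sizes of all regular files (segments, indexes, ...) under a partition directory.
    static long directorySize(Path dir) throws IOException {
        try (Stream<Path> files = Files.walk(dir)) {
            return files.filter(Files::isRegularFile)
                        .mapToLong(p -> {
                            try {
                                return Files.size(p);
                            } catch (IOException e) {
                                throw new UncheckedIOException(e);
                            }
                        })
                        .sum();
        }
    }

    // Partition directories are assumed to be named <topic>-<partition>, e.g. __consumer_offsets-7.
    static Map<String, Long> offsetsPartitionSizes(Path logDir) throws IOException {
        Map<String, Long> sizes = new TreeMap<>();
        try (DirectoryStream<Path> dirs = Files.newDirectoryStream(logDir, "__consumer_offsets-*")) {
            for (Path dir : dirs) {
                if (Files.isDirectory(dir)) {
                    sizes.put(dir.getFileName().toString(), directorySize(dir));
                }
            }
        }
        return sizes;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: /tmp/kafka-logs is the broker's log directory unless a path is given.
        Path logDir = Paths.get(args.length > 0 ? args[0] : "/tmp/kafka-logs");
        offsetsPartitionSizes(logDir).forEach((name, bytes) ->
            System.out.printf("%s\t%,d bytes%n", name, bytes));
    }
}
```

Run it on each broker, since every broker only holds the partitions it hosts. Partitions that are far larger than their siblings, or that keep growing over time, may indicate that the cleaner is not compacting them.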
Another possibility is that your broker configuration is incorrect and compaction is turned off, i.e. `log.cleaner.enable` is set to `false`. `__consumer_offsets` is a compacted topic, so if the log cleaner is disabled, it will never be compacted, which leads to the same symptom.
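To rule this out, check `log.cleaner.enable` in every broker's `server.properties`. The sketch below is a hypothetical helper built on `java.util.Properties`; it treats a missing key as `true`, which is the default documented for 0.10.x brokers, and the `config/server.properties` path is an assumption.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical helper: reports whether a broker config leaves the log cleaner enabled.
public class CleanerConfigCheck {

    // A missing key falls back to "true" (assumed broker default for 0.10.x).
    // Trailing whitespace is trimmed because Properties keeps it in values.
    static boolean isLogCleanerEnabled(Properties brokerProps) {
        return Boolean.parseBoolean(brokerProps.getProperty("log.cleaner.enable", "true").trim());
    }

    static Properties load(Reader reader) throws IOException {
        Properties props = new Properties();
        props.load(reader);
        return props;
    }

    public static void main(String[] args) throws IOException {
        // Assumed location; pass the broker's actual server.properties as the first argument.
        Path config = Paths.get(args.length > 0 ? args[0] : "config/server.properties");
        try (Reader reader = Files.newBufferedReader(config)) {
            if (isLogCleanerEnabled(load(reader))) {
                System.out.println("log.cleaner.enable is true: __consumer_offsets can be compacted");
            } else {
                System.out.println("log.cleaner.enable is false: __consumer_offsets will not be compacted");
            }
        }
    }
}
```

If the flag is `false`, setting `log.cleaner.enable=true` and restarting the broker should let the cleaner start compacting the topic.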