Java: how to get the number of messages in a topic in Apache Kafka

I am using Apache Kafka for messaging. I have implemented the producer and consumer in Java. How can we get the number of messages in a topic?

17 answers

佛祖请我去吃肉

2020-11-30 19:35

From a consumer's point of view, the only way that comes to mind is to actually consume the messages and count them.

The Kafka broker exposes JMX counters for the number of messages received since start-up, but you cannot know how many of them have already been purged.

In most common scenarios, messages in Kafka are best seen as an infinite stream, and a discrete count of how many are currently kept on disk is not relevant. Things get more complicated still with a cluster of brokers, each of which holds only a subset of the messages in a topic.

误落风尘

2020-11-30 19:36
Sometimes the interest is in knowing the number of messages in each partition, for example when testing a custom partitioner. The following steps have been tested with Kafka 0.10.2.1-2 from Confluent 3.2. Given a Kafka topic `kt`, run the following command:
```
$ kafka-run-class kafka.tools.GetOffsetShell \
  --broker-list host01:9092,host02:9092,host02:9092 --topic kt
```
This prints the latest offset of each of the three partitions, which equals the partition's message count as long as nothing has been deleted from it yet:
```
kt:2:6138
kt:1:6123
kt:0:6137
```
The output has one line per partition, so the number of lines depends on how many partitions the topic has.
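
To get a single figure for the whole topic, the `topic:partition:offset` lines can be parsed and summed. The following is a minimal sketch in Java that works on the sample output above; the class and method names (`GetOffsetShellOutput`, `parse`, `total`) are made up for illustration and are not part of Kafka. The sum is only the topic's message count if no messages have been deleted yet.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Kafka: parses GetOffsetShell output lines.
final class GetOffsetShellOutput {

    /** Parses lines of the form topic:partition:offset into partition -> offset. */
    static Map<Integer, Long> parse(List<String> lines) {
        Map<Integer, Long> offsets = new TreeMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // ignore blank lines
            }
            String[] parts = trimmed.split(":");
            if (parts.length != 3) {
                throw new IllegalArgumentException("Unexpected line: " + line);
            }
            offsets.put(Integer.parseInt(parts[1]), Long.parseLong(parts[2]));
        }
        return offsets;
    }

    /** Sums the offsets of all partitions. */
    static long total(Map<Integer, Long> offsets) {
        long sum = 0;
        for (long offset : offsets.values()) {
            sum += offset;
        }
        return sum;
    }

    public static void main(String[] args) {
        // The sample output shown above.
        List<String> sample = List.of("kt:2:6138", "kt:1:6123", "kt:0:6137");
        Map<Integer, Long> offsets = parse(sample);
        System.out.println(offsets);        // {0=6137, 1=6123, 2=6138}
        System.out.println(total(offsets)); // 18398
    }
}
```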

暖寄归人

2020-11-30 19:37

Apache Kafka command to get the unhandled messages on all partitions of a topic:

    kafka-run-class kafka.tools.ConsumerOffsetChecker \
        --topic test --zookeeper localhost:2181 \
        --group test_group

Prints:

Group      Topic        Pid Offset          logSize         Lag             Owner
test_group test         0   11051           11053           2               none
test_group test         1   10810           10812           2               none
test_group test         2   11027           11028           1               none

Column 6 (Lag) is the number of unhandled messages per partition. Add them up like this:

    kafka-run-class kafka.tools.ConsumerOffsetChecker \
        --topic test --zookeeper localhost:2181 \
        --group test_group 2>/dev/null | awk 'NR>1 {sum += $6} END {print sum}'

awk reads the rows, skips the header line, adds up the 6th column, and prints the sum at the end.

For the sample output above, this prints 5 (2 + 2 + 1).


悲&欢浪女

2020-11-30 19:38
Use Presto's Kafka connector: https://prestodb.io/docs/current/connector/kafka-tutorial.html

Presto is a SQL engine, provided by Facebook, that connects to several data sources (Cassandra, Kafka, JMX, Redis ...).

PrestoDB runs as a server with optional workers (there is a standalone mode without extra workers), and you query it with a small executable JAR (the Presto CLI).

Once the Presto server is configured, you can use traditional SQL:
```
SELECT count(*) FROM TOPIC_NAME;
```
青春惊慌失措

2020-11-30 19:38
I had this same question and this is how I am doing it, from a KafkaConsumer, in Kotlin:
```
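// Assumes `consumer` is a KafkaConsumer and `topicName` is the topic to count.
// Requires: import org.apache.kafka.common.TopicPartition
val messageCount = consumer.listTopics().entries.filter { it.key == topicName }
    .map {
        // Build a TopicPartition for every partition of the topic.
        it.value.map { topicInfo -> TopicPartition(topicInfo.topic(), topicInfo.partition()) }
    }
    // Retained messages = sum of end offsets minus sum of beginning offsets.
    .map { consumer.endOffsets(it).values.sum() - consumer.beginningOffsets(it).values.sum() }
    .first()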
```
Very rough code, as I just got this to work, but basically you want to subtract the topic's beginning offset from its end offset; the result is the current message count for the topic.

You can't rely on the end offset alone, because other settings (cleanup policy, retention.ms, etc.) may cause old messages to be deleted from the topic. Offsets only move forward, so it is the beginning offset that advances toward the end offset (and eventually reaches the same value, if the topic currently contains no messages).

Basically, the end offset represents the overall number of messages that have gone through the topic, and the difference between the two represents the number of messages the topic contains right now.
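
Since the question asks about Java, here is a minimal sketch of the same subtraction in Java. To keep it runnable without a broker, it takes plain maps keyed by partition number. With the real client these values would come from `KafkaConsumer.beginningOffsets` and `KafkaConsumer.endOffsets`, which are keyed by `TopicPartition`. The class name `RetainedMessages`, the method `count`, and the offsets in `main` are made up for illustration.

```java
import java.util.Map;

// Hypothetical helper, not part of the Kafka client library.
final class RetainedMessages {

    /**
     * Returns end offset minus beginning offset, summed over all partitions.
     * Both maps are keyed by partition number.
     */
    static long count(Map<Integer, Long> beginningOffsets, Map<Integer, Long> endOffsets) {
        long total = 0;
        for (Map.Entry<Integer, Long> end : endOffsets.entrySet()) {
            Long begin = beginningOffsets.get(end.getKey());
            if (begin == null) {
                throw new IllegalArgumentException(
                        "No beginning offset for partition " + end.getKey());
            }
            total += end.getValue() - begin;
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up offsets for a three-partition topic:
        // partition 0 has lost its first 100 messages to retention,
        // partition 1 still holds everything, partition 2 is currently empty.
        Map<Integer, Long> beginning = Map.of(0, 100L, 1, 0L, 2, 6138L);
        Map<Integer, Long> end = Map.of(0, 6137L, 1, 6123L, 2, 6138L);
        System.out.println(count(beginning, end)); // 6037 + 6123 + 0 = 12160
    }
}
```

Partition 2 in the example shows the case described above where the beginning offset has caught up with the end offset, so it contributes zero messages.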
太阳男子

2020-11-30 19:38

The simplest way I've found is to call the Kafdrop REST API at /topic/topicName with the header `Accept: application/json`, so that the response comes back as JSON.

This is documented here.
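
As a rough sketch, such a request could be built in Java 11+ with the JDK's `java.net.http` classes. The base URL `http://localhost:9000`, the topic name `kt`, and the class name `KafdropTopicRequest` are assumptions for illustration; adjust them to wherever your Kafdrop instance runs. The shape of the JSON response is not covered here.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper: builds the request only, it does not send it.
final class KafdropTopicRequest {

    /** Builds a GET request for {baseUrl}/topic/{topicName} that asks for JSON. */
    static HttpRequest build(String baseUrl, String topicName) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/topic/" + topicName))
                .header("Accept", "application/json")
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // Base URL and topic name are assumptions for this example.
        HttpRequest request = build("http://localhost:9000", "kt");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(request.headers().map());
    }
}
```

To actually send it, pass the request to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` and read the body of the returned response.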
