Apache Kafka - KafkaStream on topic/partition

夕颜 2021-02-06 04:23

I am writing a Kafka consumer for a high-volume, high-velocity distributed application. I have only one topic, but the rate of incoming messages is very high. Having multiple partition that

3 answers
  •  悲&欢浪女
    2021-02-06 05:25

    The recommended way to do this is to use a thread pool, so that Java handles the thread management for you, and to consume each stream returned by the createMessageStreamsByFilter method in its own Runnable. For example:

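    // Imports assume the package layout of the old (0.7-era) high-level consumer API.
    import java.nio.ByteBuffer;
    import java.util.List;
    import java.util.Properties;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import kafka.consumer.Consumer;
    import kafka.consumer.ConsumerConfig;
    import kafka.consumer.KafkaStream;
    import kafka.consumer.TopicFilter;
    import kafka.consumer.Whitelist;
    import kafka.javaapi.consumer.ConsumerConnector;
    import kafka.message.Message;
    import kafka.message.MessageAndMetadata;

    // Number of streams to request: 2 topics x 3 partitions each.
    int NUMBER_OF_PARTITIONS = 6;
    Properties consumerConfig = new Properties();
    consumerConfig.put("zk.connect", "zookeeper.mydomain.com:2181");
    consumerConfig.put("backoff.increment.ms", "100");
    consumerConfig.put("autooffset.reset", "largest");
    consumerConfig.put("groupid", "java-consumer-example");
    ConsumerConnector consumer = Consumer.createJavaConsumerConnector(new ConsumerConfig(consumerConfig));

    TopicFilter sourceTopicFilter = new Whitelist("mytopic|myothertopic");
    List<KafkaStream<Message>> streams = consumer.createMessageStreamsByFilter(sourceTopicFilter, NUMBER_OF_PARTITIONS);

    // One worker thread per returned stream.
    ExecutorService executor = Executors.newFixedThreadPool(streams.size());
    for (final KafkaStream<Message> stream : streams) {
        executor.submit(new Runnable() {
            public void run() {
                // Iterating a stream blocks until the next message arrives.
                for (MessageAndMetadata<Message> msgAndMetadata : stream) {
                    ByteBuffer buffer = msgAndMetadata.message().payload();
                    byte[] bytes = new byte[buffer.remaining()];
                    buffer.get(bytes);
                    // Do something with the bytes you just got off Kafka.
                }
            }
        });
    }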
    

    In this example I asked for 6 streams (and therefore 6 threads) because I know that I have 3 partitions for each topic and I listed two topics in my whitelist. Once we have handles to the incoming streams, we can iterate over their contents, which are MessageAndMetadata objects. The metadata is really just the topic name and offset. As you discovered, you can do it in a single thread if you ask for 1 stream instead of the 6 in my example, but if you require parallel processing, the clean way is to launch an executor with one thread for each returned stream.
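
    The one-thread-per-stream pattern can be tried out without a Kafka cluster. The following is only a minimal sketch under stated assumptions: plain in-memory lists stand in for the Kafka streams, and the names StreamPoolSketch, consumeAll and buildDemoStreams are hypothetical, not part of any Kafka API. Unlike these finite lists, a real stream blocks while waiting for new messages, so a real consumer would be stopped by shutting down the connector rather than by running out of input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class StreamPoolSketch {

    // Consumes every stream on its own thread and returns the number of
    // messages processed. Each Iterable<byte[]> is a hypothetical stand-in
    // for one Kafka stream. Requires at least one stream, because
    // newFixedThreadPool rejects a pool size of zero.
    static int consumeAll(List<? extends Iterable<byte[]>> streams) throws InterruptedException {
        final AtomicInteger processed = new AtomicInteger();
        ExecutorService executor = Executors.newFixedThreadPool(streams.size());
        for (final Iterable<byte[]> stream : streams) {
            executor.submit(new Runnable() {
                public void run() {
                    for (byte[] payload : stream) {
                        // Do something with the payload; here we only count it.
                        processed.incrementAndGet();
                    }
                }
            });
        }
        // Accept no new tasks, then wait for the submitted ones to finish.
        executor.shutdown();
        executor.awaitTermination(10, TimeUnit.SECONDS);
        return processed.get();
    }

    // Builds demo input: streamCount lists of messagesPerStream payloads each.
    static List<List<byte[]>> buildDemoStreams(int streamCount, int messagesPerStream) {
        List<List<byte[]>> streams = new ArrayList<List<byte[]>>();
        for (int s = 0; s < streamCount; s++) {
            List<byte[]> stream = new ArrayList<byte[]>();
            for (int m = 0; m < messagesPerStream; m++) {
                stream.add(("msg-" + s + "-" + m).getBytes());
            }
            streams.add(stream);
        }
        return streams;
    }

    public static void main(String[] args) throws InterruptedException {
        // 6 streams, mirroring 2 topics x 3 partitions from the answer above.
        int total = consumeAll(buildDemoStreams(6, 100));
        System.out.println("Processed " + total + " messages");
    }
}
```

    The shared counter is an AtomicInteger because all worker threads update it concurrently; any state touched from inside run() needs the same care.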
