Kafka Consumer: How to start consuming from the last message in Python

闹比i 2020-12-28 22:08

I am using Kafka 0.8.1 and kafka-python 0.9.0. In my setup, I have two Kafka brokers set up. When I run my Kafka consumer, I can see it retrieving messages from the queue and k

5 Answers
  • 2020-12-28 22:51

    First of all, you need to set a group_id so that the offset is recorded and the consumer will resume from that group_id's last committed position.

    If you have already consumed all of the existing messages in that group and want to re-consume them, you can use seek to achieve this.

    Here is an example:

    from kafka import KafkaConsumer, TopicPartition

    def test_consume_from_offset(offset):
        topic = 'test'
        broker_list = ['localhost:9092']  # adjust to your brokers
        consumer = KafkaConsumer(bootstrap_servers=broker_list, group_id='test')
        tp = TopicPartition(topic=topic, partition=0)
        consumer.assign([tp])
        consumer.seek(tp, offset)   # set the offset you want to resume from
        for msg in consumer:
            # messages begin at the offset you set
            print(msg)

    test_consume_from_offset(10)
    
  • 2020-12-28 22:57

    You just need to make sure that your Kafka consumer starts reading from the latest offset (auto.offset.reset="latest"). Also make sure that you define a consumer group, so that offsets can be committed and, when the consumer goes down, it can pick up from its last committed position.


    Using confluent-kafka-python

    from confluent_kafka import Consumer
    
    
    c = Consumer({
        'bootstrap.servers': 'localhost:9092',
        'group.id': 'mygroup',
        'auto.offset.reset': 'latest'
    })
    
    c.subscribe(['my_topic'])
    

    Using kafka-python

    from kafka import KafkaConsumer
    
    
    consumer = KafkaConsumer(
        'my_topic', 
        bootstrap_servers=['localhost:9092'],
        auto_offset_reset='latest', 
        enable_auto_commit=True,
        group_id='mygroup'
    )
    
  • 2020-12-28 23:09

    The Kafka consumer is able to store offsets in Zookeeper. In the Java API we have two options: a high-level consumer, which manages state for us and resumes consuming where it left off after a restart, and a stateless low-level consumer without this superpower.

    From what I understand of the Python consumer code (https://github.com/mumrah/kafka-python/blob/master/kafka/consumer.py), both SimpleConsumer and MultiProcessConsumer are stateful and keep track of current offsets in Zookeeper, so it is strange that you have this re-consuming problem.

    Make sure you have the same consumer group id across restarts (maybe you set it randomly?) and check the following options:

    auto_commit: default True. Whether or not to auto commit the offsets
    auto_commit_every_n: default 100. How many messages to consume
                         before a commit
    auto_commit_every_t: default 5000. How much time (in milliseconds) to
                         wait before commit
    

    Maybe you consume fewer than 100 messages, or run for less than 5000 ms, before the restart?
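The interaction of those two thresholds can be sketched in plain Python (a hypothetical helper, not kafka-python's actual implementation): a commit fires once either limit is crossed, so a consumer that stops before reaching both loses its uncommitted position.

```python
def should_commit(messages_since_commit, ms_since_commit,
                  auto_commit_every_n=100, auto_commit_every_t=5000):
    """Mimic the auto-commit trigger: commit once either threshold is hit."""
    return (messages_since_commit >= auto_commit_every_n
            or ms_since_commit >= auto_commit_every_t)

# Fewer than 100 messages AND under 5000 ms: nothing has been committed yet,
# so a restart here re-consumes from the last committed offset.
print(should_commit(99, 4999))   # False
print(should_commit(100, 0))     # True
print(should_commit(0, 5000))    # True
```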

  • 2020-12-28 23:13

    Take care with the kafka-python library. It has a few minor issues.

    If speed is not really a problem for your consumer, you can enable auto-commit on every message. It should work.

    SimpleConsumer provides a seek method (https://github.com/mumrah/kafka-python/blob/master/kafka/consumer/simple.py#L174-L185) that allows you to start consuming messages from whatever point you want.

    The most usual calls are:

    • consumer.seek(0, 0) to start reading from the beginning of the queue.
    • consumer.seek(0, 1) to start reading from current offset.
    • consumer.seek(0, 2) to skip all the pending messages and start reading only new messages.

    The first argument is an offset relative to those positions. That way, if you call consumer.seek(5, 0) you will skip the first 5 messages in the queue.
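The whence semantics above can be modeled as pure arithmetic (a hypothetical sketch, not SimpleConsumer's real code; it assumes the head, current, and tail offsets of the partition are known):

```python
def resolve_offset(offset, whence, head, current, tail):
    """Model SimpleConsumer.seek(offset, whence):
    whence=0 -> relative to the head (oldest message),
    whence=1 -> relative to the current position,
    whence=2 -> relative to the tail (next new message)."""
    base = {0: head, 1: current, 2: tail}[whence]
    return base + offset

# Partition holding offsets 0..99, consumed up to 40; the tail is 100.
print(resolve_offset(5, 0, head=0, current=40, tail=100))  # 5: skip first 5
print(resolve_offset(0, 1, head=0, current=40, tail=100))  # 40: stay put
print(resolve_offset(0, 2, head=0, current=40, tail=100))  # 100: new msgs only
```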

    Also, don't forget that the offset is stored per consumer group. Be sure you are using the same one all the time.

  • 2020-12-28 23:14

    kafka-python stores offsets with the Kafka server, not on a separate Zookeeper connection. Unfortunately, the Kafka server APIs supporting commit/fetch of offsets were not fully functional until Apache Kafka 0.8.1.1. If you upgrade your Kafka server, your setup should work. I'd also suggest upgrading kafka-python to 0.9.4.

    [kafka-python maintainer]
