How to decode/deserialize Avro with Python from Kafka

Asked 2020-12-11 07:40

I am receiving Kafka Avro messages from a remote server in Python (using the consumer from the Confluent Kafka Python library); the messages represent clickstream data as JSON-like dictionaries. How can I decode/deserialize these messages?

2 Answers
  • 2020-12-11 08:22

    I thought the Avro library was just for reading Avro files, but it actually solves the problem of decoding Kafka messages as well. I first import the libraries and pass the schema file as a parameter, then create a function that decodes a message into a dictionary, which I can use in the consumer loop.

    import io
    
    from confluent_kafka import Consumer, KafkaError
    from avro.io import DatumReader, BinaryDecoder
    import avro.schema
    
    # Parse the schema once and reuse a single DatumReader for every message.
    schema = avro.schema.Parse(open("data_sources/EventRecord.avsc").read())
    reader = DatumReader(schema)
    
    def decode(msg_value):
        """Decode Avro-encoded bytes from Kafka into a Python dictionary."""
        message_bytes = io.BytesIO(msg_value)
        decoder = BinaryDecoder(message_bytes)
        event_dict = reader.read(decoder)
        return event_dict
    
    # Consumer() requires a configuration dict; the values below are placeholders.
    c = Consumer({
        'bootstrap.servers': 'localhost:9092',
        'group.id': 'clickstream-consumer',
    })
    c.subscribe([topic])  # subscribe() takes a list of topic names
    running = True
    while running:
        msg = c.poll()
        if not msg.error():
            msg_value = msg.value()
            event_dict = decode(msg_value)
            print(event_dict)
        elif msg.error().code() != KafkaError._PARTITION_EOF:
            print(msg.error())
            running = False
    c.close()
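
    As a quick sanity check, the same avro library can also encode a dictionary back into bytes and feed it through decode. This is a minimal sketch, assuming the schema parsed above; the sample field names are hypothetical and should be replaced with the ones your EventRecord.avsc actually defines.

    import io
    from avro.io import DatumWriter, BinaryEncoder
    
    def encode(event_dict):
        """Serialize a dict into Avro-encoded bytes (the inverse of decode)."""
        buf = io.BytesIO()
        writer = DatumWriter(schema)  # reuses the schema parsed above
        writer.write(event_dict, BinaryEncoder(buf))
        return buf.getvalue()
    
    # Round-trip check: encode a sample record and decode it again.
    sample = {"user_id": "abc", "url": "/home"}  # hypothetical fields
    assert decode(encode(sample)) == sample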
    
  • 2020-12-11 08:25

    If you use the Confluent Schema Registry and want to deserialize Avro messages, just add message_bytes.seek(5) to the decode function, since Confluent prepends 5 extra bytes before the usual Avro-formatted data.

    def decode(msg_value):
        message_bytes = io.BytesIO(msg_value)
        # Skip the 5-byte Confluent wire-format header (magic byte + schema ID).
        message_bytes.seek(5)
        decoder = BinaryDecoder(message_bytes)
        event_dict = reader.read(decoder)
        return event_dict
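
    Those 5 bytes are the Confluent wire-format header: a magic byte (0) followed by the 4-byte, big-endian ID of the schema registered in Schema Registry. Here is a minimal sketch of reading that header instead of discarding it (parse_confluent_header is just an illustrative helper name), in case you want to look up the writer schema by its ID:

    import struct
    
    def parse_confluent_header(msg_value):
        """Split a Confluent-framed message into (schema_id, avro_payload)."""
        magic, schema_id = struct.unpack(">bI", msg_value[:5])
        if magic != 0:
            raise ValueError("not a Confluent Schema Registry framed message")
        return schema_id, msg_value[5:]

    Alternatively, confluent_kafka ships consumers that handle this framing and the schema lookup for you, e.g. the AvroConsumer in confluent_kafka.avro, or a DeserializingConsumer configured with an AvroDeserializer in newer versions of the library.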
    