How to deserialize Avro messages from Kafka in Flink (Scala)?


Question


I'm reading messages from Kafka into the Flink Scala shell as follows:

scala> val stream = senv.addSource(new FlinkKafkaConsumer011[String]("topic", new SimpleStringSchema(), properties)).print()
warning: there was one deprecation warning; re-run with -deprecation for details
stream: org.apache.flink.streaming.api.datastream.DataStreamSink[String] = org.apache.flink.streaming.api.datastream.DataStreamSink@71de1091

Here I'm using SimpleStringSchema() as the deserializer, but the messages actually follow a different Avro schema (say, msg.avsc). How do I create a deserializer based on that Avro schema (msg.avsc) to deserialize the incoming Kafka messages?

I haven't been able to find any code examples or tutorials for doing this in Scala, so any input would help. It seems that I may need to extend and implement

org.apache.flink.streaming.util.serialization.DeserializationSchema

to decode the messages, but I don't know how. Any tutorials or instructions would be of great help. Since I don't want to do any custom processing, just parse the messages as per the Avro schema (msg.avsc), any quick way of doing this would be very helpful.


Answer 1:


I found an example of an AvroDeserializationSchema class in Java:

https://github.com/okkam-it/flink-examples/blob/master/src/main/java/org/okkam/flink/avro/AvroDeserializationSchema.java

Code snippet:
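Since the question asks for Scala, here is a minimal Scala transcription of that idea (a sketch only, assuming the Kafka payload is the raw Avro binary encoding of a record class generated from msg.avsc by the Avro compiler or avrohugger):

import org.apache.avro.io.DecoderFactory
import org.apache.avro.specific.SpecificDatumReader
import org.apache.flink.api.common.typeinfo.TypeInformation
import org.apache.flink.api.java.typeutils.TypeExtractor
import org.apache.flink.streaming.util.serialization.DeserializationSchema

// Sketch: T is assumed to be a specific-record class generated from msg.avsc.
class AvroDeserializationSchema[T](avroType: Class[T]) extends DeserializationSchema[T] {

  // The Avro reader is not serializable, so build it lazily on the worker.
  @transient private lazy val reader = new SpecificDatumReader[T](avroType)

  override def deserialize(message: Array[Byte]): T = {
    // Assumes the payload is the raw Avro binary encoding of a single record.
    val decoder = DecoderFactory.get().binaryDecoder(message, null)
    reader.read(null.asInstanceOf[T], decoder)
  }

  override def isEndOfStream(nextElement: T): Boolean = false

  override def getProducedType: TypeInformation[T] = TypeExtractor.getForClass(avroType)
}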

If you want to deserialize into a specific case class, use new FlinkKafkaConsumer011[case_class_name] together with new AvroDeserializationSchema[case_class_name](classOf[case_class_name]):

val stream = env.addSource(
  new FlinkKafkaConsumer011[DeviceData](
    "test",
    new AvroDeserializationSchema[DeviceData](classOf[DeviceData]),
    properties))

If you use Confluent's schema registry, the preferred solution is to use the Avro serde provided by Confluent. You just call deserialize(), and the resolution of the latest version of the Avro schema to use is done automatically behind the scenes; no byte manipulation is required.

Something like the following in Scala:

import io.confluent.kafka.serializers.{AbstractKafkaAvroSerDeConfig, KafkaAvroDeserializer}
import org.apache.avro.generic.GenericRecord
import scala.collection.JavaConverters._

...

// Point the deserializer at the schema registry; the second argument
// (isKey) is false because this instance deserializes record values.
val valueDeserializer = new KafkaAvroDeserializer()
valueDeserializer.configure(
  Map(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG -> schemaRegistryUrl).asJava,
  false)

...

// Inside a KeyedDeserializationSchema[KafkaKV]; keyDeserializer is a second
// KafkaAvroDeserializer configured as above, but with isKey = true.
override def deserialize(messageKey: Array[Byte], message: Array[Byte],
                         topic: String, partition: Int, offset: Long): KafkaKV = {
  val key = keyDeserializer.deserialize(topic, messageKey).asInstanceOf[GenericRecord]
  val value = valueDeserializer.deserialize(topic, message).asInstanceOf[GenericRecord]

  KafkaKV(key, value)
}

...

A detailed explanation is here: http://svend.kelesia.com/how-to-integrate-flink-with-confluents-schema-registry.html#how-to-integrate-flink-with-confluents-schema-registry
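Assuming the snippets above are gathered into a KeyedDeserializationSchema[KafkaKV] implementation called, say, ConfluentAvroKeyedSchema (a hypothetical name), it plugs into the consumer just like before:

// ConfluentAvroKeyedSchema is a hypothetical name for the class built above
val stream = senv
  .addSource(new FlinkKafkaConsumer011[KafkaKV](
    "topic", new ConfluentAvroKeyedSchema(schemaRegistryUrl), properties))
  .print()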

Hope it helps!



Source: https://stackoverflow.com/questions/55128833/how-to-deserialize-avro-messages-from-kafka-in-flink-scala
