Question
I'm reading messages from Kafka into the Flink Scala shell, as follows:
scala> val stream = senv.addSource(new FlinkKafkaConsumer011[String]("topic", new SimpleStringSchema(), properties)).print()
warning: there was one deprecation warning; re-run with -deprecation for details
stream: org.apache.flink.streaming.api.datastream.DataStreamSink[String] = org.apache.flink.streaming.api.datastream.DataStreamSink@71de1091
Here I'm using SimpleStringSchema() as the deserializer, but the messages actually follow a different Avro schema (say, msg.avsc). How do I create a deserializer based on this Avro schema (msg.avsc) to decode the incoming Kafka messages?
I haven't been able to find any code examples or tutorials for doing this in Scala, so any input would help. It seems that I may need to extend and implement
org.apache.flink.streaming.util.serialization.DeserializationSchema
for decoding the messages, but I don't know how to do it. Any tutorials or instructions would be of great help. Since I don't want to do any custom processing, just to parse the messages as per the Avro schema (msg.avsc), any quick method of doing this would be very helpful.
Answer 1:
I found an example of an AvroDeserializationSchema class in Java:
https://github.com/okkam-it/flink-examples/blob/master/src/main/java/org/okkam/flink/avro/AvroDeserializationSchema.java
If you want to deserialize into a specific case class, use new FlinkKafkaConsumer011[case_class_name] with new AvroDeserializationSchema[case_class_name](classOf[case_class_name]). Code snippet:

val stream = env.addSource(new FlinkKafkaConsumer011[DeviceData](
  "test", new AvroDeserializationSchema[DeviceData](classOf[DeviceData]), properties))
If you use Confluent's schema registry, the preferred solution is to use the Avro serde provided by Confluent. We just call deserialize(), and the resolution of the latest version of the Avro schema to use is done automatically behind the scenes; no byte manipulation is required.
It looks something like this in Scala:
import io.confluent.kafka.serializers.{AbstractKafkaAvroSerDeConfig, KafkaAvroDeserializer}
import org.apache.avro.generic.GenericRecord
import scala.collection.JavaConverters._  // for .asJava
...
// isKey = false: this instance decodes values; keyDeserializer (elided here)
// is configured the same way with isKey = true.
val valueDeserializer = new KafkaAvroDeserializer()
valueDeserializer.configure(
  Map(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG -> schemaRegistryUrl).asJava,
  false)
...
override def deserialize(messageKey: Array[Byte], message: Array[Byte],
                         topic: String, partition: Int, offset: Long): KafkaKV = {
  val key = keyDeserializer.deserialize(topic, messageKey).asInstanceOf[GenericRecord]
  val value = valueDeserializer.deserialize(topic, message).asInstanceOf[GenericRecord]
  KafkaKV(key, value)
}
...
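To see how those pieces fit together, here is a self-contained sketch that assembles the snippet above into a complete KeyedDeserializationSchema. The class name ConfluentAvroKVDeserializer and the KafkaKV case class are illustrative assumptions (the post linked below defines its own equivalents):

import java.util.Collections

import io.confluent.kafka.serializers.{AbstractKafkaAvroSerDeConfig, KafkaAvroDeserializer}
import org.apache.avro.generic.GenericRecord
import org.apache.flink.api.common.typeinfo.TypeInformation
import org.apache.flink.api.java.typeutils.TypeExtractor
import org.apache.flink.streaming.util.serialization.KeyedDeserializationSchema

// Holder for the decoded key/value pair (assumed name).
case class KafkaKV(key: GenericRecord, value: GenericRecord)

// Sketch: decodes both the record key and value via the schema registry.
class ConfluentAvroKVDeserializer(schemaRegistryUrl: String)
    extends KeyedDeserializationSchema[KafkaKV] {

  // Confluent's deserializer is not serializable, so build it lazily on each task.
  @transient private lazy val keyDeserializer = newDeserializer(isKey = true)
  @transient private lazy val valueDeserializer = newDeserializer(isKey = false)

  private def newDeserializer(isKey: Boolean): KafkaAvroDeserializer = {
    val d = new KafkaAvroDeserializer()
    d.configure(
      Collections.singletonMap(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, schemaRegistryUrl),
      isKey)
    d
  }

  override def deserialize(messageKey: Array[Byte], message: Array[Byte],
                           topic: String, partition: Int, offset: Long): KafkaKV = {
    val key = keyDeserializer.deserialize(topic, messageKey).asInstanceOf[GenericRecord]
    val value = valueDeserializer.deserialize(topic, message).asInstanceOf[GenericRecord]
    KafkaKV(key, value)
  }

  override def isEndOfStream(nextElement: KafkaKV): Boolean = false

  override def getProducedType: TypeInformation[KafkaKV] =
    TypeExtractor.getForClass(classOf[KafkaKV])
}

// Usage:
// val stream = senv.addSource(new FlinkKafkaConsumer011[KafkaKV](
//   "topic", new ConfluentAvroKVDeserializer(schemaRegistryUrl), properties))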
A detailed explanation is here: http://svend.kelesia.com/how-to-integrate-flink-with-confluents-schema-registry.html#how-to-integrate-flink-with-confluents-schema-registry
Hope it helps!
Source: https://stackoverflow.com/questions/55128833/how-to-deserialize-avro-messages-from-kafka-in-flink-scala