Spark Streaming: Reading data from Kafka that has multiple schemas

Submitted by 五迷三道 on 2020-01-01 06:32:33

Question


I am struggling with the implementation in Spark Streaming.

The messages from Kafka look like this, but with more fields:

{"event":"sensordata", "source":"sensors", "payload": {"actual data as a json}}
{"event":"databasedata", "mysql":"sensors", "payload": {"actual data as a json}}
{"event":"eventApi", "source":"event1", "payload": {"actual data as a json}}
{"event":"eventapi", "source":"event2", "payload": {"actual data as a json}}

I am trying to read the messages from a Kafka topic that carries multiple schemas. I need to read each message, look at its event and source fields, and decide where to store it as a Dataset. The actual data is in the payload field as JSON, and each message holds only a single record.
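Roughly, the routing I have in mind looks like this (a sketch assuming Structured Streaming and the spark-sql-kafka source; the broker address and topic name are placeholders):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json, lower}
import org.apache.spark.sql.types.{StringType, StructType}

val spark = SparkSession.builder.appName("multi-schema-routing").getOrCreate()

// Common envelope shared by all messages; payload stays a raw JSON string
// (Spark returns a nested object as its JSON text when the schema says StringType).
val envelope = new StructType()
  .add("event", StringType)
  .add("source", StringType)
  .add("payload", StringType)

val parsed = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092") // placeholder
  .option("subscribe", "mixed-topic")                  // placeholder
  .load()
  .select(from_json(col("value").cast("string"), envelope).as("msg"))
  .select("msg.*")

// Route on the discriminator fields; each branch can apply its own payload
// schema with from_json and go to its own sink.
val sensorData = parsed.filter(lower(col("event")) === "sensordata")
val apiEvents  = parsed.filter(lower(col("event")) === "eventapi")

Each branch would then need its own writeStream query (or a shared foreachBatch), since one streaming query writes to one sink.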

Can someone help me implement this, or suggest an alternative?

Is it good practice to send messages with multiple schemas to the same topic, and to consume them this way?

Thanks in advance,


Answer 1:


You can create a DataFrame from the incoming JSON objects.

Create a Seq[String] of the JSON objects.

Pass it to spark.read.json, which accepts a Dataset[String] (Spark 2.2+), e.g. val df = spark.read.json(jsons.toDS).

Perform the operations of your choice on the DataFrame df.
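For example, a minimal sketch of those three steps (the sample payload fields temp and id are invented):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("json-seq-demo").getOrCreate()
import spark.implicits._

// Step 1: a Seq[String] of JSON objects shaped like the messages above.
val jsons = Seq(
  """{"event":"sensordata","source":"sensors","payload":{"temp":21.5}}""",
  """{"event":"eventapi","source":"event1","payload":{"id":1}}"""
)

// Step 2: spark.read.json takes a Dataset[String], not a type parameter.
val df = spark.read.json(jsons.toDS)

// Step 3: any DataFrame operation, e.g. splitting on the event field.
df.filter($"event" === "sensordata").show()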




Answer 2:


Convert the JSON string to a JavaBean if you only care about some of the columns.
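Since the rest of this page uses Scala, a case class can stand in for the JavaBean. A minimal sketch with hypothetical names, where the payload column is simply not declared:

import org.apache.spark.sql.SparkSession

// Declare only the columns you care about; the rest are dropped.
case class Envelope(event: String, source: String)

val spark = SparkSession.builder.appName("bean-demo").getOrCreate()
import spark.implicits._

val jsons = Seq(
  """{"event":"sensordata","source":"sensors","payload":{"temp":21.5}}"""
).toDS

// select() keeps just the bean's fields, then as[] yields a typed Dataset.
val typed = spark.read.json(jsons).select("event", "source").as[Envelope]
typed.show()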



Source: https://stackoverflow.com/questions/46904339/spark-streamming-reading-data-from-kafka-that-has-multiple-schema
