Question
I am struggling with an implementation in Spark Streaming.
The messages from Kafka look like this, but with more fields:
{"event":"sensordata", "source":"sensors", "payload": {"actual data as a json}}
{"event":"databasedata", "mysql":"sensors", "payload": {"actual data as a json}}
{"event":"eventApi", "source":"event1", "payload": {"actual data as a json}}
{"event":"eventapi", "source":"event2", "payload": {"actual data as a json}}
I am trying to read the messages from a Kafka topic that carries multiple schemas. I need to read each message, look at its event and source fields, and decide where to store it as a Dataset. The actual data is in the payload field as JSON, and each message holds only a single record.
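This is roughly what I have been attempting so far (the broker address, topic name, and the "sensordata" check are placeholders from the examples above):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.get_json_object

    object RouteByEvent {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("RouteByEvent")
          .master("local[*]")
          .getOrCreate()
        import spark.implicits._

        // Broker address and topic name are placeholders
        val raw = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")
          .option("subscribe", "mixed-topic")
          .load()
          .selectExpr("CAST(value AS STRING) AS json")

        // Extract the routing fields and the payload without
        // committing to any one payload schema
        val routed = raw.select(
          get_json_object($"json", "$.event").as("event"),
          get_json_object($"json", "$.source").as("source"),
          get_json_object($"json", "$.payload").as("payload"))

        // e.g. keep only the sensor events and send them to their own sink
        val sensors = routed.filter($"event" === "sensordata")
        val query = sensors.writeStream
          .format("console")
          .start()
        query.awaitTermination()
      }
    }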
Can someone help me implement this, or suggest alternatives?
Is it a good practice to send messages with multiple schemas to the same topic and consume them?
Thanks in advance,
Answer 1:
You can create a DataFrame from the incoming JSON objects:
1. Collect the JSON objects into a Seq[String].
2. Convert it to a Dataset[String] and pass it to spark.read.json (spark.read.json does not take a Seq[String] directly; since Spark 2.2 it accepts a Dataset[String]).
3. Perform the operations of your choice on the resulting DataFrame df.
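A minimal sketch of this approach (the message strings are stand-ins for what you pull from Kafka; assumes Spark 2.2+):

    import org.apache.spark.sql.SparkSession

    object JsonSeqToDataFrame {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("JsonSeqToDataFrame")
          .master("local[*]")
          .getOrCreate()
        import spark.implicits._

        // Stand-ins for the strings consumed from Kafka
        val jsonMessages: Seq[String] = Seq(
          """{"event":"sensordata","source":"sensors","payload":{"temp":21.5}}""",
          """{"event":"eventApi","source":"event1","payload":{"id":42}}""")

        // spark.read.json takes a Dataset[String] (Spark 2.2+),
        // so convert the Seq first
        val df = spark.read.json(jsonMessages.toDS())

        // Route by the event field, e.g. keep only the sensor records
        df.filter($"event" === "sensordata").show(truncate = false)

        spark.stop()
      }
    }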
Answer 2:
Another option is converting the JSON string to a JavaBean, which helps if you only care about some of the columns.
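The answer suggests a JavaBean; in Scala the analogous route is a case class declaring just the fields you need. A sketch under that substitution (the Envelope name and sample message are illustrative):

    import org.apache.spark.sql.SparkSession

    // Declare only the columns you care about; everything else is dropped
    case class Envelope(event: String, source: String)

    object JsonToBean {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("JsonToBean")
          .master("local[*]")
          .getOrCreate()
        import spark.implicits._

        val messages = Seq(
          """{"event":"sensordata","source":"sensors","payload":{"temp":21.5}}""").toDS()

        // Parse the JSON and bind rows to the case class;
        // payload is ignored because Envelope does not declare it
        val envelopes = spark.read.json(messages)
          .select("event", "source")
          .as[Envelope]

        envelopes.show()
        spark.stop()
      }
    }

With an actual JavaBean the same idea works through Encoders.bean(classOf[MyBean]); the case class is just the idiomatic Scala stand-in.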
Source: https://stackoverflow.com/questions/46904339/spark-streamming-reading-data-from-kafka-that-has-multiple-schema