I read that Spark Structured Streaming doesn't support schema inference for reading Kafka messages as JSON. Is there a way to retrieve the schema the same way Spark Streaming does?
It is possible to convert JSON to a DataFrame without having to manually type the schema, if that is what you meant to ask.
Recently I ran into a situation where I was receiving massively long nested JSON packets via Kafka, and manually typing the schema would have been both cumbersome and error-prone.
With a small sample of the data and a little trickery, you can provide the schema to Spark 2+ as follows:
import org.apache.spark.sql.functions.from_json
import spark.implicits._ // needed for .toDS and the $"col" syntax
val jsonstr = """ copy paste a representative sample of data here"""
val jsondf = spark.read.json(Seq(jsonstr).toDS) // jsondf.schema has the nested JSON structure we need
val event = spark.readStream.format..option...load() // configure your source
val eventWithSchema = event.select($"value" cast "string" as "json").select(from_json($"json", jsondf.schema) as "data").select("data.*")
Now you can work with eventWithSchema just as you would with Direct Streaming: create a temp view, run SQL queries, and so on.
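For anyone who wants to try this without a Kafka cluster, here is a minimal end-to-end sketch of the same idea. It is only an illustration under some assumptions: the built-in rate source stands in for Kafka, the sample document and its field names (id, device, name, reading) are made up, the object and helper names (SchemaFromSample, inferSchema, parseValue) are hypothetical, and it assumes a Spark version where spark.read.json accepts a Dataset[String] (2.2 or later, as far as I know).

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, from_json, lit, struct, to_json}
import org.apache.spark.sql.types.StructType

object SchemaFromSample {
  // Infer a schema from one representative JSON document using the batch reader.
  def inferSchema(spark: SparkSession, sampleJson: String): StructType = {
    import spark.implicits._
    spark.read.json(Seq(sampleJson).toDS).schema
  }

  // Cast the "value" column to a string, parse it with the schema, and flatten the result.
  def parseValue(raw: DataFrame, schema: StructType): DataFrame =
    raw
      .select(col("value").cast("string").as("json"))
      .select(from_json(col("json"), schema).as("data"))
      .select("data.*")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("schema-from-sample")
      .getOrCreate()

    // Made-up sample; replace it with a real message from your topic.
    val sample = """{"id": 1, "device": {"name": "sensor-a", "reading": 20.5}}"""
    val schema = inferSchema(spark, sample)

    // Stand-in for Kafka: the rate source emits (timestamp, value: Long) rows,
    // which are repackaged here as JSON strings in a "value" column.
    val raw = spark.readStream.format("rate").option("rowsPerSecond", "1").load()
      .select(to_json(struct(
        col("value").as("id"),
        struct(
          lit("sensor-a").as("name"),
          (col("value") * 1.5).as("reading")
        ).as("device")
      )).as("value"))

    // Register the parsed stream as a temp view and query it with SQL.
    val events = parseValue(raw, schema)
    events.createOrReplaceTempView("events")

    val query = spark
      .sql("SELECT id, device.name AS name, device.reading AS reading FROM events WHERE id % 2 = 0")
      .writeStream
      .format("console")
      .outputMode("append")
      .start()

    query.awaitTermination(10000) // let it run for about ten seconds
    query.stop()
    spark.stop()
  }
}
```

Keep in mind that the inferred schema only covers fields present in the sample, so fields missing from it will not appear in the output, and from_json returns null for messages it cannot parse. If that matters, it may be worth inferring from several samples rather than a single one.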