Spark Structured Streaming: convert Kafka JSON without a schema (infer schema)

时光说笑 2020-12-08 17:18

I read that Spark Structured Streaming doesn't support schema inference when reading Kafka messages as JSON. Is there a way to retrieve the schema the same way Spark Streaming does?

5 Answers
  •  醉梦人生
    2020-12-08 17:48

    It is possible to convert JSON to a DataFrame without having to manually type the schema, if that is what you meant to ask.

    Recently I ran into a situation where I was receiving massively long nested JSON packets via Kafka, and manually typing the schema would have been both cumbersome and error-prone.

    With a small sample of the data and some trickery you can provide the schema to Spark 2+ as follows:

    import spark.implicits._ // needed for .toDS and the $"col" syntax
    import org.apache.spark.sql.functions.from_json
    
    val jsonstr = """ copy paste a representative sample of data here"""
    val jsondf = spark.read.json(Seq(jsonstr).toDS) // jsondf.schema holds the nested JSON structure we need
    
    // Configure your source; for the Kafka source the placeholders below are illustrative
    val event = spark.readStream.format("kafka").option("kafka.bootstrap.servers", "host1:port1").option("subscribe", "topic1").load()
    
    val eventWithSchema = event.select($"value".cast("string").as("json")).select(from_json($"json", jsondf.schema).as("data")).select("data.*")
    

    Now you can do whatever you want with this val, just as you would with Direct Streaming: create a temp view, run SQL queries, whatever.
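
    For example, the temp-view-plus-SQL step could look like the sketch below (the view name "events" and the console sink are illustrative choices, not part of the answer above):

        // Register the typed stream under a name so it can be queried with SQL
        eventWithSchema.createOrReplaceTempView("events")
        
        // spark.sql over a streaming view returns another streaming DataFrame
        val counts = spark.sql("SELECT count(*) AS n FROM events")
        
        // Start the query; the console sink is convenient for debugging
        val query = counts.writeStream.outputMode("complete").format("console").start()
        query.awaitTermination()

    Note that aggregations like the count above require outputMode("complete") or "update"; a plain projection of the stream would use "append" instead.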
