How to use Spark SQL to parse the JSON array of objects

Backend · Unresolved · 2 answers · 1171 views
Asked by 慢半拍i on 2021-02-10 02:15

I have JSON data as follows:

{"Id":11,"data":[{"package":"com.browser1","activetime":60000},{"package":"com.browser6","activetime":1205000},{"pa
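(The record above is cut off mid-entry; the truncation is preserved as posted.) For reference, a well-formed record of this shape can be sanity-checked with plain Python, the stdlib `json` module standing in for Spark's JSON reader. The trailing entries here are a hypothetical completion, not the original data:

```python
import json

# Hypothetical record in the same shape as the (truncated) one above
raw = ('{"Id":11,"data":[{"package":"com.browser1","activetime":60000},'
       '{"package":"com.browser6","activetime":1205000}]}')

record = json.loads(raw)
print(record["Id"])                                # top-level scalar field
print([d["package"] for d in record["data"]])      # nested array of structs
```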


        
2 Answers
  • 2021-02-10 02:20

    From the JSON data you've given, you can inspect the schema of your DataFrame with printSchema and work from it:

    appActiveTime.printSchema()
    root
     |-- data: array (nullable = true)
     |    |-- element: struct (containsNull = true)
     |    |    |-- activetime: long (nullable = true)
     |    |    |-- package: string (nullable = true)
    

    Since data is an array, you need to explode it and then select the struct fields, as below:

    import org.apache.spark.sql.functions._
    appActiveTime.withColumn("data", explode($"data"))
           .select("data.*")
           .show(false)
    

    Output:

    +----------+------------+
    |activetime|     package|
    +----------+------------+
    |     60000|com.browser1|
    |   1205000|com.browser6|
    |   1205000|com.browser7|
    |     60000|com.browser1|
    |   1205000|com.browser6|
    +----------+------------+
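    Conceptually, `explode($"data")` turns each array element into its own row, and `select("data.*")` promotes the struct fields to top-level columns. A minimal plain-Python sketch of the same transformation (no Spark; the sample rows are hypothetical, mirroring the output above):

    ```python
    # Each input row carries an array of structs under "data"
    rows = [
        {"Id": 11, "data": [
            {"package": "com.browser1", "activetime": 60000},
            {"package": "com.browser6", "activetime": 1205000},
        ]},
    ]

    # explode: one output row per array element;
    # select("data.*"): each struct's fields become the row's columns
    exploded = [elem for row in rows for elem in row["data"]]

    for r in exploded:
        print(r["activetime"], r["package"])
    ```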
    

    Hope this helps!

  • 2021-02-10 02:23

    With @Shankar Koirala's help, I learned how to use `explode` to handle the JSON array.

    import org.apache.spark.sql.functions._

    // Query the array column, explode it, then select the struct fields
    val df = sqlContext.sql("SELECT data FROM behavior")
    df.select(explode(df("data")).as("data"))
      .select("data.package", "data.activetime")
      .show(false)
    