Spark: How to parse an array of JSON objects using Spark

灰色年华 2021-01-14 12:29

I have a file with normal columns and a column that contains a JSON string, shown below (a picture was also attached). Each row actually belongs to a column named Demo(not Vis…

2 answers
    一向 (OP)
    2021-01-14 13:10

    If your JSON column looks like this:

        // `spark` is the SparkSession that spark-shell predefines
        import spark.implicits._
    
        val inputDF = Seq(
          ("""[{"key":"device_kind","value":"desktop"},{"key":"country_code","value":"ID"},{"key":"device_platform","value":"windows"}]"""),
          ("""[{"key":"device_kind","value":"mobile"},{"key":"country_code","value":"BE"},{"key":"device_platform","value":"android"}]"""),
          ("""[{"key":"device_kind","value":"mobile"},{"key":"country_code","value":"QA"},{"key":"device_platform","value":"android"}]""")
        ).toDF("Demographics")
    
        inputDF.show(false)
    +-------------------------------------------------------------------------------------------------------------------------+
    |Demographics                                                                                                             |
    +-------------------------------------------------------------------------------------------------------------------------+
    |[{"key":"device_kind","value":"desktop"},{"key":"country_code","value":"ID"},{"key":"device_platform","value":"windows"}]|
    |[{"key":"device_kind","value":"mobile"},{"key":"country_code","value":"BE"},{"key":"device_platform","value":"android"}] |
    |[{"key":"device_kind","value":"mobile"},{"key":"country_code","value":"QA"},{"key":"device_platform","value":"android"}] |
    +-------------------------------------------------------------------------------------------------------------------------+
    

    you can parse the column as follows:

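      import org.apache.spark.sql.DataFrame
      import org.apache.spark.sql.functions.col

      // Parse the JSON string into an array of {key, value} structs
      val parsedJson: DataFrame = inputDF.selectExpr("Demographics", "from_json(Demographics, 'array<struct<key:string,value:string>>') as parsed_json")

      // Pick each struct by its position in the array
      // (this assumes every row lists the keys in the same order)
      val splitted = parsedJson.select(
        col("parsed_json").as("Demographics"),
        col("parsed_json").getItem(0).as("device_kind_json"),
        col("parsed_json").getItem(1).as("country_code_json"),
        col("parsed_json").getItem(2).as("device_platform_json")
      )

      // Keep only the "value" field of each struct
      val result = splitted.select(
        col("Demographics"),
        col("device_kind_json.value").as("device_kind"),
        col("country_code_json.value").as("country_code"),
        col("device_platform_json.value").as("device_platform")
      )

      result.show(false)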
    

    You will get the following output:

    +------------------------------------------------------------------------+-----------+------------+---------------+
    |Demographics                                                            |device_kind|country_code|device_platform|
    +------------------------------------------------------------------------+-----------+------------+---------------+
    |[[device_kind, desktop], [country_code, ID], [device_platform, windows]]|desktop    |ID          |windows        |
    |[[device_kind, mobile], [country_code, BE], [device_platform, android]] |mobile     |BE          |android        |
    |[[device_kind, mobile], [country_code, QA], [device_platform, android]] |mobile     |QA          |android        |
    +------------------------------------------------------------------------+-----------+------------+---------------+
    
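The `getItem(0)`, `getItem(1)` and `getItem(2)` calls above pick structs by position, so they only work if every row lists its keys in the same order. A minimal sketch of an order-independent variant, assuming Spark 2.4+ (for `map_from_entries`) and that each key appears at most once per array (with duplicate keys, Spark 3.x raises an error by default). It is written as a spark-shell style snippet; the local `SparkSession` and the reordered second sample row are hypothetical, and the schema is built with the Scala `from_json(Column, DataType)` overload instead of a DDL string.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json, map_from_entries}
import org.apache.spark.sql.types.{ArrayType, StringType, StructField, StructType}

// Assumed local session; in spark-shell getOrCreate() returns the existing `spark`.
val spark = SparkSession.builder().master("local[*]").appName("json-array-by-key").getOrCreate()
import spark.implicits._

// Hypothetical sample: the second row lists its keys in a different order.
val inputDF = Seq(
  """[{"key":"device_kind","value":"desktop"},{"key":"country_code","value":"ID"},{"key":"device_platform","value":"windows"}]""",
  """[{"key":"country_code","value":"BE"},{"key":"device_platform","value":"android"},{"key":"device_kind","value":"mobile"}]"""
).toDF("Demographics")

// Programmatic equivalent of the DDL string 'array<struct<key:string,value:string>>'.
val schema = ArrayType(StructType(Seq(
  StructField("key", StringType),
  StructField("value", StringType)
)))

// Parse each JSON string, then turn the array of {key, value} structs into a map<string,string>.
val byKey = inputDF.withColumn("kv", map_from_entries(from_json(col("Demographics"), schema)))

// Look values up by key name instead of array position; a missing key yields null.
val result = byKey.select(
  col("kv").getItem("device_kind").as("device_kind"),
  col("kv").getItem("country_code").as("country_code"),
  col("kv").getItem("device_platform").as("device_platform")
)

result.show(false)
```

Because the lookup is by name, a row that omits a key simply gets `null` in that column instead of shifting the other values.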

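If different rows carry different sets of keys, a long layout (one row per key/value pair) may be easier to work with than fixed columns. A sketch using `explode` on the same `from_json` expression, again as a spark-shell style snippet with an assumed local session; the `row_id` column from `monotonically_increasing_id()` is an added, hypothetical helper that records which input row each pair came from (the ids are unique but not consecutive).

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode, expr, monotonically_increasing_id}

// Assumed local session; in spark-shell getOrCreate() returns the existing `spark`.
val spark = SparkSession.builder().master("local[*]").appName("json-array-explode").getOrCreate()
import spark.implicits._

val inputDF = Seq(
  """[{"key":"device_kind","value":"desktop"},{"key":"country_code","value":"ID"},{"key":"device_platform","value":"windows"}]""",
  """[{"key":"device_kind","value":"mobile"},{"key":"country_code","value":"BE"},{"key":"device_platform","value":"android"}]"""
).toDF("Demographics")

// Tag each input row, parse its JSON array, and emit one output row per {key, value} struct.
val longDF = inputDF
  .withColumn("row_id", monotonically_increasing_id())
  .withColumn("kv", explode(expr("from_json(Demographics, 'array<struct<key:string,value:string>>')")))
  .select(col("row_id"), col("kv.key").as("key"), col("kv.value").as("value"))

longDF.show(false)
```

If columns are needed again later, something like `longDF.groupBy("row_id").pivot("key").agg(first("value"))` (with `first` imported from `org.apache.spark.sql.functions`) would reshape the pairs back into one column per key.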