Spark Read Json: how to read field that alternates between integer and struct

太阳男子 2021-01-24 00:36

I am trying to read multiple JSON files into a single DataFrame. Both files have a "Value" node, but the type of this node alternates between integer and struct:

File 1:

         


        
1 Answer
  • 2021-01-24 01:23

    See if this helps:

    Load the test data

        /**
          * test/File1.json
          * -----
          * {
          * "Value": 123
          * }
          */
        /**
          * test/File2.json
          * ---------
          * {
          * "Value": {
          * "Value": "On",
          * "ValueType": "State",
          * "IsSystemValue": true
          * }
          * }
          */
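        // Read every JSON file under the test resource directory. multiLine is needed
        // because each file holds a single object spread over several lines.
        val path = getClass.getResource("/test").getPath
        val df = spark.read
          .option("multiLine", true)
          .json(path)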
    
        df.show(false)
        df.printSchema()
    
        /**
          * +-------------------------------------------------------+
          * |Value                                                  |
          * +-------------------------------------------------------+
          * |{"Value":"On","ValueType":"State","IsSystemValue":true}|
          * |123                                                    |
          * +-------------------------------------------------------+
          *
          * root
          * |-- Value: string (nullable = true)
          */
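
The schema above shows "Value" as a string because the two files disagree on its type, so the raw JSON text is kept for the struct case. As a minimal sketch of the same idea, the schema can also be pinned explicitly instead of left to inference. The object name `MixedValueSchema` and the in-memory records are hypothetical stand-ins for the two test files, and the snippet assumes a local Spark installation.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object MixedValueSchema {
  // Hypothetical in-memory stand-ins for test/File1.json and test/File2.json.
  val records: Seq[String] = Seq(
    """{"Value": 123}""",
    """{"Value": {"Value": "On", "ValueType": "State", "IsSystemValue": true}}"""
  )

  // Pin "Value" to string explicitly rather than relying on schema inference.
  val explicitSchema: StructType =
    StructType(Seq(StructField("Value", StringType, nullable = true)))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("mixed-value-schema")
      .getOrCreate()
    import spark.implicits._

    val ds = records.toDS()

    // Inferred schema: the integer/struct conflict is resolved to string.
    spark.read.json(ds).printSchema()

    // Explicit schema: same column type, without depending on inference.
    spark.read.schema(explicitSchema).json(ds).show(false)

    spark.stop()
  }
}
```

Pinning the schema may be useful when a directory happens to contain only one of the two variants, in which case inference alone would pick integer or struct rather than string.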
    

    Transform the JSON string

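        import org.apache.spark.sql.functions._

        // get_json_object returns null when the path is absent (the plain-integer rows),
        // so coalesce falls back to the original string value for those rows.
        df.withColumn("File", substring_index(input_file_name(), "/", -1))
          .withColumn("ValueType", get_json_object(col("Value"), "$.ValueType"))
          .withColumn("IsSystemValue", get_json_object(col("Value"), "$.IsSystemValue"))
          .withColumn("Value", coalesce(get_json_object(col("Value"), "$.Value"), col("Value")))
          .show(false)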
    
        /**
          * +-----+----------+---------+-------------+
          * |Value|File      |ValueType|IsSystemValue|
          * +-----+----------+---------+-------------+
          * |On   |File2.json|State    |true         |
          * |123  |File1.json|null     |null         |
          * +-----+----------+---------+-------------+
          */
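
A possible alternative to the `get_json_object` calls above is to parse the string once with `from_json` and a declared schema. This is only a sketch: the object name `MixedValueFromJson`, the helper `flatten`, and the nested schema are assumptions based on the shape of test/File2.json, not something stated in the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{coalesce, col, from_json}
import org.apache.spark.sql.types.{BooleanType, StringType, StructType}

object MixedValueFromJson {
  // Assumed shape of the struct variant, taken from test/File2.json.
  val nestedSchema: StructType = new StructType()
    .add("Value", StringType)
    .add("ValueType", StringType)
    .add("IsSystemValue", BooleanType)

  // Expects a string column "Value" holding either a scalar or a JSON object.
  def flatten(df: DataFrame): DataFrame =
    df.withColumn("parsed", from_json(col("Value"), nestedSchema))
      .withColumn("ValueType", col("parsed.ValueType"))
      .withColumn("IsSystemValue", col("parsed.IsSystemValue"))
      // Scalar rows yield no nested "Value", so keep the original string for them.
      .withColumn("Value", coalesce(col("parsed.Value"), col("Value")))
      .drop("parsed")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("mixed-value-from-json")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical rows mirroring the string column produced by the read step.
    val input = Seq(
      """{"Value":"On","ValueType":"State","IsSystemValue":true}""",
      "123"
    ).toDF("Value")

    flatten(input).show(false)
    spark.stop()
  }
}
```

One difference from the `get_json_object` version is that "IsSystemValue" comes out as a boolean column rather than a string, which may or may not be what is wanted downstream.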
    