Split JSON string column to multiple columns

一生所求 2021-01-06 18:32

I'm looking for a generic solution to extract all the JSON fields as columns from a JSON string column.

df = spark.read.load(path)
df.show()
3 Answers
  •  借酒劲吻你
    2021-01-06 19:09

    Assuming json_data is a map column (if it is a JSON string instead, you can parse it into a map first, for example with from_json), you can use getItem:

    df = spark.createDataFrame([
        [1, {"name": "abc", "depts": ["dep01", "dep02"]}],
        [2, {"name": "xyz", "depts": ["dep03"], "sal": 100}]
    ],
        ['id', 'json_data']
    )
    
    df.select(
        df.id, 
        df.json_data.getItem('name').alias('name'), 
        df.json_data.getItem('depts').alias('depts'), 
        df.json_data.getItem('sal').alias('sal')
    ).show()
    
    +---+----+--------------+----+
    | id|name|         depts| sal|
    +---+----+--------------+----+
    |  1| abc|[dep01, dep02]|null|
    |  2| xyz|       [dep03]| 100|
    +---+----+--------------+----+
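
Since the question is about a JSON string column rather than a map, here is a minimal sketch of the conversion step mentioned above. It assumes the column is named json_data and that typing every value as a string is acceptable; MAP_SCHEMA and json_string_to_map are hypothetical names, not part of the original answer.

```python
# DDL schema string accepted by from_json: string keys, string values.
# (Assumption: string-typed values are good enough as a first pass.)
MAP_SCHEMA = "map<string,string>"


def json_string_to_map(df, json_col="json_data"):
    """Return df with json_col parsed from a JSON string into a map column.

    Hypothetical helper for illustration; the column name is an assumption.
    """
    # Imported lazily so the module loads even where PySpark is not installed.
    from pyspark.sql import functions as F

    return df.withColumn(json_col, F.from_json(F.col(json_col), MAP_SCHEMA))
```

After this step the getItem calls from the answer should apply unchanged. Because every value is typed as a string under this schema, a numeric field such as sal would still need a cast, e.g. `.cast('int')`.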
    

    A more dynamic way to extract columns:

    cols = ['name', 'depts', 'sal']
    df.select(df.id, *(df.json_data.getItem(col).alias(col) for col in cols)).show()
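
The cols list above is still hard-coded. For the generic case the question asks about, one possible approach is to discover the top-level keys from a sample of the JSON strings and then pull each key out with get_json_object. This is only a sketch under stated assumptions: json_data holds JSON object strings, and collect_json_keys, split_json_column and sample_size are hypothetical names introduced here.

```python
import json


def collect_json_keys(json_strings):
    """Return the sorted union of top-level keys found in JSON object strings."""
    keys = set()
    for raw in json_strings:
        if not raw:
            continue  # skip None and empty strings
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip rows that are not valid JSON
        if isinstance(parsed, dict):
            keys.update(parsed.keys())  # only JSON objects contribute keys
    return sorted(keys)


def split_json_column(df, json_col="json_data", sample_size=1000):
    """Expand the top-level keys of a JSON string column into separate columns.

    Hypothetical helper: keys that only occur outside the sampled rows are
    missed, and keys containing dots or spaces would need a different path.
    """
    # Imported lazily so collect_json_keys stays usable without PySpark.
    from pyspark.sql import functions as F

    # Bring a bounded sample of the JSON strings to the driver.
    sample = [row[0] for row in df.select(json_col).limit(sample_size).collect()]
    keys = collect_json_keys(sample)

    other_cols = [F.col(c) for c in df.columns if c != json_col]
    # get_json_object returns each extracted value as a string (or null).
    json_cols = [
        F.get_json_object(F.col(json_col), "$." + key).alias(key) for key in keys
    ]
    return df.select(*other_cols, *json_cols)
```

Usage would look like `split_json_column(df).show()`. The extracted columns are all strings, so casts may be needed afterwards, and a larger sample_size trades driver memory for better key coverage.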
    
