Split JSON string column to multiple columns

一生所求 2021-01-06 18:32

I'm looking for a generic solution to extract all the JSON fields from a JSON string column as separate columns.

df = spark.read.load(path)
df.show()

3 answers

借酒劲吻你 (OP)

2021-01-06 19:09

Assuming json_data is a map column (if it is a JSON string, you can first convert it to a map, for example with from_json and a MapType schema), you can use getItem:

df = spark.createDataFrame([
    [1, {"name": "abc", "depts": ["dep01", "dep02"]}],
    [2, {"name": "xyz", "depts": ["dep03"], "sal": 100}]
],
    ['id', 'json_data']
)

df.select(
    df.id, 
    df.json_data.getItem('name').alias('name'), 
    df.json_data.getItem('depts').alias('depts'), 
    df.json_data.getItem('sal').alias('sal')
).show()

+---+----+--------------+----+
| id|name|         depts| sal|
+---+----+--------------+----+
|  1| abc|[dep01, dep02]|null|
|  2| xyz|       [dep03]| 100|
+---+----+--------------+----+

A more dynamic way to extract columns:

cols = ['name', 'depts', 'sal']
df.select(df.id, *(df.json_data.getItem(col).alias(col) for col in cols)).show()
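
The cols list above is still hard-coded, so it is only generic once the key names are known. Below is a minimal sketch of how the key set could be discovered from the data itself. It is plain Python on a small in-memory sample, not Spark code; sample_rows and discover_keys are hypothetical names introduced for illustration, and it assumes every value is a JSON object with top-level keys only.

```python
import json

# Hypothetical sample: (id, JSON string) pairs as they might appear in the column.
sample_rows = [
    (1, '{"name": "abc", "depts": ["dep01", "dep02"]}'),
    (2, '{"name": "xyz", "depts": ["dep03"], "sal": 100}'),
]


def discover_keys(json_strings):
    """Return the top-level keys seen across all JSON objects, in first-seen order."""
    keys = {}
    for s in json_strings:
        for k in json.loads(s):
            keys.setdefault(k, None)  # dicts preserve insertion order
    return list(keys)


# Union of keys over the sample; this plays the role of the hard-coded cols list.
cols = discover_keys(js for _, js in sample_rows)

# Project each record onto the discovered keys; missing keys become None.
records = [
    {"id": row_id, **{c: json.loads(js).get(c) for c in cols}}
    for row_id, js in sample_rows
]

print(cols)
for r in records:
    print(r)
```

A list built this way could then be passed to the select shown above in place of the literal ['name', 'depts', 'sal']. On a large DataFrame you would probably want to compute the keys from a sample or inside Spark rather than collecting every row.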
