How to add column to exploded struct in Spark?

Asked by 面向向阳花, 2021-01-23 18:57

Say I have the following data:

{"id":1, "payload":[{"foo":1, "lol":2},{"foo":2, "lol":2}]}

I would like to explode the payload and add a column to the resulting struct.

1 Answer
  •  孤城傲影
     2021-01-23 19:46

    from pyspark.sql import functions as f

    df = df.withColumn('data', f.struct(
        df['data']['foo'].alias('foo'),
        (df['data']['foo'] * 2).alias('bar')
    ))
    

    This will result in:

    root
     |-- id: long (nullable = true)
     |-- data: struct (nullable = false)
     |    |-- foo: long (nullable = true)
     |    |-- bar: long (nullable = true)
    

    UPDATE:

    from pyspark.sql import Row
    from pyspark.sql import functions as f
    from pyspark.sql.types import StructType, StructField, LongType

    def func(x):
        tmp = x.asDict()
        tmp['foo'] = tmp.get('foo', 0) * 100
        keys, values = zip(*tmp.items())
        return Row(*keys)(*values)

    df = df.withColumn('data', f.udf(func, StructType(
        [StructField('foo', LongType()), StructField('lol', LongType())]))(df['data']))
    

    P.S.

    Spark has essentially no support for in-place operations.

    So whenever you want to modify a column "in place", you actually have to replace it, e.g. with withColumn under the same column name.
