Say I have the following data:
{"id":1, "payload":[{"foo":1, "lol":2},{"foo":2, "lol":2}]}
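I would like to explode the payload and then change the fields of each resulting struct. One way is to rebuild the struct with f.struct, keeping foo and adding a derived field bar: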
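from pyspark.sql import functions as f
# explode: one row per payload element, stored in the struct column 'data'
df = df.select('id', f.explode('payload').alias('data'))
# rebuild 'data' as a new struct: keep foo, add bar = foo * 2
df = df.withColumn('data', f.struct(
    df['data']['foo'].alias('foo'),
    (df['data']['foo'] * 2).alias('bar')
))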
This will result in:
root
 |-- id: long (nullable = true)
 |-- data: struct (nullable = false)
 |    |-- foo: long (nullable = true)
 |    |-- bar: long (nullable = true)
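To make the transformation concrete without a Spark cluster, here is a plain-Python sketch of what the explode + struct step does to the sample record. It only mimics the semantics: plain dicts stand in for Spark rows and structs, and explode_and_add_bar is a hypothetical helper, not a Spark API.

```python
import json

# The sample record from the question
RAW = '{"id":1, "payload":[{"foo":1, "lol":2},{"foo":2, "lol":2}]}'

def explode_and_add_bar(record):
    """Hypothetical helper mimicking select('id', explode('payload').alias('data'))
    followed by rebuilding 'data' as struct(foo, bar = foo * 2)."""
    rows = []
    for element in record["payload"]:
        # explode: one output row per array element;
        # the rebuilt struct lists only foo and bar, so lol is dropped
        data = {"foo": element["foo"], "bar": element["foo"] * 2}
        rows.append({"id": record["id"], "data": data})
    return rows

rows = explode_and_add_bar(json.loads(RAW))
for row in rows:
    print(row)
```

Note that the rebuilt struct contains only the fields passed to f.struct, so any field you want to keep (such as lol) has to be listed explicitly.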
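UPDATE: alternatively, to change a field while keeping all the other fields of the exploded struct (here foo and lol), use a UDF: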
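from pyspark.sql import Row
from pyspark.sql.types import StructType, StructField, LongType

def func(x):
    # copy the struct into a dict and scale 'foo' (null/missing -> 0)
    tmp = x.asDict()
    tmp['foo'] = (tmp.get('foo') or 0) * 100
    # rebuild a Row with the same field names, in the same order
    keys, values = zip(*tmp.items())
    return Row(*keys)(*values)

schema = StructType([StructField('foo', LongType()), StructField('lol', LongType())])
df = df.withColumn('data', f.udf(func, schema)(df['data']))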
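Since the body of func is plain Python, its logic can be checked outside Spark. In this sketch a namedtuple stands in for pyspark.sql.Row (an assumption so the snippet runs without PySpark), _asdict() plays the role of Row.asDict(), and scale_foo with its factor parameter is a hypothetical name. Note that in Python 3 zip returns an iterator, so it is unpacked rather than indexed.

```python
from collections import namedtuple

# Stand-in for pyspark.sql.Row (assumption: lets the logic run without Spark)
Data = namedtuple("Data", ["foo", "lol"])

def scale_foo(x, factor=100):
    """Hypothetical pure-Python version of func: returns a NEW record."""
    tmp = x._asdict()                       # like Row.asDict()
    tmp["foo"] = (tmp.get("foo") or 0) * factor
    keys, values = zip(*tmp.items())        # unpack; zip is an iterator in Python 3
    # mirrors Row(*keys)(*values): build a record type, then fill it
    return namedtuple("Data", keys)(*values)

original = [Data(1, 2), Data(2, 2)]
updated = [scale_foo(d) for d in original]
print(updated)
```

The input records are left untouched and new ones are returned, which matches how the UDF result replaces the data column rather than editing it.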
P.S.
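Spark has almost no support for in-place operations: DataFrames are immutable.
So whenever you want to modify a column in place, you actually replace it with a new column (for example, withColumn using the same column name).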