How to add a column to an exploded struct in Spark?

面向向阳花 2021-01-23 18:57

Say I have the following data:

{"id":1, "payload":[{"foo":1, "lol":2},{"foo":2, "lol":2}]}

I would like to explode the payload and

1 Answer
  • 2021-01-23 19:46
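    import pyspark.sql.functions as f

    # Assumes df already has the exploded struct column 'data'.
    # Rebuild the struct: keep 'foo' and add 'bar' computed from it.
    df = df.withColumn('data', f.struct(
        df['data']['foo'].alias('foo'),
        (df['data']['foo'] * 2).alias('bar')
    ))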
    

    This will result in:

    root
     |-- id: long (nullable = true)
     |-- data: struct (nullable = false)
     |    |-- foo: long (nullable = true)
     |    |-- bar: long (nullable = true)
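
The snippet above starts from a DataFrame that already has the exploded `data` column and keeps only `foo`. As a minimal end-to-end sketch, assuming the sample record from the question and an existing `SparkSession` passed in as `spark`, the whole flow might look like this. The names `SAMPLE`, `expected_rows`, and `explode_and_add_bar` are hypothetical helpers introduced for illustration, and `lol` is carried over explicitly so it is not lost.

```python
import json

# The sample record from the question.
SAMPLE = '{"id":1, "payload":[{"foo":1, "lol":2},{"foo":2, "lol":2}]}'


def expected_rows(sample=SAMPLE):
    """Plain-Python reference: one (id, data) pair per payload element, with bar = foo * 2."""
    record = json.loads(sample)
    return [(record["id"], {**item, "bar": item["foo"] * 2}) for item in record["payload"]]


def explode_and_add_bar(spark, sample=SAMPLE):
    """The same transformation in Spark; `spark` is an existing SparkSession."""
    # Imported here so the reference function above also works without PySpark installed.
    import pyspark.sql.functions as f

    df = spark.read.json(spark.sparkContext.parallelize([sample]))
    # explode() yields one row per array element; each element is a struct.
    df = df.select("id", f.explode("payload").alias("data"))
    # Rebuild the struct: carry over the existing fields and append the new one.
    return df.withColumn("data", f.struct(
        df["data"]["foo"].alias("foo"),
        df["data"]["lol"].alias("lol"),
        (df["data"]["foo"] * 2).alias("bar"),
    ))
```

With a session created via `SparkSession.builder.getOrCreate()`, calling `explode_and_add_bar(spark).printSchema()` should list `foo`, `lol`, and `bar` under `data`. The plain-Python `expected_rows()` is only a cross-check for the values.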
    

    UPDATE:

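    from pyspark.sql import Row
    from pyspark.sql.types import LongType, StructField, StructType

    def func(x):
        # Copy the struct into a dict and modify 'foo'; other fields stay as they are.
        tmp = x.asDict()
        tmp['foo'] = tmp.get('foo', 0) * 100
        # zip() returns an iterator in Python 3, so unpack it instead of indexing.
        names, values = zip(*tmp.items())
        return Row(*names)(*values)

    # LongType matches the inferred type of foo and lol in the sample data.
    df = df.withColumn('data', f.udf(func, StructType(
        [StructField('foo', LongType()), StructField('lol', LongType())]))(df['data']))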
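
If the goal is only to change or add a single field, the UDF above may not be needed on newer versions. The following is a minimal sketch, assuming Spark 3.1 or later (where `Column.withField` is available) and assuming `df` already has the exploded `data` struct column. The helper names `with_scaled_foo` and `with_bar` are hypothetical.

```python
def with_scaled_foo(df, factor=100):
    """Return a new DataFrame with data.foo multiplied by `factor`; other fields are kept."""
    # Column.withField (Spark 3.1+) replaces the field when the name already exists.
    return df.withColumn("data", df["data"].withField("foo", df["data"]["foo"] * factor))


def with_bar(df):
    """Return a new DataFrame with an extra field data.bar = data.foo * 2."""
    # When the name does not exist yet, withField appends it to the struct.
    return df.withColumn("data", df["data"].withField("bar", df["data"]["foo"] * 2))
```

Both helpers return a new DataFrame and leave the input untouched, and neither requires spelling out the full struct schema the way the UDF does.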
    

    P.S.

    Spark has almost no support for in-place operations, because DataFrames are immutable.

    So whenever you want to modify something in place, you actually have to replace it with a new column.
