Transforming a column and updating the DataFrame

Asked by 执念已碎 on 2021-01-25 10:58

So, what I'm doing below is dropping column A from a DataFrame because I want to apply a transformation (here I just json.loads a JSON string) and then join the transformed column back. Is there a way to transform the column in place instead?

2 Answers
  • 2021-01-25 11:36

    I do not think you need to drop the column and do the join. The following code should* be equivalent to what you posted:

    import json

    cols = df_data.columns
    # Rebuild each row, parsing 'A' from JSON (None passes through) and
    # leaving every other column untouched.
    df = df_data.rdd \
        .map(
            lambda row: tuple(
                [row[c] if c != 'A' else (json.loads(row[c]) if row[c] is not None else None)
                 for c in cols]
            )
        ) \
        .toDF(cols)
    

    *I haven't actually tested this code, but I think this should work.
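    For reference, a minimal self-contained sketch of that approach on toy data (the SparkSession and the sample rows are made up for illustration; like the snippet above, treat it as untested against your actual data):

    from pyspark.sql import SparkSession
    import json

    spark = SparkSession.builder.getOrCreate()

    # Toy frame: 'A' holds JSON strings, with a null mixed in.
    df_data = spark.createDataFrame(
        [(1, '{"k": "v"}'), (2, None)],
        ["id", "A"],
    )

    cols = df_data.columns
    df = (df_data.rdd
          .map(lambda row: tuple(
              row[c] if c != 'A' else (json.loads(row[c]) if row[c] is not None else None)
              for c in cols))
          .toDF(cols))

    df.show(truncate=False)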

    But to answer your general question, you can transform a column in-place using withColumn().

    # withColumn replaces column "A" in place; an extra .alias("A") is unnecessary.
    df = df_data.withColumn("A", my_transformation_function(F.col("A")))
    

    Here, my_transformation_function() can be a UDF or a PySpark SQL function.
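
    For instance, a UDF version of that one-liner might look like the following sketch. The name my_transformation_function is just the placeholder from above, and the MapType return type assumes the JSON objects are flat string-to-string maps:

    import json
    import pyspark.sql.functions as F
    from pyspark.sql.types import MapType, StringType

    # Placeholder transformation: parse the JSON string in 'A' into a map column.
    # MapType(StringType(), StringType()) assumes flat string-to-string JSON.
    my_transformation_function = F.udf(
        lambda x: json.loads(x) if x is not None else None,
        MapType(StringType(), StringType()),
    )

    df = df_data.withColumn("A", my_transformation_function(F.col("A")))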

  • 2021-01-25 11:55

    From what I can understand, is something like this what you are trying to achieve?

    import pyspark.sql.functions as F
    import json

    # Without an explicit returnType, the UDF returns strings; pass a type such
    # as MapType(StringType(), StringType()) if a structured column is needed.
    json_convert = F.udf(lambda x: json.loads(x) if x is not None else None)

    cols = df_data.columns
    # Select the converted 'A' first, then the remaining columns unchanged.
    df = df_data.select([json_convert(F.col('A')).alias('A')] +
                        [col for col in cols if col != 'A'])
    
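    If the JSON has a known shape, a built-in alternative is F.from_json, which avoids Python UDF overhead entirely. A sketch, where the MapType schema is an assumption (swap in a StructType matching the real payload):

    import pyspark.sql.functions as F
    from pyspark.sql.types import MapType, StringType

    # from_json parses the string column natively; null or unparseable
    # input comes back as null rather than raising.
    df = df_data.withColumn(
        "A", F.from_json(F.col("A"), MapType(StringType(), StringType()))
    )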