So, what I'm doing below is dropping a column A from a DataFrame, because I want to apply a transformation to it (here I just json.loads a JSON string) and then join the transformed column back onto the DataFrame.
I do not think you need to drop the column and do the join. The following code should* be equivalent to what you posted:
import json  # needed for json.loads below

cols = df_data.columns
df = df_data.rdd \
    .map(
        lambda row: tuple(
            # parse column 'A' as JSON; pass every other column through unchanged
            [row[c] if c != 'A'
             else (json.loads(row[c]) if row[c] is not None else None)
             for c in cols]
        )
    ) \
    .toDF(cols)
*I haven't actually tested this code, but I think this should work.
But to answer your general question, you can transform a column in place using withColumn():
df = df_data.withColumn("A", my_transformation_function("A"))
where my_transformation_function() can be a udf or a PySpark SQL function (from pyspark.sql.functions). The .alias("A") in your version is unnecessary, since withColumn("A", ...) already sets the column name.
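As a concrete sketch of that second approach (untested, in the same spirit as the code above): parse_json, the sample df_data, and the MapType schema here are assumptions for illustration, since I don't know the shape of your JSON. On recent Spark versions the built-in from_json avoids the Python round trip entirely.

import json

from pyspark.sql import SparkSession
from pyspark.sql.functions import udf, from_json
from pyspark.sql.types import MapType, StringType

spark = SparkSession.builder.getOrCreate()

# Hypothetical sample data: column 'A' holds JSON strings (or null).
df_data = spark.createDataFrame([('{"k": "v"}',), (None,)], ["A"])

# Option 1: a udf wrapping json.loads.
# The MapType(string -> string) return type is an assumption about your JSON.
parse_json = udf(
    lambda s: json.loads(s) if s is not None else None,
    MapType(StringType(), StringType()),
)
df = df_data.withColumn("A", parse_json("A"))

# Option 2: the built-in from_json, which parses on the JVM side
# instead of serializing rows out to Python.
df2 = df_data.withColumn("A", from_json("A", MapType(StringType(), StringType())))

Option 2 is usually preferable when your JSON has a known schema, because udfs force each row through Python, which is noticeably slower on large DataFrames.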