I have a DataFrame (df) which consists of more than 50 columns with different data types, such as
df3.printSchema()
CtpJobId: st
Instead of enumerating all of your values, you should use a loop:
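from pyspark.sql.types import IntegerType, TimestampType

# timestamp_type and integer_type are lists of column names to cast
for c in timestamp_type:
    df3 = df3.withColumn(c, df3[c].cast(TimestampType()))
for c in integer_type:
    df3 = df3.withColumn(c, df3[c].cast(IntegerType()))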
Or equivalently, you can use functools.reduce:
from functools import reduce  # not needed in Python 2

df3 = reduce(
    lambda df, c: df.withColumn(c, df[c].cast(TimestampType())),
    timestamp_type,
    df3
)
df3 = reduce(
    lambda df, c: df.withColumn(c, df[c].cast(IntegerType())),
    integer_type,
    df3
)
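If there are more than two target types, the two reduce calls above could be folded into one by driving the casts from a mapping of type name to column list. The following is only a sketch under assumptions: the helper names build_cast_plan and cast_columns and the column names in the usage note are hypothetical and not from the original post. It relies on Column.cast accepting a type name string such as "timestamp" or "int".

```python
from functools import reduce


def build_cast_plan(type_groups):
    """Flatten {type_name: [column, ...]} into a list of (column, type_name) pairs."""
    return [
        (column, type_name)
        for type_name, columns in type_groups.items()
        for column in columns
    ]


def cast_columns(df, type_groups):
    """Cast each listed column of a PySpark DataFrame, one withColumn call per column."""
    # Imported here so build_cast_plan can be used without pyspark installed.
    from pyspark.sql.functions import col

    return reduce(
        lambda acc, item: acc.withColumn(item[0], col(item[0]).cast(item[1])),
        build_cast_plan(type_groups),
        df,
    )
```

Usage would look something like cast_columns(df3, {"timestamp": timestamp_type, "int": integer_type}), or with literal (hypothetical) column names, cast_columns(df3, {"timestamp": ["created_at"], "int": ["retry_count"]}). Note that this still issues one withColumn per column, just like the loop; the PySpark documentation warns that many chained withColumn calls can produce large query plans, so for very wide frames a single select with all the casts may be worth considering instead.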