Specify multiple columns data type changes to different data types in pyspark

Front-end · Unresolved · 1 answer · 1874 views
Asked by 被撕碎了的回忆 on 2021-01-07 08:23

I have a DataFrame (df3) which consists of more than 50 columns of various data types, such as:

    df3.printSchema()

         CtpJobId: st
1 Answer
  • 2021-01-07 09:11

    Instead of enumerating all of your columns one by one, you should use a loop:

    from pyspark.sql.types import TimestampType, IntegerType

    for c in timestamp_type:
        df3 = df3.withColumn(c, df3[c].cast(TimestampType()))

    for c in integer_type:
        df3 = df3.withColumn(c, df3[c].cast(IntegerType()))


    Or equivalently, you can use functools.reduce:

    from functools import reduce   # not needed in python 2
    df3 = reduce(
        lambda df, c: df.withColumn(c, df[c].cast(TimestampType())), 
        timestamp_type,
        df3
    )
    
    df3 = reduce(
        lambda df, c: df.withColumn(c, df[c].cast(IntegerType())),
        integer_type,
        df3
    )
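
    Both versions above add one projection per column via repeated `withColumn` calls. An alternative is a single `selectExpr` that casts every column in one pass. A minimal sketch, where the column names and the `timestamp_type`/`integer_type` lists are hypothetical stand-ins for your own:

    ```python
    # Hypothetical column lists -- substitute your own (assumption, not from the question).
    timestamp_type = ["TransferDate", "CreatedAt"]
    integer_type = ["CtpJobId", "RetryCount"]

    # Map each column to its target Spark SQL type name.
    type_map = {c: "timestamp" for c in timestamp_type}
    type_map.update({c: "int" for c in integer_type})

    def cast_exprs(columns, type_map):
        """Build selectExpr strings: cast mapped columns, pass the rest through."""
        return [
            f"CAST(`{c}` AS {type_map[c]}) AS `{c}`" if c in type_map else f"`{c}`"
            for c in columns
        ]

    exprs = cast_exprs(["CtpJobId", "TransferDate", "Status"], type_map)
    # exprs[0] == "CAST(`CtpJobId` AS int) AS `CtpJobId`"
    # Then apply everything in one shot: df3 = df3.selectExpr(*exprs)
    ```

    Because all casts land in one `select`, the resulting query plan has a single projection instead of one per column, which matters once you are touching 50+ columns.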
    