Efficient column processing in PySpark

死守一世寂寞 2021-01-15 13:46

I have a dataframe with a very large number of columns (>30000).

I'm filling it with 1s and 0s based on the first column, like this:
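A sketch of the kind of per-column loop this describes (list_column and list_column_names are hypothetical names, chosen to match the answer below):

    import pyspark.sql.functions as F

    # Hypothetical reconstruction: one withColumn call per target column,
    # setting 1 when the column's name appears in the 'list_column' array.
    # With >30000 columns this chains thousands of projections onto the
    # query plan, which gets very slow.
    for column in list_column_names:
        df = df.withColumn(
            column,
            F.when(F.array_contains(F.col('list_column'), column), 1).otherwise(0))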

3 Answers
  •  花落未央 2021-01-15 14:13

    You might approach it like this:

    import pyspark.sql.functions as F

    # Build one 0/1 expression per target column: 1 when the column's
    # name appears in the 'list_column' array, 0 otherwise.
    exprs = [F.when(F.array_contains(F.col('list_column'), column), 1)
              .otherwise(0).alias(column)
             for column in list_column_names]

    # A single select with all expressions keeps the query plan flat,
    # unlike thousands of chained withColumn calls.
    df = df.select(['list_column'] + exprs)

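    A minimal end-to-end sketch of this approach with toy data (the rows and list_column_names below are invented for illustration):

    from pyspark.sql import SparkSession
    import pyspark.sql.functions as F

    spark = SparkSession.builder.getOrCreate()

    # 'list_column' holds the names that should become 1s in that row.
    df = spark.createDataFrame([(['a', 'c'],), (['b'],)], ['list_column'])
    list_column_names = ['a', 'b', 'c']

    exprs = [F.when(F.array_contains(F.col('list_column'), c), 1)
              .otherwise(0).alias(c)
             for c in list_column_names]
    df.select(['list_column'] + exprs).show()
    # +-----------+---+---+---+
    # |list_column|  a|  b|  c|
    # +-----------+---+---+---+
    # |     [a, c]|  1|  0|  1|
    # |        [b]|  0|  1|  0|
    # +-----------+---+---+---+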