Efficient column processing in PySpark

死守一世寂寞 2021-01-15 13:46

I have a dataframe with a very large number of columns (>30000).

I'm filling it with 1s and 0s based on the first column, like this:
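A sketch of the kind of per-column loop this describes (list_column and list_column_names are hypothetical names, chosen to match the answer below):

    import pyspark.sql.functions as F

    # Hypothetical reconstruction: one withColumn call per target column,
    # setting 1 when the column's name appears in the 'list_column' array.
    # With >30000 columns this chains thousands of projections onto the
    # query plan, which gets very slow.
    for column in list_column_names:
        df = df.withColumn(
            column,
            F.when(F.array_contains(F.col('list_column'), column), 1).otherwise(0))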

3 Answers
  •  花落未央 2021-01-15 14:13

    You might approach it like this:

    import pyspark.sql.functions as F

    # Build one 0/1 expression per target column: 1 when the column's
    # name appears in the 'list_column' array, 0 otherwise.
    exprs = [F.when(F.array_contains(F.col('list_column'), column), 1)
              .otherwise(0).alias(column)
             for column in list_column_names]

    # A single select with all expressions keeps the query plan flat,
    # unlike thousands of chained withColumn calls.
    df = df.select(['list_column'] + exprs)

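    A minimal end-to-end sketch of this approach with toy data (the rows and list_column_names below are invented for illustration):

    from pyspark.sql import SparkSession
    import pyspark.sql.functions as F

    spark = SparkSession.builder.getOrCreate()

    # 'list_column' holds the names that should become 1s in that row.
    df = spark.createDataFrame([(['a', 'c'],), (['b'],)], ['list_column'])
    list_column_names = ['a', 'b', 'c']

    exprs = [F.when(F.array_contains(F.col('list_column'), c), 1)
              .otherwise(0).alias(c)
             for c in list_column_names]
    df.select(['list_column'] + exprs).show()
    # +-----------+---+---+---+
    # |list_column|  a|  b|  c|
    # +-----------+---+---+---+
    # |     [a, c]|  1|  0|  1|
    # |        [b]|  0|  1|  0|
    # +-----------+---+---+---+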