Drop if all entries in a Spark DataFrame's specific column are null

轮回少年 2021-01-13 19:11

Using PySpark, how can I select/keep all columns of a DataFrame which contain a non-null value, or equivalently remove all columns which contain no data?

8 answers
  •  逝去的感伤
    2021-01-13 19:32

    Just picking up pieces from the answers above, I wrote my own solution for my use case.

    What I was essentially trying to do was remove all columns from my PySpark DataFrame which had 100% null values.

    # identify and remove all columns having 100% null values
    # summary("count") returns each column's non-null count (as a string)
    df_summary_count = your_df.summary("count")
    # columns whose non-null count is '0' contain only nulls
    null_cols = [c for c in df_summary_count.columns if df_summary_count.select(c).first()[c] == '0']
    # drop those columns from the original DataFrame (not from the summary)
    filtered_df = your_df.drop(*null_cols)
    
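    As a quick end-to-end check, here is a minimal sketch of the same approach on a tiny DataFrame (the SparkSession setup, the sample data, and names like sample_df are illustrative assumptions, not part of the original answer):

    from pyspark.sql import SparkSession
    from pyspark.sql.types import IntegerType, StringType, StructField, StructType

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical sample data: column "b" contains only nulls.
    # An explicit schema is needed because Spark cannot infer a type for an all-null column.
    schema = StructType([
        StructField("a", IntegerType()),
        StructField("b", StringType()),
        StructField("c", StringType()),
    ])
    sample_df = spark.createDataFrame([(1, None, "x"), (2, None, None)], schema)

    # Same technique as above: summary("count") reports non-null counts as strings.
    summary_df = sample_df.summary("count")
    null_cols = [c for c in summary_df.columns
                 if summary_df.select(c).first()[c] == '0']
    sample_df.drop(*null_cols).show()  # keeps "a" and "c", drops "b"

    Note that summary() only reports numeric and string columns, so all-null columns of other types (dates, timestamps, etc.) would not be caught by this check.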
