Drop if all entries in a Spark DataFrame's specific column are null

轮回少年 2021-01-13 19:11

Using PySpark, how can I select/keep all columns of a DataFrame which contain a non-null value, or equivalently remove all columns which contain no data?

8 answers
  •  逝去的感伤
    2021-01-13 19:32

    Just picking up pieces from the answers above, I wrote my own solution for my use case.

    What I was essentially trying to do was remove all columns from my PySpark DataFrame which had 100% null values.

    # identify and remove all columns having 100% null values
    # summary("count") returns each column's non-null count (as a string)
    df_summary_count = your_df.summary("count")
    # columns whose non-null count is '0' contain only nulls
    null_cols = [c for c in df_summary_count.columns if df_summary_count.select(c).first()[c] == '0']
    # drop those columns from the original DataFrame (not from the summary)
    filtered_df = your_df.drop(*null_cols)
    
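    As a quick end-to-end check, here is a minimal sketch of the same approach on a tiny DataFrame (the SparkSession setup, the sample data, and names like sample_df are illustrative assumptions, not part of the original answer):

    from pyspark.sql import SparkSession
    from pyspark.sql.types import IntegerType, StringType, StructField, StructType

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical sample data: column "b" contains only nulls.
    # An explicit schema is needed because Spark cannot infer a type for an all-null column.
    schema = StructType([
        StructField("a", IntegerType()),
        StructField("b", StringType()),
        StructField("c", StringType()),
    ])
    sample_df = spark.createDataFrame([(1, None, "x"), (2, None, None)], schema)

    # Same technique as above: summary("count") reports non-null counts as strings.
    summary_df = sample_df.summary("count")
    null_cols = [c for c in summary_df.columns
                 if summary_df.select(c).first()[c] == '0']
    sample_df.drop(*null_cols).show()  # keeps "a" and "c", drops "b"

    Note that summary() only reports numeric and string columns, so all-null columns of other types (dates, timestamps, etc.) would not be caught by this check.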
