Using PySpark, how can I select/keep all columns of a DataFrame which contain at least one non-null value, or equivalently remove all columns which contain no data?
Picking up pieces from the answers above, I wrote my own solution for my use case. What I essentially needed to do was remove every column of my PySpark DataFrame that was 100% null.
# identify and remove all columns having 100% null values
# summary("count") returns, per column, the number of non-null rows (as a string)
df_summary_count = your_df.summary("count")
null_cols = [c for c in df_summary_count.columns if df_summary_count.select(c).first()[c] == '0']
# drop the all-null columns from the original DataFrame, not from the summary
filtered_df = your_df.drop(*null_cols)
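
For reference, here is a minimal, self-contained sketch of the same idea on a toy DataFrame (the schema, column names, and values are made up purely for illustration, and an active SparkSession is assumed):

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, StringType

spark = SparkSession.builder.getOrCreate()

# toy DataFrame where column "b" holds nothing but nulls
schema = StructType([
    StructField("a", IntegerType()),
    StructField("b", StringType()),
    StructField("c", StringType()),
])
df = spark.createDataFrame([(1, None, "x"), (2, None, None)], schema)

counts = df.summary("count")
null_cols = [c for c in counts.columns if counts.select(c).first()[c] == '0']
df.drop(*null_cols).show()   # only columns "a" and "c" remain

One caveat: as far as I know, summary() only computes statistics for numeric and string columns, so columns of other types (dates, arrays, structs, etc.) may not show up in the check at all.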