Using PySpark, how can I select/keep all columns of a DataFrame that contain at least one non-null value, or equivalently, remove all columns that contain no data?
For me it worked in a slightly different way than @Suresh's answer:
import pyspark.sql.functions as func

# Keep only the columns that have at least one non-null value.
nonNull_cols = [
    c for c in original_df.columns
    if original_df.filter(func.col(c).isNotNull()).count() > 0
]
new_df = original_df.select(*nonNull_cols)
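Note that the list comprehension above runs one filter/count job per column, which can get slow on wide DataFrames. A minimal sketch of a single-pass alternative (assuming the same original_df; func.count already skips nulls, so one aggregation gives the non-null count for every column at once):

from pyspark.sql import functions as func

# Count non-null values for every column in a single pass over the data.
counts = original_df.agg(
    *[func.count(func.col(c)).alias(c) for c in original_df.columns]
).collect()[0].asDict()

# Keep only the columns whose non-null count is positive.
nonNull_cols = [c for c, n in counts.items() if n > 0]
new_df = original_df.select(*nonNull_cols)

Both versions produce the same result; the aggregation variant just trades per-column jobs for a single job over all columns.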