Show distinct column values in a PySpark DataFrame (Python)

忘了有多久 2020-12-23 10:55

Please suggest a PySpark DataFrame alternative to pandas' df['col'].unique().

I want to list all the unique values in a PySpark DataFrame column.

9 answers
  •  囚心锁ツ
    2020-12-23 11:38

    This should help to get distinct values of a column:

    df.select('column1').distinct().collect()
    

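    Note that .collect() has no built-in limit on how many values it returns: every distinct value is sent to the driver, so on a high-cardinality column this can be slow or even exhaust driver memory. Use .show() instead, or add .limit(20) before .collect(), to keep the result manageable.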
