This should help to get distinct values of a column:
df.select('column1').distinct().collect()
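Note that .collect() has no built-in limit on how many rows it returns: it pulls every distinct value back to the driver, so it can be slow (or run the driver out of memory) on a column with many distinct values. Use .show() instead, which prints only the first 20 rows by default, or add .limit(20) before .collect() to cap the result.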