Calling collect() on an RDD returns the entire dataset to the driver, which can cause an out-of-memory error, so it is best avoided.
select() is a transformation, not an action, so it is lazily evaluated: Spark only records the operation in the execution plan and does not perform any computation yet. collect() is an action, so calling it triggers the computation.
To look at a sample without pulling everything to the driver, try:
df.limit(20).collect()
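
To make the transformation/action distinction concrete, here is a plain-Python analogy. It is not Spark's API or implementation: `LazyFrame`, `read_rows`, and `rows_read` are hypothetical names invented for this sketch, which only mimics the idea that select() and limit() build a plan while collect() runs it.

```python
from itertools import islice
from typing import Any, Callable, Iterator, List


class LazyFrame:
    """Toy stand-in for a DataFrame (hypothetical, not Spark):
    transformations build a plan, collect() executes it."""

    def __init__(self, source: Callable[[], Iterator[Any]]) -> None:
        # A factory that produces a fresh row iterator when called.
        self._source = source

    def select(self, fn: Callable[[Any], Any]) -> "LazyFrame":
        # Transformation: wraps the plan, reads no rows.
        return LazyFrame(lambda: (fn(row) for row in self._source()))

    def limit(self, n: int) -> "LazyFrame":
        # Transformation: caps how many rows the plan will produce.
        return LazyFrame(lambda: islice(self._source(), n))

    def collect(self) -> List[Any]:
        # Action: runs the plan and holds every resulting row in memory.
        return list(self._source())


rows_read = 0  # counts how many source rows were actually read


def read_rows() -> Iterator[int]:
    """Simulated large data source."""
    global rows_read
    for i in range(1_000_000):
        rows_read += 1
        yield i


df = LazyFrame(read_rows)
doubled = df.select(lambda x: x * 2)
assert rows_read == 0  # select() alone did no work

sample = doubled.limit(20).collect()
print(len(sample), rows_read)
```

In this toy version, only the rows needed for the 20-element result are read, whereas `doubled.collect()` would materialize all one million rows in a single list. Real Spark plans are more involved, but the same reasoning explains why limiting before collecting keeps the driver's memory use bounded.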