Spark DataFrame: collect() vs select()

情话喂你 2020-12-13 06:33

Calling collect() on an RDD returns the entire dataset to the driver, which can cause an out-of-memory error, so we should avoid it.

Will collect()

6 Answers
  •  有刺的猬
    2020-12-13 07:04

    select() is a transformation, not an action, so it is lazily evaluated: it does not run any computation, it only records the operation in the plan. collect() is an action, so it triggers execution and returns the results to the driver.

    Try:

    df.limit(20).collect()
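
To make the transformation/action distinction concrete, here is a minimal pure-Python sketch. It is a toy model, not Spark's implementation: `LazyFrame`, `source`, and `rows_read` are hypothetical names invented for illustration, and real Spark plans and distributes work very differently. It only shows that `select` and `limit` record steps, while `collect` is what actually reads rows.

```python
import itertools

rows_read = 0  # counts how many source rows were actually produced


def source():
    """Hypothetical data source: yields 1000 rows and counts each read."""
    global rows_read
    for i in range(1000):
        rows_read += 1
        yield {"id": i, "name": f"user{i}"}


class LazyFrame:
    """Toy stand-in for a DataFrame (an assumption, not Spark's API internals)."""

    def __init__(self, make_rows, plan=()):
        self._make_rows = make_rows  # callable returning an iterator of rows
        self._plan = plan            # pending transformations, in order

    def select(self, *cols):
        # Transformation: only appends a step to the plan; no rows are read.
        step = lambda rows: ({c: r[c] for c in cols} for r in rows)
        return LazyFrame(self._make_rows, self._plan + (step,))

    def limit(self, n):
        # Transformation: also just recorded, applied lazily later.
        step = lambda rows: itertools.islice(rows, n)
        return LazyFrame(self._make_rows, self._plan + (step,))

    def collect(self):
        # Action: runs the whole plan and materializes the result as a list.
        rows = self._make_rows()
        for step in self._plan:
            rows = step(rows)
        return list(rows)


df = LazyFrame(source)
selected = df.select("id")
reads_after_select = rows_read        # still 0: select did no work

small = selected.limit(20).collect()  # only now are rows read
print(reads_after_select, len(small), rows_read)
```

In this toy model, limiting before collecting means only 20 rows are ever pulled from the source, which mirrors why `df.limit(20).collect()` is a safer way to inspect data than collecting everything.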
