Spark: Is “count” on Grouped Data a Transformation or an Action?

你的背包 2021-02-13 20:06

I know that count called on an RDD or a DataFrame is an action. But while fiddling with the Spark shell, I observed the following:

scala> val empDF = Seq((1,"


        
3 answers
  •  一向 (original poster)
    2021-02-13 20:44

    As you've already figured out, if a method returns a distributed object (Dataset or RDD), it can be classified as a transformation.

    However, these distinctions are much better suited to RDDs than to Datasets. Datasets come with an optimizer, including the recently added cost-based optimizer, and can be much less lazy than the old API, which blurs the difference between transformations and actions in some cases.

    Here, however, it is safe to say that count is a transformation: called on grouped data, it returns a DataFrame rather than a value.
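
    A minimal sketch of the distinction, assuming a local SparkSession. The sample rows and the column names (id, name, dept) are hypothetical, since the original empDF definition is cut off in the question, and the object and method names are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object GroupedCountSketch {
  // Returns the grouped counts (still a lazy DataFrame) and the total row count.
  def counts(spark: SparkSession): (DataFrame, Long) = {
    import spark.implicits._

    // Hypothetical sample data; not the data from the question.
    val empDF = Seq((1, "a", 10), (2, "b", 10), (3, "c", 20))
      .toDF("id", "name", "dept")

    // count on grouped data returns a DataFrame with columns (dept, count).
    // Nothing is computed yet; this only extends the query plan.
    val perDept: DataFrame = empDF.groupBy("dept").count()

    // count on a DataFrame returns a Long, so it has to run a job.
    val total: Long = empDF.count()

    (perDept, total)
  }

  def main(args: Array[String]): Unit = {
    // In spark-shell a session named `spark` already exists; this one is for standalone use.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("grouped-count-sketch")
      .getOrCreate()

    val (perDept, total) = counts(spark)

    // The grouped counts are only materialized by an action such as show().
    perDept.show()
    println(total)

    spark.stop()
  }
}
```

    The return types are the quickest way to tell the two apart: groupBy(...).count() gives back another DataFrame that can be chained further, whereas count() directly on a DataFrame gives back a plain Long.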
