I know that count called on an RDD or a DataFrame is an action. But while fiddling with the spark shell, I observed the following
scala> val empDF = Seq((1,\"
Case 1:
You use rdd.count()
to count the number of rows. Since it initiates the DAG execution and returns the data to the driver, its an action for RDD.
for ex: rdd.count // it returns a Long value
Case 2:
If you call count on Dataframe, it initiates the DAG execution and returns the data to the driver, its an action for Dataframe.
for ex: df.count // it returns a Long value
Case 3:
In your case you are calling groupBy
on dataframe
which returns RelationalGroupedDataset
object, and you are calling count
on grouped Dataset which returns a Dataframe
, so its a transformation since it doesn't gets the data to the driver and initiates the DAG execution.
for ex:
df.groupBy("department") // returns RelationalGroupedDataset
.count // returns a Dataframe so a transformation
.count // returns a Long value since called on DF so an action