I am trying some code in Spark (pyspark) for an assignment. This is my first time using this environment, so I'm surely missing something…
I have a simple dataset called c_views.
Are there other simple ways to achieve the result?
from operator import add
c_views.reduceByKey(add)
or if you prefer lambda expressions:
c_views.reduceByKey(lambda x, y: x + y)
I do not understand exactly what I have to code in the function.
It has to be a function that takes two values of the same type as the values in your RDD and returns a value of that same type. It also has to be associative, which means the final result cannot depend on how you arrange the parentheses.
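To see what that two-argument function does, here is a plain-Python sketch of the semantics of `reduceByKey` (no Spark required); the sample `pairs` data is made up, standing in for whatever key-value pairs `c_views` holds:

```python
from operator import add
from functools import reduce
from itertools import groupby

# Hypothetical sample data standing in for c_views: (key, count) pairs.
pairs = [("en", 3), ("fr", 1), ("en", 2), ("de", 4), ("en", 5)]

def reduce_by_key(pairs, func):
    """Sketch of what reduceByKey does: group the values by key, then
    fold each group with the supplied two-argument function."""
    grouped = sorted(pairs, key=lambda kv: kv[0])
    return {k: reduce(func, (v for _, v in g))
            for k, g in groupby(grouped, key=lambda kv: kv[0])}

print(reduce_by_key(pairs, add))
# {'de': 4, 'en': 10, 'fr': 1}

# Associativity means Spark may combine partial sums in any order:
assert (3 + 2) + 5 == 3 + (2 + 5)
```

In real Spark the values for one key may live on different partitions, so the function is applied first within each partition and then across partitions; that is exactly why it must be associative.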