pyspark and reduceByKey: how to make a simple sum

心在旅途 2021-01-17 02:18

I am trying some code in Spark (pyspark) for an assignment. It is the first time I have used this environment, so I am surely missing something…

I have a simple dataset called c_views.
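
The question does not show the contents of c_views; for the examples below, assume it is a pair RDD of (key, number) tuples, where sc is the pyspark shell's SparkContext, for instance:

    # hypothetical data (the actual c_views is not shown in the question)
    c_views = sc.parallelize([("page_a", 3), ("page_b", 5), ("page_a", 7)])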

1 Answer
  • 2021-01-17 03:04

    "Other simple ways to achieve the result?"

    from operator import add 
    
    c_views.reduceByKey(add)
    

    or, if you prefer a lambda expression:

    c_views.reduceByKey(lambda x, y: x + y)
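
    Either form yields the per-key sums. As a quick check, assuming the hypothetical c_views shown above:

    c_views.reduceByKey(add).collect()
    # e.g. [('page_a', 10), ('page_b', 5)] (key order may vary)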
    

    "I do not understand what exactly I have to code in the function"

    It has to be a function that takes two values of the same type as the values in your RDD and returns a value of that same type. It also has to be associative and commutative, which means the final result cannot depend on how you arrange the parentheses or order the arguments.
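
    For instance, addition qualifies, but subtraction does not, since (a - b) - c != a - (b - c); a subtraction-based reduce would therefore give partition-dependent results. A minimal sketch, continuing the session above where add is imported:

    # associative and commutative: the per-key sum is deterministic
    sc.parallelize([("k", 1), ("k", 2), ("k", 3)]).reduceByKey(add).collect()
    # [('k', 6)]

    # not associative: the result depends on how values are grouped across partitions
    sc.parallelize([("k", 1), ("k", 2), ("k", 3)]).reduceByKey(lambda x, y: x - y).collect()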
