How to sum the values of a column within an RDD

一向 2021-01-24 14:00

I have an RDD with the following rows:

[(id,value)]

How would you sum the values of all rows in the RDD?

1 Answer
  • 2021-01-24 14:26

    Simply use sum(); you just need to get the numeric values into an RDD of their own first.

    For example, when each value is a list of numbers, flatten the lists with flatMap:

    # sc is an existing SparkContext, e.g. the one provided by the pyspark shell
    (sc.parallelize([('id', [1, 2, 3]), ('id2', [3, 4, 5])])
        .flatMap(lambda tup: tup[1])  # flatten to [1, 2, 3, 3, 4, 5]
        .sum())
    

    Outputs 18

    Similarly, just use values() to get that second column as an RDD on its own.

    # sc is an existing SparkContext, as above
    (sc.parallelize([('id', 6), ('id2', 12)])
        .values()  # keep only the second field: [6, 12]
        .sum())    # 18
    
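To see why the values have to be pulled out of the (id, value) rows first, here is a minimal plain-Python sketch (no Spark required) of roughly what values() followed by sum() computes: each partition sums its own values, and the partial sums are then combined. The `partitions` list and the `sum_values` helper are hypothetical names used only for illustration; they are not part of the PySpark API.

```python
from functools import reduce
from operator import add

# Hypothetical stand-in for an RDD of (id, value) rows split into two partitions.
partitions = [
    [("id", 6), ("id2", 12)],
    [("id3", 4)],
]


def sum_values(parts):
    """Sum the second field of every row, one partition at a time."""
    # Keep only the value from each (id, value) row and sum each partition locally.
    partial_sums = [sum(value for _id, value in part) for part in parts]
    # Combine the per-partition results into a single total.
    return reduce(add, partial_sums, 0)


total = sum_values(partitions)
print(total)  # 22
```

Summing the tuples directly would fail because a string id cannot be added to a number, which is why the value column is extracted before the sum.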