Sum values of PairRDD

伪装坚强ぢ 2021-01-01 05:56

I have an RDD of type:

dataset: org.apache.spark.rdd.RDD[(String, Double)] = MapPartitionsRDD[26]

whose elements look like (Pedro, 0.083…). How can I sum all the Double values?

2 Answers
  • 2021-01-01 06:02

    like this?:

    dataset.map(_._2).reduce((x, y) => x + y)
    

    Breakdown: map each tuple to just its Double value, then reduce the RDD by summing.
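    The map-then-reduce pattern behaves the same on an ordinary Scala collection as on an RDD, so its semantics can be checked without a Spark cluster. A minimal sketch, using the sample pairs from the other answer as assumed input:

    ```scala
    // Sample (String, Double) pairs standing in for the RDD's elements
    val pairs = List(("Pedro", 0.0833), ("Hello", 0.001828))

    // Keep only the Double values, then fold them with +,
    // mirroring dataset.map(_._2).reduce(_ + _) on an RDD
    val total = pairs.map(_._2).reduce(_ + _)

    // Scala collections also provide sum directly
    val total2 = pairs.map(_._2).sum

    println(total)  // ≈ 0.085128
    ```

    On a real RDD the reduce runs distributed, but because `+` on Doubles is associative and commutative the result matches the local computation.
    
    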

  • 2021-01-01 06:08

    Considering your input data, you can do the following:

    // example
    val datasets = sc.parallelize(List(("Pedro", 0.0833), ("Hello", 0.001828))) 
    datasets.map(_._2).sum()
    // res3: Double = 0.085128
    // or
    datasets.map(_._2).reduce(_ + _)
    // res4: Double = 0.085128
    // or even
    datasets.values.sum()
    // res5: Double = 0.085128
    