Question
I have an RDD of (key, list) pairs, and I would like to sum part of each list, namely the second and third elements:
(key, element2 + element3)
For example, given the input
(1, List(2.0, 3.0, 4.0, 5.0)), (2, List(1.0, -1.0, -2.0, -3.0))
the output should look like this:
(1, 7.0), (2, -3.0)
Thanks
Answer 1:
You can map over the RDD and index into the second part of each tuple, keeping the key alongside the sum:
yourRddOfTuples.map(tuple => {val list = tuple._2; (tuple._1, list(1) + list(2))})
Update after your comment: convert the list to a Vector before indexing:
yourRddOfTuples.map(tuple => {val vs = tuple._2.toVector; (tuple._1, vs(1) + vs(2))})
Or, if you do not want to use conversions:
yourRddOfTuples.map(t => (t._1, t._2.drop(1).take(2).sum))
This takes the list from the second element of the tuple (t._2), skips its first element (.drop(1)), takes the next two (.take(2), which may return fewer if the list is shorter), and sums them (.sum). The result is paired with the key (t._1).
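The practical difference between the indexing form and the drop/take form shows up on short lists. Here is a minimal sketch using plain Scala collections, so it runs without Spark; the same functions could be passed to an RDD's map. The object name PartialSum, the helper names byIndex and byDropTake, and the third record with a two-element list are hypothetical and not part of the question.

```scala
// Plain Scala collections; no Spark dependency is needed for this comparison.
object PartialSum {
  // Indexing form: throws IndexOutOfBoundsException when the list has fewer than 3 elements.
  def byIndex(xs: List[Double]): Double = xs(1) + xs(2)

  // drop/take form: sums whatever exists at positions 1 and 2, without throwing.
  def byDropTake(xs: List[Double]): Double = xs.drop(1).take(2).sum

  def main(args: Array[String]): Unit = {
    val data = Seq(
      (1, List(2.0, 3.0, 4.0, 5.0)),
      (2, List(1.0, -1.0, -2.0, -3.0)),
      (3, List(10.0, 20.0)) // hypothetical short list, added for illustration
    )
    // Keep the key next to the partial sum.
    val sums = data.map { case (k, xs) => (k, byDropTake(xs)) }
    println(sums) // List((1,7.0), (2,-3.0), (3,20.0))
  }
}
```

Whether silently summing a shorter list is acceptable depends on your data. If a short list signals a bad record, the indexing form fails loudly instead.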
Answer 2:
You can map each key-list pair to the key and the sum of the 2nd and 3rd list elements as follows:
val rdd = sc.parallelize(Seq(
(1, List(2.0, 3.0, 4.0, 5.0)),
(2, List(1.0, -1.0, -2.0, -3.0))
))
rdd.map{ case (k, l) => (k, l(1) + l(2)) }.collect
// res1: Array[(Int, Double)] = Array((1,7.0), (2,-3.0))
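Since only the value changes and the key stays as it is, the same result can also be written with mapValues, which leaves the key untouched and preserves the RDD's partitioner. Below is a minimal standalone sketch, assuming Spark is on the classpath and a local master is acceptable. The object name SumPartOfList, the helper secondPlusThird, the app name, and the use of slice(1, 3) in place of indexing are illustrative choices, not something taken from the answers.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SumPartOfList {
  // Sum of the elements at indices 1 and 2 (fewer if the list is shorter).
  def secondPlusThird(xs: List[Double]): Double = xs.slice(1, 3).sum

  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions for a self-contained run.
    val conf = new SparkConf().setAppName("sum-part-of-list").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.parallelize(Seq(
        (1, List(2.0, 3.0, 4.0, 5.0)),
        (2, List(1.0, -1.0, -2.0, -3.0))
      ))
      // mapValues transforms only the value of each pair; the key is kept as is.
      val sums = rdd.mapValues(secondPlusThird)
      sums.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

In spark-shell, where sc already exists, only the rdd.mapValues(...) line is needed.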
Source: https://stackoverflow.com/questions/47949324/how-to-sum-a-part-of-a-list-in-rdd