Replace groupByKey() with reduceByKey()

一整个雨季 2020-12-17 06:44

This is a follow-up question from here. I am trying to implement k-means based on this implementation. It works great, but I would like to replace groupByKey(

1 Answer
  • 2020-12-17 07:09

    You could use an aggregateByKey() (a bit more natural than reduceByKey()) like this to compute newCentroids:

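    // Per centroid key, accumulate (sum of points, count) without grouping all values first.
    val newCentroids = closest.aggregateByKey((Vector.zeros(dim), 0L))(
      // seqOp: fold one point into the partition-local (sum, count)
      (agg, v) => (agg._1 += v, agg._2 + 1L),
      // combOp: merge the (sum, count) pairs coming from two partitions
      (agg1, agg2) => (agg1._1 += agg2._1, agg1._2 + agg2._2)
    ).mapValues(agg => agg._1 / agg._2).collectAsMap // mean = sum / count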
    

    For this to work you will need to compute the dimensionality of your data, i.e. dim, but you only need to do this once. You could probably use something like val dim = data.first._2.length.
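
    To see what the two functions passed to aggregateByKey() do, here is a minimal plain-Scala sketch that runs without Spark. It is only an illustration under assumptions: points are plain Array[Double] instead of the Vector type used above, partitions are simulated as nested Seqs, and every name in it (AggregateSketch, seqOp, combOp, aggregatePartition, mergeMaps, samplePartitions) is hypothetical rather than part of any Spark API.

```scala
object AggregateSketch {
  type Point = Array[Double]
  type Agg = (Array[Double], Long) // (running sum of points, number of points)

  // seqOp: fold one point into a partition-local accumulator (mutates the sum in place).
  def seqOp(agg: Agg, v: Point): Agg = {
    val (sum, n) = agg
    for (i <- sum.indices) sum(i) += v(i)
    (sum, n + 1L)
  }

  // combOp: merge two accumulators that were built on different partitions.
  def combOp(a: Agg, b: Agg): Agg = {
    val (sumA, nA) = a
    val (sumB, nB) = b
    for (i <- sumA.indices) sumA(i) += sumB(i)
    (sumA, nA + nB)
  }

  // Aggregate one simulated partition: a fresh zero accumulator per key, then seqOp per point.
  def aggregatePartition(part: Seq[(Int, Point)], dim: Int): Map[Int, Agg] =
    part.foldLeft(Map.empty[Int, Agg]) { case (acc, (k, v)) =>
      val cur = acc.getOrElse(k, (new Array[Double](dim), 0L))
      acc.updated(k, seqOp(cur, v))
    }

  // Merge the per-partition results key by key with combOp.
  def mergeMaps(a: Map[Int, Agg], b: Map[Int, Agg]): Map[Int, Agg] =
    b.foldLeft(a) { case (acc, (k, agg)) =>
      acc.get(k) match {
        case Some(existing) => acc.updated(k, combOp(existing, agg))
        case None           => acc.updated(k, agg)
      }
    }

  // New centroid per key = sum / count, mirroring the mapValues step above.
  def newCentroids(partitions: Seq[Seq[(Int, Point)]], dim: Int): Map[Int, Point] =
    partitions
      .map(aggregatePartition(_, dim))
      .foldLeft(Map.empty[Int, Agg])(mergeMaps)
      .map { case (k, (sum, n)) => k -> sum.map(_ / n) }

  // Hypothetical sample data: (index of closest centroid, point), split over two "partitions".
  val samplePartitions: Seq[Seq[(Int, Point)]] = Seq(
    Seq(0 -> Array(1.0, 2.0), 1 -> Array(10.0, 10.0)),
    Seq(0 -> Array(3.0, 4.0), 1 -> Array(20.0, 30.0), 0 -> Array(5.0, 6.0))
  )
}
```

    Calling AggregateSketch.newCentroids(AggregateSketch.samplePartitions, 2) gives centroid 0 at (3.0, 4.0) and centroid 1 at (15.0, 20.0). Only one (sum, count) pair per key leaves each partition, which is the reason to prefer this shape over collecting every point per key first.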
