Calculate average using Spark Scala

Backend · open · 4 replies · 1407 views
爱一瞬间的悲伤 2021-01-28 09:29

How do I calculate the average salary per location in Spark Scala with the two data sets below?

File1.csv (column 4 is salary)

Ram, 30, Engineer, 40000  
B         


        
4 replies
  •  遥遥无期
    2021-01-28 10:17

    You could do something like this:

    // Read both CSVs and split each line into trimmed fields
    val salary = sc.textFile("File1.csv").map(_.split(",").map(_.trim))
    val location = sc.textFile("File2.csv").map(_.split(",").map(_.trim))
    // Key both RDDs by name and join: (name, (salary, location))
    val joined = salary.map(e => (e(0), e(3).toInt)).join(location.map(e => (e(0), e(1))))
    // Re-key by location: (location, salary)
    val locSalary = joined.map(v => (v._2._2, v._2._1))
    // Per location, accumulate a (count, sum) pair, then divide.
    // Note: this is integer division, so the average is truncated.
    val averages = locSalary.aggregateByKey((0, 0))(
      (t, e) => (t._1 + 1, t._2 + e),          // seqOp: fold one salary into the accumulator
      (t1, t2) => (t1._1 + t2._1, t1._2 + t2._2) // combOp: merge partition accumulators
    ).mapValues(t => t._2 / t._1)
    

    then averages.take(10) will give:

    res5: Array[(String, Int)] = Array((Chennai,50000), (Bangalore,40000))
    
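    To see why the `aggregateByKey` step works, here is a minimal plain-Scala sketch of the same (count, sum) accumulation over in-memory pairs, with no Spark dependency. The object name, helper name, and sample data are made up for illustration; only the accumulate-then-divide logic mirrors the answer above.

    ```scala
    object AvgSketch {
      // Fold each (location, salary) pair into a per-key (count, sum)
      // accumulator, then divide sum by count (integer division, as above).
      def averageByKey(pairs: Seq[(String, Int)]): Map[String, Int] = {
        val acc = pairs.foldLeft(Map.empty[String, (Int, Int)]) {
          case (m, (loc, sal)) =>
            val (count, sum) = m.getOrElse(loc, (0, 0))
            m.updated(loc, (count + 1, sum + sal)) // like seqOp: add one record
        }
        acc.map { case (loc, (count, sum)) => (loc, sum / count) }
      }

      def main(args: Array[String]): Unit = {
        val data = Seq(("Chennai", 50000), ("Bangalore", 30000), ("Bangalore", 50000))
        println(averageByKey(data))
      }
    }
    ```

    In the distributed version, `seqOp` runs this fold within each partition and `combOp` merges the per-partition (count, sum) pairs, which is why both fields are summed there.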
