Order by value in Spark pair RDD

自闭症患者 2021-02-18 16:32

I have a Spark pair RDD of (key, count) as below:

Array[(String, Int)] = Array((a,1), (b,2), (c,1), (d,3))

Using spark scala API how to get a new p

2 Answers
  • 2021-02-18 17:10

    This should work:

    //Assuming the pair's second type has an Ordering, which is the case for Int
    rdd.sortBy(_._2) // same as rdd.sortBy(pair => pair._2)
    

    (Though you might want to take the key into account too when there are ties.)
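
    One way to handle ties is to sort on a tuple instead of the count alone. The sketch below is only an illustration, run on the sample data from the question. The object name `SortByValueWithTies`, the helper `run`, and the local `SparkSession` setup are assumptions, not part of the original answer. It relies on the implicit `Ordering` that Scala provides for tuples of ordered types.

    ```scala
    import org.apache.spark.sql.SparkSession

    object SortByValueWithTies {
      // Hypothetical helper: sorts the question's sample data two ways.
      def run(): (Seq[(String, Int)], Seq[(String, Int)]) = {
        // Assumption: a local session, only for trying this out.
        val spark = SparkSession.builder()
          .master("local[*]")
          .appName("sort-by-value")
          .getOrCreate()
        try {
          val rdd = spark.sparkContext.parallelize(
            Seq(("a", 1), ("b", 2), ("c", 1), ("d", 3)))

          // Ascending by count; equal counts are ordered by key.
          val ascending =
            rdd.sortBy { case (key, count) => (count, key) }.collect().toSeq

          // Descending by count (negated), keys still ascending within ties.
          val descending =
            rdd.sortBy { case (key, count) => (-count, key) }.collect().toSeq

          (ascending, descending)
        } finally spark.stop()
      }

      def main(args: Array[String]): Unit = {
        val (asc, desc) = run()
        println(asc.mkString(", "))  // (a,1), (c,1), (b,2), (d,3)
        println(desc.mkString(", ")) // (d,3), (b,2), (a,1), (c,1)
      }
    }
    ```

    Negating the count is a shortcut that suits non-negative counts like these; for arbitrary Int values a custom Ordering would be the safer choice.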

  • 2021-02-18 17:15

    Sort by key and value in ascending and descending order

    val textfile = sc.textFile("file:///home/hdfs/input.txt")
    val words = textfile.flatMap(line => line.split(" "))
    //Sort by value in descending order. For ascending order remove 'false' argument from sortBy
    words.map( word => (word,1)).reduceByKey((a,b) => a+b).sortBy(_._2,false)
    //for ascending order by value
    words.map( word => (word,1)).reduceByKey((a,b) => a+b).sortBy(_._2)
    
    //Sort by key in ascending order
    words.map( word => (word,1)).reduceByKey((a,b) => a+b).sortByKey
    //Sort by key in descending order
    words.map( word => (word,1)).reduceByKey((a,b) => a+b).sortByKey(false)
    

    Sorting by value can also be done by swapping the key and value and then applying sortByKey:

    // Sort by value by swapping key and value, then using sortByKey
    val counts = words.map(word => (word, 1)).reduceByKey(_ + _)
    val descendingSortByValue = counts.map(x => (x._2, x._1)).sortByKey(false)
    descendingSortByValue.toDF.show // needs import spark.implicits._ outside spark-shell
    // After the swap each pair is (count, word); collect so the driver prints in sorted order
    descendingSortByValue.collect().foreach { case (count, word) =>
      println(s"$word:$count")
    }
    