Return RDD of largest N values from another RDD in Spark

粉色の甜心 2021-01-14 12:08

I'm trying to filter an RDD of tuples to return the largest N tuples based on key values. I need the return format to be an RDD.

So the RDD:

[(4, '

2 Answers
  •  广开言路
    2021-01-14 12:45

    A lower-effort approach: since you only want to turn the take(N) results into a new RDD, pass them straight to sc.parallelize.

    # yourSortedRdd must already be sorted by key, largest first (e.g. via sortByKey(ascending=False))
    sc.parallelize(yourSortedRdd.take(N))
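If the RDD is not already sorted, PySpark's RDD.top(N, key=...) returns the same largest-N list without a full sort. It works along these lines: each partition keeps only its N largest elements (via heapq.nlargest), and those small partial lists are then merged. The sketch below reproduces that pattern on plain Python lists so it runs without a Spark cluster. The list-of-lists "partitions", the helper names (partition_top_n, merge_top_n, distributed_top_n) and the sample tuples are hypothetical stand-ins, since the question's data is cut off.

```python
import heapq
from functools import reduce
from typing import Any, Callable, Iterable, List, TypeVar

T = TypeVar("T")


def partition_top_n(partition: Iterable[T], n: int, key: Callable[[T], Any]) -> List[T]:
    # Keep only the n largest elements of one partition, largest first.
    return heapq.nlargest(n, partition, key=key)


def merge_top_n(a: List[T], b: List[T], n: int, key: Callable[[T], Any]) -> List[T]:
    # Combine two partial results; the n largest overall must be among them.
    return heapq.nlargest(n, a + b, key=key)


def distributed_top_n(partitions: List[List[T]], n: int, key: Callable[[T], Any]) -> List[T]:
    # "Map" step: each partition shrinks to at most n candidates.
    partials = [partition_top_n(p, n, key) for p in partitions]
    # "Reduce" step: fold the partial lists together pairwise.
    return reduce(lambda a, b: merge_top_n(a, b, n, key), partials, [])


if __name__ == "__main__":
    # Hypothetical (key, value) tuples split across three "partitions".
    partitions = [
        [(4, "a"), (9, "b"), (1, "c")],
        [(7, "d"), (3, "e")],
        [(8, "f"), (2, "g"), (6, "h")],
    ]
    print(distributed_top_n(partitions, 3, key=lambda kv: kv[0]))
    # → [(9, 'b'), (8, 'f'), (7, 'd')]
```

On a real RDD the equivalent one-liner would be sc.parallelize(rdd.top(N, key=lambda kv: kv[0])). Like take(N), top(N) brings its result back to the driver, so either approach only suits an N small enough to fit in driver memory.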
    
