I was trying to find an efficient way to get the top N items from a very large list that may contain duplicates.
I first tried sorting and slicing, which works. But
I wanted a version that was polymorphic and that could be composed over a single iterator. For instance, what if you wanted both the largest and the smallest elements while reading from a file? Here is what I came up with:
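import scala.reflect.ClassTag
import scala.util.Sorting.quickSort

// Keeps the n largest elements seen so far (duplicates allowed).
// ClassTag replaces the deprecated ClassManifest; it is needed to build Array[T].
class TopNSet[T](n: Int)(implicit ev: Ordering[T], ev2: ClassTag[T]) {
  require(n > 0, "n must be positive")
  val ss = new Array[T](n)
  var len = 0

  def tryElement(el: T): Unit = {
    if (len < n - 1) {
      // Still filling the array: append without sorting.
      ss(len) = el
      len += 1
    } else if (len == n - 1) {
      // The array is now full: sort once so ss(0) is the smallest retained element.
      ss(len) = el
      len = n
      quickSort(ss)
    } else if (ev.gt(el, ss(0))) {
      // The new element beats the current minimum: replace it and re-sort.
      ss(0) = el
      quickSort(ss)
    }
  }

  // In insertion order until n elements have been seen, ascending afterwards.
  def getTop(): Array[T] = ss.slice(0, len)
}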
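To illustrate the single-pass case mentioned earlier (largest and smallest from one iterator), here is a minimal sketch. It does not use the class above: `HeapTopN` is a hypothetical variant built on `scala.collection.mutable.PriorityQueue`, swapped in so that each replacement costs a heap operation rather than a full re-sort. The sample values are made up for the demonstration.

```scala
import scala.collection.mutable

// Hypothetical heap-based variant (not the TopNSet above):
// keeps the n largest elements under `ord`.
class HeapTopN[T](n: Int)(implicit ord: Ordering[T]) {
  // PriorityQueue exposes its maximum at the head, so reversing the
  // ordering puts the smallest retained element there.
  private val heap = mutable.PriorityQueue.empty[T](ord.reverse)

  def tryElement(el: T): Unit =
    if (heap.size < n) heap.enqueue(el)
    else if (n > 0 && ord.gt(el, heap.head)) {
      heap.dequeue() // evict the smallest retained element
      heap.enqueue(el)
    }

  // Retained elements, largest first under `ord`.
  def getTop(): List[T] = heap.toList.sorted(ord.reverse)
}

object TopAndBottomDemo {
  def main(args: Array[String]): Unit = {
    val largest  = new HeapTopN[Int](3)
    // Reversing the ordering turns "top N" into "bottom N".
    val smallest = new HeapTopN[Int](3)(Ordering[Int].reverse)

    // One traversal feeds both collectors; the iterator is consumed once.
    Iterator(5, 1, 9, 3, 7, 9, 2).foreach { x =>
      largest.tryElement(x)
      smallest.tryElement(x)
    }

    println(largest.getTop())  // List(9, 9, 7)
    println(smallest.getTop()) // List(1, 2, 3)
  }
}
```

The same composition should work with two `TopNSet` instances, since the ordering is an implicit parameter there as well; with a file, the `Iterator` would come from something like `Source.fromFile(...).getLines()`.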
Timing TopNSet against the accepted answer:
val myInts = Array.fill(100000000)(util.Random.nextInt())
time(topNs(myInts, 100))
// Elapsed time 3006.05485 msecs
val myTopSet = new TopNSet[Int](100)
time(myInts.foreach(myTopSet.tryElement(_)))
// Elapsed time 4334.888546 msecs
So it is not much slower (roughly 1.4x in this run), and certainly a lot more flexible.
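The `time` helper used in the measurements is not shown in this answer. A minimal sketch of what such a helper might look like, assuming it simply wraps a by-name block with `System.nanoTime` and prints the elapsed milliseconds (the `Timing` object name is made up here):

```scala
object Timing {
  // Hypothetical stand-in for the `time` helper used in the measurements.
  // Evaluates `block` once, prints the wall-clock duration, returns the result.
  def time[A](block: => A): A = {
    val start = System.nanoTime()
    val result = block
    val elapsedMs = (System.nanoTime() - start) / 1e6
    println(s"Elapsed time $elapsedMs msecs")
    result
  }
}
```

A single wall-clock measurement like this includes JIT warm-up and GC noise, so the figures are best read as rough indications rather than a rigorous benchmark.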