Top n items in a List (including duplicates)

谎友^ 2021-01-20 22:18

Trying to find an efficient way to obtain the top N items in a very large list, possibly containing duplicates.

I first tried sorting & slicing, which works. But

5 answers
  •  时光取名叫无心
    2021-01-20 22:28

    Unless I'm missing something, why not just traverse the list and pick the top 20 as you go? As long as you keep track of the smallest element in the current top 20, there is no extra work except when an element has to be added to the top 20, which should be relatively rare for a long list of random values (an ascending list is the worst case, since every element enters the top 20). Here's an implementation:

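      // Returns the n largest elements of xs (duplicates kept), in ascending order.
      def topNs(xs: TraversableOnce[Int], n: Int): List[Int] = {
        var ss = List[Int]()     // current top-n candidates, kept sorted ascending
        var min = Int.MaxValue   // smallest candidate (ss.head once ss is non-empty)
        var len = 0              // ss.length, tracked to avoid O(n) length calls
        xs foreach { e =>
          // Only touch ss while it is not yet full, or when e beats the current minimum
          if (len < n || e > min) {
            ss = (e :: ss).sorted
            min = ss.head
            len += 1
          }
          // If ss now holds n + 1 elements, drop the smallest one
          if (len > n) {
            ss = ss.tail
            min = ss.head
            len -= 1
          }
        }
        ss
      }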
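The loop above re-sorts the whole candidate list each time an element is accepted, which costs O(n log n) per accepted element. A sketch of the same single-pass idea using a bounded min-heap instead, assuming Scala 2.12+ and the standard `scala.collection.mutable.PriorityQueue` (a max-heap by default, turned into a min-heap here by reversing the ordering). `topNHeap` is a hypothetical name and this is a swapped-in data structure, not the answer's code; each accepted element then costs O(log n).

```scala
import scala.collection.mutable

// Hypothetical variant: same single pass, but candidates live in a min-heap.
// Returns the n largest values of xs (duplicates kept) in ascending order.
def topNHeap(xs: Iterable[Int], n: Int): List[Int] = {
  // Reversing the ordering makes heap.head the smallest current candidate
  val heap = mutable.PriorityQueue.empty[Int](Ordering[Int].reverse)
  // Guard: with n == 0 the heap stays empty and heap.head would throw
  if (n > 0) {
    xs.foreach { e =>
      if (heap.size < n) heap.enqueue(e)   // still filling up
      else if (e > heap.head) {            // e beats the current minimum
        heap.dequeue()                     // evict the smallest candidate
        heap.enqueue(e)
      }
    }
  }
  // PriorityQueue iteration order is unspecified, so sort the result
  heap.toList.sorted
}
```

For example, `topNHeap(List(3, 9, 1, 9, 7, 3, 9), 4)` returns `List(7, 9, 9, 9)`. With n = 20 the saving over re-sorting a 20-element list is modest; the heap mainly helps when n is large.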
    

    (Edited: I originally used a SortedSet, not realising you wanted to keep duplicates.)
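Why a set is the wrong container here: it stores each distinct value once, so repeated values are undercounted when picking the top n. A small illustration with made-up sample data (`sample`, `asSet`, `asList` and the `top2…` values are illustrative names, not from the thread):

```scala
import scala.collection.immutable.SortedSet

// Illustrative sample data, not from the original benchmark
val sample = List(9, 7, 9, 3, 9)

// A SortedSet keeps each distinct value once, so the three 9s collapse into one
val asSet = (SortedSet.empty[Int] ++ sample).toList   // List(3, 7, 9)

// A sorted List keeps every occurrence
val asList = sample.sorted                            // List(3, 7, 9, 9, 9)

// "Top 2" therefore differs depending on the container
val top2FromSet = asSet.takeRight(2)                  // List(7, 9)
val top2FromList = asList.takeRight(2)                // List(9, 9)
```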

    I benchmarked this on a list of 100k random Ints, and it took 40 ms on average. Your elite method takes about 850 ms and your elite2 method about 4100 ms, so this is over 20 times quicker than your fastest.
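Timings like these depend heavily on JVM warm-up, so treat single runs as rough. A crude sketch for repeating the comparison on 100k random Ints, assuming a Scala script environment. The question's `elite`/`elite2` code is not shown here, so the baseline is a hypothetical sort-and-slice stand-in (`topNSorted`), and the `time` helper is likewise illustrative. The script prints timings without any claim about what they will be, and checks that both approaches return the same list; for trustworthy numbers a proper harness such as JMH is the better tool.

```scala
import scala.util.Random

// Single-pass top-n from the answer, adapted to take an Iterable
def topNs(xs: Iterable[Int], n: Int): List[Int] = {
  var ss = List[Int]()
  var min = Int.MaxValue
  var len = 0
  xs.foreach { e =>
    if (len < n || e > min) { ss = (e :: ss).sorted; min = ss.head; len += 1 }
    if (len > n) { ss = ss.tail; min = ss.head; len -= 1 }
  }
  ss
}

// Hypothetical sort-and-slice baseline (not the asker's elite/elite2 code):
// sorts everything, then keeps the n largest, ascending to match topNs
def topNSorted(xs: List[Int], n: Int): List[Int] =
  xs.sorted.takeRight(n)

// Crude wall-clock timer: one run, no JIT warm-up, results only indicative
def time[A](label: String)(body: => A): A = {
  val start = System.nanoTime()
  val result = body
  println(f"$label%-12s ${(System.nanoTime() - start) / 1e6}%.1f ms")
  result
}

val data = List.fill(100000)(Random.nextInt())
val n = 20
val fast = time("single pass")(topNs(data, n))
val slow = time("sort + slice")(topNSorted(data, n))
```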
