Top n items in a List (including duplicates)

谎友^ 2021-01-20 22:18

I'm trying to find an efficient way to obtain the top N items in a very large list that may contain duplicates.

I first tried sorting & slicing, which works. But

5 Answers
  •  一生所求
    2021-01-20 22:44

    Here's pseudocode for the algorithm I'd use:

    selectLargest(n: Int, xs: List): List
      if size(xs) <= n
        return xs
      pivot <- selectPivot(xs)
      (lt, gt) <- partition(xs, pivot)
      if size(gt) == n
        return gt
      else if size(gt) < n
        return append(gt, selectLargest(n - size(gt), lt))
      else
        return selectLargest(n, gt)
    

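    selectPivot picks a "pivot" value for partitioning the list, using whatever technique you prefer. partition splits the list into two: lt (elements smaller than the pivot) and gt (elements greater than the pivot). Elements equal to the pivot have to go somewhere: either put them in one of those groups or handle them as a separate third group. With duplicates this needs some care. If every equal element lands in one group and the pivot happens to be the smallest or largest value, that group can be the whole list and the recursion never shrinks. Whichever option you choose, make sure each recursive call receives a strictly smaller list.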
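
    The pseudocode above could be sketched in Scala roughly as follows. Several choices here are assumptions rather than part of the answer: the elements are Int, the input is a Vector, selectPivot is replaced by a uniformly random element, and elements equal to the pivot are kept in a separate third group eq. The names TopN and rng are illustrative only.

```scala
import scala.util.Random

object TopN {
  /** Returns the n largest elements of xs, duplicates included, in no particular order. */
  def selectLargest(n: Int, xs: Vector[Int], rng: Random = new Random()): Vector[Int] =
    if (n <= 0) Vector.empty
    else if (xs.size <= n) xs
    else {
      // Assumption: a uniformly random element stands in for selectPivot.
      val pivot = xs(rng.nextInt(xs.size))

      // Three-way partition. eq always contains the pivot itself,
      // so gt and lt are both strictly smaller than xs.
      val gt = xs.filter(_ > pivot)
      val eq = xs.filter(_ == pivot)
      val lt = xs.filter(_ < pivot)

      if (gt.size >= n)
        // Enough elements above the pivot: everything else can be discarded.
        selectLargest(n, gt, rng)
      else if (gt.size + eq.size >= n)
        // gt plus some copies of the pivot fill the quota exactly.
        gt ++ eq.take(n - gt.size)
      else
        // All of gt and eq are kept; the remainder comes from lt.
        gt ++ eq ++ selectLargest(n - gt.size - eq.size, lt, rng)
    }
}
```

    For example, TopN.selectLargest(3, Vector(5, 1, 9, 9, 3, 7)) returns 9, 9 and 7 in some order. The three filter passes keep the sketch short; a single pass over xs would do the same partitioning with less work.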

    Feel free to edit this answer, or post your own answer, with a Scala implementation of this algorithm.
