Top n items in a List ( including duplicates )

谎友^ 2021-01-20 22:18

Trying to find an efficient way to obtain the top N items in a very large list, possibly containing duplicates.

I first tried sorting & slicing, which works. But

5 answers
  •  余生分开走
    2021-01-20 22:24

    Don't overestimate how big log(M) is for a large list of length M. For a list containing a billion items, log2(M) is only about 30. So sorting and taking is not such an unreasonable method after all. In fact, sorting an array of integers is far faster than sorting a list (and the array also takes less memory), so I would say that your best (brief) bet, which is safe for short or empty lists thanks to takeRight, is:

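    val arr = s.toArray             // s is the input list; copy it into an array
    java.util.Arrays.sort(arr)      // sort in place, ascending
    arr.takeRight(N).toList         // the N largest, in ascending order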
    

    There are various other approaches one could take, but the implementations are less straightforward. You could use a partial quicksort, but it has the same worst-case problems as quicksort (e.g. if your list is already sorted, a naive algorithm might be O(M^2)!). You could save the top N in a ring buffer (array), but that would require an O(log N) binary search at every step as well as O(N/4) sliding of elements, so it is only good if N is quite small. More complex methods (like something based upon dual pivot quicksort) are, well, more complex.
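
    To make the "save the top N in a buffer" idea concrete, here is a minimal sketch. It is a simplification: it uses a plain sorted array rather than a true ring buffer, and the names TopNBuffer, topN and insertionPoint are made up for illustration. Each element costs one binary search plus a shift of part of the buffer, which is why this only pays off when N is small.

    ```scala
    object TopNBuffer {
      // Keeps the n largest values seen so far in an ascending sorted array
      // and returns them in ascending order, duplicates included.
      def topN(xs: Iterable[Int], n: Int): List[Int] =
        if (n <= 0) Nil
        else {
          val buf = new Array[Int](n) // buf(0 until size) is always sorted ascending
          var size = 0
          for (x <- xs) {
            if (size < n) {
              // Buffer not full yet: shift the tail right and insert x.
              val pos = insertionPoint(buf, size, x)
              System.arraycopy(buf, pos, buf, pos + 1, size - pos)
              buf(pos) = x
              size += 1
            } else if (x > buf(0)) {
              // Buffer full and x beats the current minimum: drop buf(0),
              // slide the smaller elements left, and place x.
              val pos = insertionPoint(buf, size, x) // here 1 <= pos <= size
              System.arraycopy(buf, 1, buf, 0, pos - 1)
              buf(pos - 1) = x
            }
          }
          buf.take(size).toList
        }

      // Index at which x can be inserted into buf(0 until size) keeping it sorted.
      private def insertionPoint(buf: Array[Int], size: Int, x: Int): Int = {
        val i = java.util.Arrays.binarySearch(buf, 0, size, x)
        if (i >= 0) i else -(i + 1)
      }
    }
    ```

    Unlike the sort-based version, this never copies the whole list, so it may be worth a look when the input is a stream or an iterator rather than a List.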

    So I recommend that you try array sorting and see if that's fast enough.
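
    If you want to try that quickly, the three lines can be wrapped in a small function. This is only a sketch that assumes the list holds Ints; the names TopNBySort and topN are hypothetical.

    ```scala
    object TopNBySort {
      // Returns the n largest values of xs in ascending order, duplicates included.
      def topN(xs: List[Int], n: Int): List[Int] = {
        val arr = xs.toArray       // copy into a primitive int array
        java.util.Arrays.sort(arr) // in-place ascending sort
        arr.takeRight(n).toList    // takeRight copes with n > arr.length and with n == 0
      }
    }
    ```

    For example, TopNBySort.topN(List(5, 1, 5, 3, 2), 3) gives List(3, 5, 5): both 5s survive, so duplicates are handled without any extra work.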

    (Answers differ if you're sorting objects instead of numbers, of course. But if your comparison can always be reduced to a number, you can s.map(x => /* convert element to corresponding number*/).toArray, take the winning scores, and then run through the list again, counting off how many elements of each score you still need to take as you find them. It's a bit of bookkeeping, but it doesn't slow things down much except for the map.)
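
    The bookkeeping for the object case might look roughly like this. It is a sketch under the assumption that every element can be scored as an Int; TopNByScore, topNBy and the score parameter are illustrative names, not part of any library.

    ```scala
    import scala.collection.mutable

    object TopNByScore {
      // Returns the n highest-scoring elements of xs, in their original list order.
      def topNBy[A](xs: List[A], n: Int)(score: A => Int): List[A] = {
        val scores = xs.map(score).toArray
        java.util.Arrays.sort(scores)
        val winners = scores.takeRight(n) // winning scores, duplicates included

        // How many elements we may still take for each winning score.
        val remaining = mutable.Map.empty[Int, Int].withDefaultValue(0)
        winners.foreach(s => remaining(s) += 1)

        // Second pass: keep an element only while its score has budget left.
        xs.filter { x =>
          val s = score(x)
          val left = remaining(s)
          if (left > 0) { remaining(s) = left - 1; true } else false
        }
      }
    }
    ```

    With string length as the score, TopNByScore.topNBy(List("aa", "b", "cccc", "dd", "eee"), 3)(_.length) returns List("aa", "cccc", "eee"): only one element of score 2 makes the cut, so the first one found is kept. Note that score is evaluated twice per element here; if it is expensive you would want to cache the results.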
