Find top N elements in an Array

南笙 2020-11-28 06:18

What would be the best way to find the top N (say 10) elements in an unordered list (of, say, 100)?

The solution which came in my head was to 1. sort it using quick s

12 answers
  • 2020-11-28 06:37

    You can build a heap from an unsorted array in O(n) time, and each removal of the top element from the heap takes O(log(n)) time. Removing the top element k times gives a total runtime of O(n + k*log(n)).
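
    As a rough illustration of the build-then-extract idea, here is a minimal sketch that heapifies a copy of the array bottom-up and then removes the maximum k times. The class name `HeapTopK` and its methods are hypothetical, chosen only for this example.

```java
import java.util.Arrays;

// Hypothetical helper class; names are illustrative only.
final class HeapTopK {

    // Returns the k largest values in descending order.
    static int[] topK(int[] input, int k) {
        int[] heap = Arrays.copyOf(input, input.length); // leave the caller's array intact
        int size = heap.length;
        k = Math.max(0, Math.min(k, size));

        // Bottom-up max-heap construction: O(n) in total.
        for (int i = size / 2 - 1; i >= 0; i--) {
            siftDown(heap, i, size);
        }

        int[] result = new int[k];
        for (int j = 0; j < k; j++) {
            result[j] = heap[0];      // current maximum
            heap[0] = heap[--size];   // move the last leaf to the root
            siftDown(heap, 0, size);  // restore the heap property in O(log n)
        }
        return result;
    }

    // Moves heap[i] down until neither child is larger than it.
    private static void siftDown(int[] heap, int i, int size) {
        while (true) {
            int left = 2 * i + 1;
            int right = left + 1;
            int largest = i;
            if (left < size && heap[left] > heap[largest]) largest = left;
            if (right < size && heap[right] > heap[largest]) largest = right;
            if (largest == i) return;
            int tmp = heap[i];
            heap[i] = heap[largest];
            heap[largest] = tmp;
            i = largest;
        }
    }

    public static void main(String[] args) {
        int[] data = {1, 3, 8, 2, 6, 4, 7, 5, 9, 0};
        System.out.println(Arrays.toString(topK(data, 3))); // prints [9, 8, 7]
    }
}
```

    The heap is written by hand here because `java.util.PriorityQueue` has no constructor that takes both a collection and a comparator, so a max-heap cannot be built from existing data in one O(n) step with it.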

  • 2020-11-28 06:38

    If you're dealing with simple elements like fixed-length integers, then provided you can spare a memory buffer of the same size as the input data, sorting can be done in O(n) time using bucket or radix sorts, and this will be the fastest.
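
    For small non-negative integers, the simplest bucket-style approach is a counting sort. The sketch below assumes every value lies in [0, maxValue); that bound, the class name `CountingTopK`, and the method signature are assumptions made for this example. It runs in O(n + maxValue) time and uses a buffer proportional to the value range rather than to the input size.

```java
// Hypothetical helper class; names are illustrative only.
final class CountingTopK {

    // Returns the k largest values in descending order.
    // Assumes 0 <= value < maxValue for every element of input.
    static int[] topK(int[] input, int k, int maxValue) {
        int[] counts = new int[maxValue];
        for (int value : input) {
            counts[value]++; // one bucket per distinct value
        }

        k = Math.max(0, Math.min(k, input.length));
        int[] result = new int[k];
        int filled = 0;
        // Walk the buckets from the largest value down until k values are collected.
        for (int value = maxValue - 1; value >= 0 && filled < k; value--) {
            for (int c = counts[value]; c > 0 && filled < k; c--) {
                result[filled++] = value;
            }
        }
        return result;
    }
}
```

    For example, `CountingTopK.topK(new int[]{3, 1, 3, 0, 2}, 3, 4)` returns `[3, 3, 2]`.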

    Although there are linear-time selection algorithms, the hidden constant is very high -- around 24. That means an O(n log n) algorithm will typically be faster for inputs smaller than several million elements.

    Otherwise, in the general case when you can only compare 2 elements and determine which is greater, the problem is best solved by a heap data structure.

    Suppose you want the top k of n items. All solutions based on fully sorting the data require O(n log n) time, while using a heap requires only O(n log k) time -- just build a min-heap on the first k elements, then for each remaining element add it and remove the minimum. This will leave you with a heap containing the largest k elements. (To get the smallest k elements instead, use a max-heap and remove the maximum.)
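
    A minimal sketch of the bounded-heap approach for the largest k values, using `java.util.PriorityQueue` (a min-heap under natural ordering). The class name `BoundedHeapTopK` is hypothetical. Instead of always adding and then removing, it skips any value that is not larger than the current minimum, which has the same effect.

```java
import java.util.PriorityQueue;

// Hypothetical helper class; names are illustrative only.
final class BoundedHeapTopK {

    // Returns the k largest values in descending order, in O(n log k) time.
    static int[] topK(int[] input, int k) {
        if (k <= 0) {
            return new int[0];
        }
        // Min-heap holding the k largest values seen so far.
        PriorityQueue<Integer> heap = new PriorityQueue<>(k);
        for (int value : input) {
            if (heap.size() < k) {
                heap.offer(value);
            } else if (value > heap.peek()) {
                heap.poll();       // drop the smallest of the kept values
                heap.offer(value); // keep the new, larger value
            }
        }

        // poll() yields values smallest first, so fill the result from the back.
        int[] result = new int[heap.size()];
        for (int i = result.length - 1; i >= 0; i--) {
            result[i] = heap.poll();
        }
        return result;
    }
}
```

    For example, `BoundedHeapTopK.topK(new int[]{1, 3, 8, 2, 6, 4, 7, 5, 9, 0}, 5)` returns `[9, 8, 7, 6, 5]`.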

  • 2020-11-28 06:39

    The time could be reduced to linear time:

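    1. Use a selection algorithm, which finds the k-th largest element of an unsorted array in linear time. You can use either a variant of quicksort (quickselect, linear on average) or a more robust algorithm with a worst-case linear bound.

    2. Partition the array around the element found in step 1; everything on the larger side, together with that element, is the top k.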
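
    The two steps above can be sketched with quickselect, the quicksort variant. This is an illustration rather than a tuned implementation: the class name `QuickSelectTopK` is hypothetical, the pivot is chosen at random, and the simple Lomuto partition used here is linear only on average and degrades on inputs with many equal values.

```java
import java.util.Arrays;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper class; names are illustrative only.
final class QuickSelectTopK {

    // Returns the k largest values in no particular order.
    static int[] topK(int[] input, int k) {
        int[] a = Arrays.copyOf(input, input.length); // leave the caller's array intact
        int n = a.length;
        k = Math.max(0, Math.min(k, n));
        if (k == 0) {
            return new int[0];
        }

        int target = n - k; // ascending-order index of the k-th largest value
        int lo = 0;
        int hi = n - 1;
        while (lo < hi) {
            int p = partition(a, lo, hi);
            if (p == target) break;
            if (p < target) lo = p + 1; else hi = p - 1;
        }
        // Everything from target onward is >= a[target]: these are the top k.
        return Arrays.copyOfRange(a, target, n);
    }

    // Lomuto partition around a random pivot; returns the pivot's final index.
    private static int partition(int[] a, int lo, int hi) {
        swap(a, ThreadLocalRandom.current().nextInt(lo, hi + 1), hi);
        int pivot = a[hi];
        int store = lo;
        for (int i = lo; i < hi; i++) {
            if (a[i] < pivot) swap(a, i, store++);
        }
        swap(a, store, hi);
        return store;
    }

    private static void swap(int[] a, int i, int j) {
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}
```

    The result is unordered; sort it afterwards (O(k log k)) if the top k are needed in order.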

  • 2020-11-28 06:39

    The best algorithm largely depends on the size of K. If K is small, simply running bubble sort with the outer loop limited to K iterations gives the top K values. The complexity is O(n*k).

    However, for values of K close to n, the complexity approaches O(n^2). In that scenario quicksort might be a good alternative.
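
    A minimal sketch of the partial bubble sort described above, assuming an `int[]` input; the class name `BubbleTopK` is hypothetical. Each outer pass bubbles the largest remaining value to the end of the unsorted part, so after k passes the last k slots hold the top k values.

```java
// Hypothetical helper class; names are illustrative only.
final class BubbleTopK {

    // Returns the k largest values in descending order, in O(n*k) time.
    static int[] topK(int[] input, int k) {
        int[] a = input.clone(); // leave the caller's array intact
        int n = a.length;
        k = Math.max(0, Math.min(k, n));

        // Only k passes of the outer loop instead of the usual n - 1.
        for (int pass = 0; pass < k; pass++) {
            for (int i = 0; i < n - 1 - pass; i++) {
                if (a[i] > a[i + 1]) {
                    int tmp = a[i];
                    a[i] = a[i + 1];
                    a[i + 1] = tmp;
                }
            }
        }

        // The tail of the array now holds the k largest values in ascending order.
        int[] result = new int[k];
        for (int j = 0; j < k; j++) {
            result[j] = a[n - 1 - j];
        }
        return result;
    }
}
```

    For example, `BubbleTopK.topK(new int[]{1, 3, 8, 2, 6, 4, 7, 5, 9, 0}, 3)` returns `[9, 8, 7]`.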

  • 2020-11-28 06:45

    I was asked for the same algorithm in an interview. This is what I wrote; if somebody can compare it with the fastest algorithm in Java, that would be very useful.

        // Requires: import java.util.Arrays;
        public int[] findTopNValues(int[] anyOldOrderValues, int n) {
            // Cannot return more values than the input holds.
            n = Math.min(n, anyOldOrderValues.length);
            if (n <= 0) {
                return new int[]{};
            }
    
            // Seed the buffer with the first n values, sorted ascending,
            // so result[0] is always the smallest value kept so far.
            int[] result = Arrays.copyOf(anyOldOrderValues, n);
            Arrays.sort(result);
    
            // Start at n: the first n values are already in the buffer.
            for (int i = n; i < anyOldOrderValues.length; i++) {
                int value = anyOldOrderValues[i];
                if (result[0] < value) {
                    // Replace the current minimum and restore sorted order.
                    result[0] = value;
                    Arrays.sort(result);
                }
            }
            // Flip to descending order for the caller.
            return convertAndFlip(result, n);
        }
    
        public static int[] convertAndFlip(int[] integers, int n) {
            int[] result = new int[n];
            int j = 0;
            for (int i = n - 1; i > -1; i--) {
                result[j++] = integers[i];
            }
            return result;
        }
    

    And a test for it:

    public void testFindTopNValues() throws Exception {
        final int N = 100000000;
        final int MAX_VALUE = 100000000;
        final int returnArray = 1000;
        final int repeatTimes = 5;
    
        FindTopValuesArraySorting arraySorting = new FindTopValuesArraySorting();
    
        int[] randomArray = createRandomArray(N, MAX_VALUE);
        for (int i = 0; i < repeatTimes; i++) {
    
            long start = System.currentTimeMillis();
            int[] topNValues = arraySorting.findTopNValues(randomArray, returnArray);
            long stop = System.currentTimeMillis();
    
            System.out.println("findTopNValues() from " + N + " elements, where MAX value=" + (MAX_VALUE - 1) + " and return array size " + returnArray + " elements : " + (stop - start) + "msec");
            // System.out.println("Result list = " + Arrays.toString(topNValues));
        }
    }
    
    private static int[] createRandomArray(int n, int maxValue) {
        Random r = new Random();
        int[] arr = new int[n];
        for (int i = 0; i < n; i++) {
            arr[i] = r.nextInt(maxValue);
        }
        return arr;
    }
    

    Result is something like:

    findTopNValues() from 100000000 elements, where MAX value=99999999 and return array size 1000 elements : 395msec
    findTopNValues() from 100000000 elements, where MAX value=99999999 and return array size 1000 elements : 311msec
    findTopNValues() from 100000000 elements, where MAX value=99999999 and return array size 1000 elements : 473msec
    findTopNValues() from 100000000 elements, where MAX value=99999999 and return array size 1000 elements : 380msec
    findTopNValues() from 100000000 elements, where MAX value=99999999 and return array size 1000 elements : 406msec
    

    That is about 400 ms on average to get the 1000 largest integers from an array of 100,000,000 elements. Not bad!

    Just tried that set from above:

    findTopNValues() from 101 elements and return array size 10 elements : 1msec
    Result list = [998, 986, 986, 986, 947, 944, 926, 924, 921, 902]
    Original list = [403, 459, 646, 467, 120, 346, 430, 247, 68, 312, 701, 304, 707, 443, 753, 433, 986, 921, 513, 634, 861, 741, 482, 794, 679, 409, 145, 93, 512, 947, 19, 9, 385, 208, 795, 742, 851, 638, 924, 637, 638, 141, 382, 89, 998, 713, 210, 732, 784, 67, 273, 628, 187, 902, 42, 25, 747, 471, 686, 504, 255, 74, 638, 610, 227, 892, 156, 86, 48, 133, 63, 234, 639, 899, 815, 986, 750, 177, 413, 581, 899, 494, 292, 359, 60, 106, 944, 926, 257, 370, 310, 726, 393, 800, 986, 827, 856, 835, 66, 183, 901]
    
  • 2020-11-28 06:46

    You can use a List together with Guava's Comparators class to get the desired result. It is a highly optimized solution. See the sample below, which gets the top 5 numbers. The API can be found here.

    import java.util.Comparator;
    import java.util.List;
    import java.util.stream.Collector;
    
    import org.junit.Test;
    
    import com.google.common.collect.Comparators;
    import com.google.common.collect.Lists;
    
    public class TestComparator {
    
        @Test
        public void testTopN() {
            final List<Integer> numbers = Lists.newArrayList(1, 3, 8, 2, 6, 4, 7, 5, 9, 0);
            final Collector<Integer, ?, List<Integer>> collector = Comparators.greatest(5,
                    Comparator.<Integer>naturalOrder());
            final List<Integer> top = numbers.stream().collect(collector);
            System.out.println(top);
        }
    
    }
    

    Output: [9, 8, 7, 6, 5]
