Finding the median of an unsorted array

后悔当初 2020-11-28 05:01

To find the median of an unsorted array, we can build a min-heap of the n elements in O(n log n) time and then extract n/2 elements one by one to reach the median. But this

8 Answers
  • 2020-11-28 05:34

    I have already upvoted @dasblinkenlight's answer, since the Median of Medians algorithm does solve this problem in O(n) time. I only want to add that heaps can solve it too, though not quite in O(n) for an unsorted array. Building a heap can be done in O(n) time with the bottom-up construction; see the article Heap sort for a detailed explanation.

    Supposing that your array has N elements, you build two heaps: a MaxHeap that holds the N/2 smallest elements (or (N/2)+1 of them if N is odd) and a MinHeap that holds the remaining, larger elements. If N is odd, the median is the maximum element of the MaxHeap (O(1) by reading the max). If N is even, the median is (MaxHeap.max()+MinHeap.min())/2, which is also O(1). Heapifying the first half and the second half of the array is not enough on its own: the O(n) bottom-up build only orders each heap internally, so while MaxHeap.max() > MinHeap.min() the two tops have to be exchanged, at O(log n) per exchange and at most N/2 exchanges. The real cost of the whole operation is therefore the O(n) heap building plus these exchanges, which is O(n log n) in the worst case.
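    The two-heap idea on a fixed array can be sketched roughly as follows. This is only an illustration under some assumptions: the function name `median_two_heaps` is made up for this example, Python's `heapq` (a min-heap) stands in for both heaps by negating the values of the max-heap, and the two heaps are brought into order by exchanging their tops until the largest element of the lower heap no longer exceeds the smallest element of the upper heap.

```python
import heapq


def median_two_heaps(values):
    """Median of a non-empty list using a max-heap (lower half) and a min-heap (upper half).

    Hypothetical helper for illustration; not taken from the linked articles.
    """
    n = len(values)
    if n == 0:
        raise ValueError("median of an empty list is undefined")

    half = (n + 1) // 2                   # the lower heap gets the extra element when n is odd
    lower = [-x for x in values[:half]]   # heapq is a min-heap, so negate to simulate a max-heap
    upper = list(values[half:])
    heapq.heapify(lower)                  # bottom-up build, O(n)
    heapq.heapify(upper)                  # bottom-up build, O(n)

    # Exchange the two tops until every element of `lower` is <= every element of `upper`.
    # Each exchange costs O(log n).
    while upper and -lower[0] > upper[0]:
        low_max = -lower[0]
        high_min = upper[0]
        heapq.heapreplace(lower, -high_min)   # pop low_max, push high_min
        heapq.heapreplace(upper, low_max)     # pop high_min, push low_max

    if n % 2 == 1:
        return -lower[0]
    return (-lower[0] + upper[0]) / 2
```

    For example, `median_two_heaps([5, 1, 4, 2, 3])` returns 3 and `median_two_heaps([4, 1, 3, 2])` returns 2.5.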

    BTW, this MaxHeap/MinHeap algorithm also works when you don't know the number of elements beforehand (for example, if you have to solve the same problem for a stream of integers). In that case each new element is inserted in O(log n) and the current median is read in O(1). You can find more details in the article Median Of integer streams.
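    For the streaming case, a minimal sketch could look like this. The class name `RunningMedian` and its methods are assumptions made for this example (they are not from the linked article), and `heapq` with negated values is used as the max-heap.

```python
import heapq


class RunningMedian:
    """Tracks the median of a stream with two heaps (illustrative sketch)."""

    def __init__(self):
        self._lower = []   # max-heap of the smaller half, stored as negated values
        self._upper = []   # min-heap of the larger half

    def add(self, x):
        # Route the new value to the half it belongs to.
        if not self._lower or x <= -self._lower[0]:
            heapq.heappush(self._lower, -x)
        else:
            heapq.heappush(self._upper, x)
        # Rebalance so that len(lower) is len(upper) or len(upper) + 1.
        if len(self._lower) > len(self._upper) + 1:
            heapq.heappush(self._upper, -heapq.heappop(self._lower))
        elif len(self._upper) > len(self._lower):
            heapq.heappush(self._lower, -heapq.heappop(self._upper))

    def median(self):
        if not self._lower:
            raise ValueError("no values seen yet")
        if len(self._lower) > len(self._upper):
            return -self._lower[0]
        return (-self._lower[0] + self._upper[0]) / 2
```

    Feeding the values 5, 15, 1, 3 one at a time and calling `median()` after each gives 5, 10.0, 5 and 4.0.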

  • 2020-11-28 05:35

    You can use the Median of Medians algorithm to find the median of an unsorted array in linear time.
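    A compact sketch of the idea, assuming groups of five and list copies instead of in-place partitioning (the names `mom_select` and `median_of_medians` are made up for this example): the pivot is the median of the group medians, which guarantees that a constant fraction of the elements is discarded in every round.

```python
def mom_select(values, k):
    """Return the k-th smallest element (0-based) using median-of-medians pivots."""
    if not 0 <= k < len(values):
        raise IndexError("k out of range")
    items = list(values)
    while True:
        if len(items) <= 5:
            return sorted(items)[k]
        # Median of each group of five (the last group may be shorter).
        medians = []
        for i in range(0, len(items), 5):
            group = sorted(items[i:i + 5])
            medians.append(group[len(group) // 2])
        # The pivot is the median of those medians, found recursively.
        pivot = mom_select(medians, len(medians) // 2)
        smaller = [x for x in items if x < pivot]
        equal_count = sum(1 for x in items if x == pivot)
        if k < len(smaller):
            items = smaller
        elif k < len(smaller) + equal_count:
            return pivot
        else:
            k -= len(smaller) + equal_count
            items = [x for x in items if x > pivot]


def median_of_medians(values):
    """Median of a non-empty list; averages the two middle values when n is even."""
    n = len(values)
    if n % 2 == 1:
        return mom_select(values, n // 2)
    return (mom_select(values, n // 2 - 1) + mom_select(values, n // 2)) / 2
```

    For example, `median_of_medians([7, 1, 5, 3, 9])` returns 5. The copies keep the sketch short; an in-place version avoids the extra memory.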

  • 2020-11-28 05:44

    Plain Quickselect cannot guarantee linear time on an arbitrary, unsorted dataset, because its worst case is O(n^2). The best one can do as a general rule (as far as I know) is to use Median of Medians to choose the pivot (to get a decent start), followed by Quickselect, which brings the worst case down to O(n). Ref: https://en.wikipedia.org/wiki/Median_of_medians

  • 2020-11-28 05:49

    The quickselect algorithm can find the k-th smallest element of an array in expected linear (O(n)) running time. Here is an implementation in Python:

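    import random
    
    def partition(L, v):
        # Split L into values smaller than, equal to, and bigger than v.
        smaller = []
        equal = []
        bigger = []
        for val in L:
            if val < v: smaller.append(val)
            elif val > v: bigger.append(val)
            else: equal.append(val)  # keep duplicates of the pivot
        return (smaller, equal, bigger)
    
    def top_k(L, k):
        # Return the k smallest elements of L (1 <= k <= len(L)), in no particular order.
        v = L[random.randrange(len(L))]
        (left, middle, right) = partition(L, v)
        # middle holds every copy of the pivot v
        if len(left) == k:   return left
        if len(left) > k:    return top_k(left, k)
        if len(left) + len(middle) >= k: return left + middle[:k - len(left)]
        return left + middle + top_k(right, k - len(left) - len(middle))
    
    def median(L):
        n = len(L)
        l = top_k(L, n // 2 + 1)  # integer division; n / 2 is a float in Python 3
        return max(l)  # for even n this is the upper of the two middle values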
    
  • 2020-11-28 05:49

    As Wikipedia says, Median-of-Medians is theoretically O(N), but it is rarely used in practice because the overhead of finding "good" pivots makes it too slow.
    http://en.wikipedia.org/wiki/Selection_algorithm

    Here is Java source for a Quickselect algorithm to find the k'th element in an array:

    /**
     * Returns position of k'th smallest element of sub-list (k is 0-based).
     * 
     * @param list list to search, whose sub-list may be shuffled before
     *            returning
     * @param lo first element of sub-list in list
     * @param hi just after last element of sub-list in list
     * @param k 0-based rank, within the sub-list, of the element to find
     * @return position of k'th smallest element of (possibly shuffled) sub-list.
     */
    static int select(double[] list, int lo, int hi, int k) {
        int n = hi - lo;
        if (n < 2)
            return lo;
    
        double pivot = list[lo + (int) ((k * 7919L) % n)]; // Pseudo-random pivot; long arithmetic avoids int overflow
    
        // Triage list to [<pivot][=pivot][>pivot]
        int nLess = 0, nSame = 0, nMore = 0;
        int lo3 = lo;
        int hi3 = hi;
        while (lo3 < hi3) {
            double e = list[lo3];
            int cmp = compare(e, pivot);
            if (cmp < 0) {
                nLess++;
                lo3++;
            } else if (cmp > 0) {
                swap(list, lo3, --hi3);
                if (nSame > 0)
                    swap(list, hi3, hi3 + nSame);
                nMore++;
            } else {
                nSame++;
                swap(list, lo3, --hi3);
            }
        }
        assert (nSame > 0);
        assert (nLess + nSame + nMore == n);
        assert (list[lo + nLess] == pivot);
        assert (list[hi - nMore - 1] == pivot);
        if (k >= n - nMore)
            return select(list, hi - nMore, hi, k - nLess - nSame);
        else if (k < nLess)
            return select(list, lo, lo + nLess, k);
        return lo + k;
    }
    

    I have not included the source of the compare and swap methods, so it's easy to change the code to work with Object[] instead of double[].

    In practice, you can expect the above code to be O(N).
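    For comparison, the same in-place, three-way-partition approach can be sketched in Python. This is an illustration rather than a translation of the Java above: the function name `quickselect` is an assumption, the pivot is drawn with `random.randrange`, and `k` is an absolute 0-based index into the list.

```python
import random


def quickselect(items, k):
    """Rearrange items in place and return the k-th smallest element (0-based)."""
    if not 0 <= k < len(items):
        raise IndexError("k out of range")
    lo, hi = 0, len(items)            # the active window is items[lo:hi]
    while True:
        pivot = items[random.randrange(lo, hi)]
        # Three-way partition of the window:
        # items[lo:lt] < pivot, items[lt:i] == pivot, items[gt:hi] > pivot
        lt, i, gt = lo, lo, hi
        while i < gt:
            if items[i] < pivot:
                items[lt], items[i] = items[i], items[lt]
                lt += 1
                i += 1
            elif items[i] > pivot:
                gt -= 1
                items[gt], items[i] = items[i], items[gt]
            else:
                i += 1
        if k < lt:
            hi = lt                   # answer lies among the smaller elements
        elif k >= gt:
            lo = gt                   # answer lies among the larger elements
        else:
            return items[k]           # k falls inside the run of pivot copies
```

    The median of an odd-length list `a` is then `quickselect(a, len(a) // 2)`. Grouping the copies of the pivot together keeps the running time from degrading on inputs with many duplicates.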

  • 2020-11-28 05:50

    It can be done with the Quickselect algorithm in expected O(n) time; refer to Kth order statistics (randomized algorithms).
