How to find k nearest neighbors to the median of n distinct numbers in O(n) time?

别跟我提以往 2021-02-02 16:26

I can use the median of medians selection algorithm to find the median in O(n). Also, I know that after the algorithm is done, all the elements to the left of the median are les

13 answers
  • 2021-02-02 16:52

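    You already know how to find the median in O(n).

    If the order does not matter, selecting the k smallest elements can also be done in O(n). Apply k-smallest selection to the elements on the right-hand side of the median and k-largest selection to the elements on the left-hand side. That gives k candidates on each side, and the k nearest neighbours overall are among those 2k candidates.

    From Wikipedia: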

     function findFirstK(list, left, right, k)
         if right > left
             select pivotIndex between left and right
             pivotNewIndex := partition(list, left, right, pivotIndex)
             if pivotNewIndex > k  // new condition
                 findFirstK(list, left, pivotNewIndex-1, k)
             if pivotNewIndex < k
                 findFirstK(list, pivotNewIndex+1, right, k)
    

    Don't forget the special case k == n, where you simply return the original list.
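
    For illustration, here is a minimal Python sketch of the same partial-selection idea. The names `partition` and `find_first_k` are hypothetical, and the random pivot is an assumption: it gives expected O(n) time, whereas a median-of-medians pivot would be needed for a worst-case O(n) bound.

```python
import random


def partition(items, left, right, pivot_index):
    """Lomuto partition: put the pivot in its final sorted position and return that index."""
    pivot = items[pivot_index]
    items[pivot_index], items[right] = items[right], items[pivot_index]
    store = left
    for i in range(left, right):
        if items[i] < pivot:
            items[store], items[i] = items[i], items[store]
            store += 1
    items[store], items[right] = items[right], items[store]
    return store


def find_first_k(items, k):
    """Rearrange items in place so that items[:k] holds the k smallest, in arbitrary order."""
    if k <= 0 or k >= len(items):
        return  # k == n: the whole list already qualifies
    left, right = 0, len(items) - 1
    while right > left:
        pivot_index = random.randint(left, right)  # assumption: random pivot
        new_index = partition(items, left, right, pivot_index)
        if new_index > k:
            right = new_index - 1
        elif new_index < k:
            left = new_index + 1
        else:
            return  # everything before index k is smaller than the pivot


data = [9, 1, 8, 2, 7, 3, 6, 4, 5]
find_first_k(data, 3)
print(sorted(data[:3]))  # [1, 2, 3]
```

    The loop replaces the recursion in the pseudocode but narrows the same `left`/`right` window around index k.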

  • 2021-02-02 16:58

    If you know the index of the median, which should be about ceil(array.length/2), then it is just a matter of listing n[x-k], n[x-k+1], ..., n[x], n[x+1], n[x+2], ..., n[x+k], where n is the array, x is the index of the median, and k is the number of neighbours you need (maybe k/2 on each side if you want k in total rather than k per side). Note that this assumes the array is already sorted.

  • 2021-02-02 17:00

    Actually, the answer is pretty simple. With the median at index m, all we need to do is select the k elements with the smallest absolute differences from the median, moving outward from m-1 down to 0 and from m+1 up to n-1. We pick the elements using the same idea we use when merging two sorted arrays, which relies on both sides being in sorted order.
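
    The merge-style walk can be sketched as follows. This is only an illustration under a strong assumption: the input is already sorted in ascending order, which a plain partition around the median does not guarantee (sorting first would cost O(n log n)). The function name `k_nearest_sorted` is hypothetical, and the median itself is left out of the result.

```python
def k_nearest_sorted(sorted_items, m, k):
    """Return the k elements closest to sorted_items[m], nearest first.

    Assumes sorted_items is in ascending order; the element at index m is excluded.
    """
    median = sorted_items[m]
    lo, hi = m - 1, m + 1
    result = []
    while len(result) < k and (lo >= 0 or hi < len(sorted_items)):
        # Take from the left when the right side is exhausted,
        # or when the left candidate is at least as close as the right one.
        take_left = hi >= len(sorted_items) or (
            lo >= 0 and median - sorted_items[lo] <= sorted_items[hi] - median
        )
        if take_left:
            result.append(sorted_items[lo])
            lo -= 1
        else:
            result.append(sorted_items[hi])
            hi += 1
    return result


print(k_nearest_sorted([1, 2, 4, 7, 8, 9, 20], 3, 3))  # [8, 9, 4]
```

    The walk itself is O(k); the cost of getting the data into sorted order is what decides whether the whole approach stays linear.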

  • 2021-02-02 17:04

    Four Steps:

    1. Use Median of medians to locate the median of the array - O(n)
    2. Determine the absolute difference between the median and each element in the array and store them in a new array - O(n)
    3. Use Quickselect or Introselect to pick the k smallest elements out of the new array - O(n) (expected for Quickselect, worst case for Introselect)
    4. Retrieve the k nearest neighbors by indexing the original array - O(k)

    Every step is linear, so the overall time complexity is O(n) for any k up to n.
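
    The four steps above can be sketched in Python as follows. This is a rough illustration, not a tuned implementation: `select_smallest` and `k_nearest_to_median` are hypothetical names, the selection uses random pivots (expected O(n)) as a stand-in for median of medians or Introselect, the lower median is used when n is even, and the median itself is assumed not to count as its own neighbour.

```python
import random


def select_smallest(items, k):
    """Return the k smallest items in arbitrary order (quickselect, expected O(n))."""
    items = list(items)
    if k <= 0:
        return []
    if k >= len(items):
        return items
    result = []
    while True:
        pivot = random.choice(items)  # assumption: random pivot
        lower = [x for x in items if x < pivot]
        equal = [x for x in items if x == pivot]
        higher = [x for x in items if x > pivot]
        if k < len(lower):
            items = lower
        elif k <= len(lower) + len(equal):
            return result + lower + equal[: k - len(lower)]
        else:
            result += lower + equal
            k -= len(lower) + len(equal)
            items = higher


def k_nearest_to_median(values, k):
    """Return the k values closest to the (lower) median, excluding the median itself."""
    n = len(values)
    # Step 1: the lower median is the largest of the (n - 1) // 2 + 1 smallest values.
    median = max(select_smallest(values, (n - 1) // 2 + 1))
    # Step 2: absolute differences, paired with the original index.
    distances = [(abs(v - median), i) for i, v in enumerate(values) if v != median]
    # Step 3: the k smallest (distance, index) pairs.
    nearest = select_smallest(distances, k)
    # Step 4: map the indices back to the original values.
    return [values[i] for _, i in nearest]


print(sorted(k_nearest_to_median([10, 1, 7, 3, 12, 8, 5], 3)))  # [5, 8, 10]
```

    Pairing each distance with its index is what makes step 4 possible, and it also breaks ties between two values that are equally far from the median.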

  • 2021-02-02 17:04

    1. Find the median in O(n).
    2. Create a new array in which each element is the absolute difference between the original value and the median.
    3. Find the kth smallest number in the new array in O(n).
    4. The desired values are the elements whose absolute difference from the median is less than or equal to that kth smallest number.

  • 2021-02-02 17:06

    The median-of-medians probably doesn't help much in finding the nearest neighbours, at least for large n. True, you have each column of 5 partitioned around its median, but this isn't enough ordering information to solve the problem.

    I'd just treat the median as an intermediate result, and treat the nearest neighbours as a priority queue problem...

    Once you have the median from the median-of-medians, keep a note of its value.

    Run the heapify algorithm on all your data - see Wikipedia - Binary Heap. In comparisons, base the result on the difference relative to that saved median value. The highest priority items are those with the lowest ABS(value - median). This takes O(n).

    The first item in the array is now the median (or a duplicate of it), and the array has heap structure. Use the heap extract algorithm to pull out as many nearest-neighbours as you need. This is O(k log n) for k nearest neighbours.

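    So long as k is a constant, you get O(n) median of medians, O(n) heapify and O(k log n) = O(log n) extracting, giving O(n) overall. More generally the total is O(n + k log n), which stays O(n) for k up to about n / log n.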
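
    As a rough sketch of the heap approach using Python's standard `heapq` module: the function name `k_nearest_by_heap` is hypothetical, and the median is assumed to have been found already (for example by median of medians) and to be one of the values.

```python
import heapq


def k_nearest_by_heap(values, median, k):
    """Return the k values closest to median, nearest first.

    Assumes median is one of the values; it is popped first and discarded.
    """
    # Priority is the absolute difference from the saved median value.
    heap = [(abs(v - median), v) for v in values]
    heapq.heapify(heap)   # O(n)
    heapq.heappop(heap)   # the top of the heap is the median itself (distance 0)
    # Each extraction is O(log n), so this line is O(k log n).
    return [heapq.heappop(heap)[1] for _ in range(min(k, len(heap)))]


print(k_nearest_by_heap([10, 1, 7, 3, 12, 8, 5], 7, 3))  # [8, 5, 10]
```

    Unlike a selection-based approach, the neighbours come out ordered by distance, which is what the extra log n factor per extraction pays for.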
