Finding the farthest point in one set from another set

旧巷少年郎 2021-02-06 12:05

My goal is a more efficient implementation of the algorithm posed in this question.

Consider two sets of points (in N-space. 3-space for the example case of RGB colorsp

6 Answers
  • 2021-02-06 12:14

    The most obvious approach seems to me to be to build a tree structure on one set to allow you to search it relatively quickly. A kd-tree or similar would probably be appropriate for that.

    Having done that, you walk over all the points in the other set and use the tree to find their nearest neighbour in the first set, keeping track of the maximum as you go.

    Building the tree is O(n log n) and a single search is O(log n) on average, so the whole thing should run in O(n log n).
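A minimal pure-Python sketch of that idea, assuming points are equal-length tuples of numbers and Python 3.8+ for `math.dist`. The names `build_kdtree`, `nearest_dist` and `farthest_from_set` are made up for illustration, and the build re-sorts at every level, so it is a little slower than the O(n log n) bound above.

```python
import math

def build_kdtree(points, depth=0):
    """Build a kd-tree as nested tuples (point, left, right)."""
    if not points:
        return None
    axis = depth % len(points[0])
    # Re-sorting at each level is simple but costs O(n log^2 n) overall.
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid],
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))

def nearest_dist(node, target, depth=0, best=math.inf):
    """Return the distance from target to its nearest point in the tree."""
    if node is None:
        return best
    point, left, right = node
    best = min(best, math.dist(point, target))
    axis = depth % len(target)
    diff = target[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest_dist(near, target, depth + 1, best)
    # Only descend the far side if the splitting plane is closer than best.
    if abs(diff) < best:
        best = nearest_dist(far, target, depth + 1, best)
    return best

def farthest_from_set(tree_points, query_points):
    """Return (point, distance) for the query point whose nearest
    neighbour in tree_points is farthest away."""
    tree = build_kdtree(list(tree_points))
    return max(((q, nearest_dist(tree, q)) for q in query_points),
               key=lambda pair: pair[1])
```

Calling `farthest_from_set(color_table, pixels)` with two lists of RGB tuples would build the tree on `color_table` and walk over `pixels`, keeping the maximum as described.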

  • 2021-02-06 12:22

    Maybe I'm misunderstanding the question, but wouldn't it be easiest to just reverse the sign on all the coordinates in one data set (i.e. multiply one set of coordinates by -1), then find the first nearest neighbour (which would be the farthest neighbour)? You can use your favourite knn algorithm with k=1.
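One way to sanity-check this before relying on it is a tiny 1-D experiment. The points below are made up, and `nearest` and `farthest` are hypothetical brute-force helpers. For this particular input the nearest neighbour of the negated point and the true farthest neighbour come out different, so the trick does not seem to hold in general.

```python
def nearest(points, q):
    """Point in `points` closest to q (1-D, brute force)."""
    return min(points, key=lambda p: abs(p - q))

def farthest(points, q):
    """Point in `points` farthest from q (1-D, brute force)."""
    return max(points, key=lambda p: abs(p - q))

points = [1, 10]  # made-up reference set
q = 2             # made-up query point

via_negation = nearest(points, -q)   # negate the query, then run 1-NN
true_farthest = farthest(points, q)  # direct brute-force answer
print(via_negation, true_farthest)   # prints "1 10"
```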

  • 2021-02-06 12:22

    EDIT: I meant nlog(n) where n is the sum of the sizes of both sets.

    In 1-space you could do something like this (pseudocode):

    Use a structure like this

    Struct Item {
        int value
        int setid
    }
    

    (1) Set MaxDistance = 0
    (2) Read all the sets into Item structures
    (3) Create an array of pointers to all the Items
    (4) Sort the array of pointers by the Item->value field
    (5) Walk the array from beginning to end. Whenever an Item->setid differs from the previous Item->setid, check whether the distance between the two values is greater than MaxDistance; if so, set MaxDistance to this distance

    Return MaxDistance.
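A runnable variant of the 1-D idea, as a sketch only. It is not a line-by-line translation of the steps above: instead of comparing adjacent items only, it sorts one set and, for each value of the other set, looks at the closest sorted value on each side. That way a value whose sorted neighbours both come from its own set is still measured against the other set. The name `farthest_1d` is made up, and `reference` is assumed to be non-empty.

```python
from bisect import bisect_left

def farthest_1d(reference, queries):
    """For 1-D values: return (value, distance) for the query whose
    nearest reference value is farthest away."""
    ref = sorted(reference)
    best_value, best_dist = None, -1
    for q in queries:
        i = bisect_left(ref, q)
        # Candidates: the reference values just below and just above q.
        candidates = ref[max(i - 1, 0):i + 1]
        d = min(abs(q - r) for r in candidates)
        if d > best_dist:
            best_value, best_dist = q, d
    return best_value, best_dist
```

Sorting costs O(n log n) and each lookup is a binary search, so the overall bound stays the same.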

  • 2021-02-06 12:26

    First you need to find every element's nearest neighbor in the other set.

    To do this efficiently you need a nearest neighbor algorithm. Personally I would implement a kd-tree just because I've done it in the past in my algorithm class and it was fairly straightforward. Another viable alternative is an R-tree.

    Do this once for each element in the smaller set. (Take one element from the smaller set and run the algorithm against the larger set to find its nearest neighbor.)

    From this you should be able to get a list of nearest neighbors for each element.

    While finding the pairs of nearest neighbors, keep them in a sorted data structure which has a fast addition method and a fast getMax method, such as a heap, sorted by Euclidean distance.

    Then, once you're done simply ask the heap for the max.

    The run time for this breaks down as follows:

    N = size of smaller set
    M = size of the larger set

    • N * O(log M + 1) for all the kd-tree nearest neighbor checks.
    • N * O(1) for calculating the Euclidean distance before adding it to the heap.
    • N * O(log N) for adding the pairs into the heap.
    • O(1) to get the final answer :D

    So in the end the whole algorithm is O(N*log M), not counting the cost of building the kd-tree on the larger set.

    If you don't care about the order of each pair you can save a bit of time and space by only keeping the max found so far.

    *Disclaimer: This all assumes you won't be using an enormously high number of dimensions and that your elements follow a mostly random distribution.
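The heap bookkeeping can be sketched like this with Python's `heapq`. The nearest-neighbour lookup here is a brute-force stand-in for the kd-tree query, purely to keep the example short, and `nearest_pairs_heap` and `farthest_pair` are made-up names.

```python
import heapq
import math

def nearest_pairs_heap(small, large):
    """Build a max-heap of (distance, point, neighbour) entries."""
    heap = []
    for p in small:
        # Stand-in for a kd-tree query: brute-force nearest neighbour.
        nn = min(large, key=lambda q: math.dist(p, q))
        # heapq is a min-heap, so store the negated distance.
        heapq.heappush(heap, (-math.dist(p, nn), p, nn))
    return heap

def farthest_pair(heap):
    """Peek at the entry with the largest nearest-neighbour distance."""
    neg_d, p, nn = heap[0]
    return p, nn, -neg_d
```

If only the single farthest pair matters, the heap can be replaced by one running maximum, which drops the N * O(log N) term.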

  • 2021-02-06 12:34

    For each point in set B, find the distance to its nearest neighbor in set A.

    To find the distance to each nearest neighbor, you can use a kd-tree as long as the number of dimensions is reasonable, there aren't too many points, and you will be doing many queries - otherwise it will be too expensive to build the tree to be worthwhile.
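For the case where the tree is not worth building, a brute-force version is only a few lines. This is a sketch assuming points are equal-length tuples and Python 3.8+ for `math.dist`; `farthest_brute_force` is a made-up name.

```python
import math

def farthest_brute_force(a, b):
    """Return (point, distance): the point of b whose nearest
    neighbour in a is farthest away. Costs O(len(a) * len(b))."""
    def nearest_dist(p):
        return min(math.dist(p, q) for q in a)
    point = max(b, key=nearest_dist)
    return point, nearest_dist(point)
```

It also makes a handy reference to check a kd-tree implementation against on small inputs.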

  • 2021-02-06 12:41

    To make things more efficient, consider using a Pigeonhole algorithm - group the points in your reference set (your colorTable) by their location in n-space. This allows you to efficiently find the nearest neighbour without having to iterate all the points.

    For example, if you were working in 2-space, divide your plane into a 5 x 5 grid, giving 25 squares, with 25 groups of points.

    In 3 space, divide your cube into a 5 x 5 x 5 grid, giving 125 cubes, each with a set of points.

    Then, to test point n, find the square/cube/group that contains n and test distance to those points. You only need to test points from neighbouring groups if point n is closer to the edge than to the nearest neighbour in the group.
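A rough sketch of the grid idea, assuming a fixed cell side length `cell` rather than a fixed 5 x 5 split, and a non-empty reference set. `build_grid` and `nearest_dist` are made-up names. The search widens ring by ring around the query's cell and stops once no unvisited cell can hold a closer point.

```python
import math
from collections import defaultdict
from itertools import product

def build_grid(points, cell):
    """Bucket points by the integer coordinates of their grid cell."""
    grid = defaultdict(list)
    for p in points:
        grid[tuple(int(c // cell) for c in p)].append(p)
    return grid

def nearest_dist(grid, cell, q):
    """Distance from q to its nearest bucketed point."""
    home = tuple(int(c // cell) for c in q)
    # Farthest occupied cell from home, measured in whole cells.
    max_ring = max(max(abs(k - h) for k, h in zip(key, home))
                   for key in grid)
    best = math.inf
    ring = 0
    while ring <= max_ring:
        for offset in product(range(-ring, ring + 1), repeat=len(q)):
            if max(abs(o) for o in offset) != ring:
                continue  # visit only the shell exactly `ring` cells out
            key = tuple(h + o for h, o in zip(home, offset))
            for p in grid.get(key, ()):
                best = min(best, math.dist(p, q))
        # Points in unvisited cells are at least ring * cell away.
        if best <= ring * cell:
            break
        ring += 1
    return best
```

Enumerating the shell with `product` is wasteful in higher dimensions, so this is only meant to show the bucketing and the stopping rule.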
