I\'m looking for an algorithm for finding largest subset of points (by largest i mean in number) that form a convex polygon from the given set of point. I think this might be so
This is my a Dynamic Programming O(N^4) algorithm with idea from Andrew's Algorithm posted by Nuclearman, i'm still looking for a O(N^3) algorithm
upper_hull(most left point, previous point, current point)
{
ret = 0
if (current point != most left point) /* at anytime we can decide to end the upper hull and build lower_hull from current point ending at most left point */
ret = min(ret,lower_hull(most left point, -1, current point))
for all point on the right of current point /* here we try all possible next point that still satisfy the condition of convex polygon */
if (cross(previous point,current point,next point) >= 0) max(ret,1+upper_hull(most left point, current point, next point))
return ret;
}
lower_hull(most left point, previous point, current point)
{
if (current point == most left point) return 0;
ret = -INF /* it might be impossible to build a convex hull here, so if it does it will return -infinity */
for all point on the left of current point and not on the left of most left point
if (cross(previous point,current point,next point) >= 0) max(ret,1+lower_hull(most left point, current point, next point))
return ret;
}
First sort the point based on x axis then for tie sort by y axis, then try all point as most left point to run the upper_hull(p,-1,p) , please tell me if there's any flaw in this algorithm
You can use a delaunay triangulation and remove the longest edge and also the vertex. I use a similar algorithm to find the concave hull. You can find jan example based on population data @ http://www.phpdevpad.de/geofence. I have also wrote a php class concave-hull @ phpclasses.org.
This problem has been already published in these competitions:
And its solution (O(N3) algorithm) could be found on this page: "USACO DEC08 Problem 'fence' Analysis" by Bruce Merry and Jacob Steinhardt.
The following is an attempt to explain this algorithm. Also here is my implementation of this algorithm in C++11. This code works for N<=250 and integer coordinates in range 0 .. 1023. No three points should be on the same line. Here is Python script that generates test data satisfying these requirements.
Simplified problem: find largest subset of points that form a convex polygon and contain the leftmost point of given set (or if there are several leftmost points, this polygon should contain topmost of them).
To solve this problem we could connect each pair of points by directed line segment, then (implicitly) treat each segment as a graph node as shown in diagram:
Here parent node and corresponding segment have blue color. Line segment corresponding to its left child (red color) starts at the end of "parent" segment and it is the line segment making least possible left turn relative to the direction of "parent" segment. Line segment corresponding to its right child (green color) starts at the start of "parent" segment and it is also the line segment making least possible left turn relative to the direction of "parent" segment.
Any segment ending at leftmost point corresponds to a "leaf" node, i.e. it has no child nodes. Such construction of the graph guarantees that there are no cycles, in other words this graph is a DAG.
Now to find largest convex subset of points it is enough to find the longest path in this graph. And the longest path in DAG could be found with dynamic programming algorithm in time O(E), where E is number of edges in the graph. Since each node has only 2 outgoing edges, each corresponding to a pair of points, this problem could be solved in O(N2) time.
All this is possible to implement if we preprocess input data, sorting directed line segments starting at the same point by angle. This allows to perform depth-first traversal in the graph. We should memorize the longest path starting from each processed node, so that each graph edge is visited at most once, and we have O(E) = O(N2) time algorithm (ignoring preprocessing time for now). Space requirements are also O(N2) because we need to store sorted directions for each pair of points and memorization uses one value per node (which is also a pair of points).
Here is C++ implementation of this dynamic programming algorithm:
unsigned dpStep(ind_t i_left, ind_t from, ind_t to_ind)
{
ind_t child = sorted[from][to_ind];
while (child < i_left)
child = sorted[from][++to_ind];
if (child == i_left)
return 1;
if (auto v = memorize[from][child])
return v;
const auto x = max(dpStep(i_left, child, lefts[from][child]) + 1,
dpStep(i_left, from, static_cast<ind_t>(to_ind + 1)));
memorize[from][child] = static_cast<ind_t>(x);
return x;
}
Input parameters are the leftmost point and a pair of points forming current line segment (where starting point of the segment is given directly but ending point is given as an index in sorted by angle array of points). The word "left" in this code fragment is slightly overused: it means both the leftmost point (i_left
) and preprocessed array containing left childs for the graph (lefts[][]
).
General problem (where the leftmost point is not fixed) could be solved this way:
Step 4 is optional. It does not improve O(N3) time complexity. But it slightly improves speed of DP algorithm by excluding unneeded points. In my tests this gives 33% speed boost.
There are several alternatives for preprocessing. We could compute every angle (with atan2
) and sort them, as it is done in sample code by Bruce Merry and Jacob Steinhardt. This is a simplest way but atan2
may be too expensive, especially for smaller point sets.
Or we could exclude atan2
and sort tangents instead of angles, as it is done in my implementation. This is a little bit more complicated but faster.
Both these alternatives need O(N2 log N) time (unless we use some distribution sorting). Third alternative is to use well known computational geometry methods (arrangements and duality) to get O(N2) preprocessing. But I don't think it is useful for our problem: it is probably too slow for smaller point sets because of large constant factor, as for larger point sets, it might give some speed improvement, but too insignificant because step 3 will greatly outweigh step 2. Also it is much more difficult to implement.
Some other results: I tried to implement iterative DP instead of recursion; this did not noticeably change the speed. Also I tried to perform two DP searches at once, interleaving steps of each search (something similar to fibers or HyperThreading implemented in software); this might help because DP consists mostly of memory accesses having high latency and preventing full utilization of CPU throughput; the results are not very impressive, only about 11% speed boost, most likely with real HyperThreading it would be much faster.
I think I came up with an algorithm for it by extending Andrew's algorithm for convex hulls.
Originally, I came up with a O(N^4) algorithm, but noticed was way over-complicating it and brought it down to O(N^2) algorithm. It seems like there maybe a bug in the code somewhere that causes issues in at least the case of a vertical line of points. It might be a special case (requiring changing a few lines of code), or a sign of a larger flaw in the algorithm. If it's the latter, then I'm inclined to say it's NP-hard, and offer the algorithm as a heuristic. I've spent all the time I care to coding and debugging it. In any case it seems to work fine in other cases. However, it's difficult to test for correctness when the correct algorithms seem to be O(2^N).
def maximal_convex_hull(points):
# Expects a list of 2D points in the format (x,y)
points = sorted(set(points)) # Remove duplicates and sort
if len(points) <= 1: # End early
return points
def cross(o, a, b): # Cross product
return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
# Use a queue to extend Andrew's algorithm in the following ways:
# 1. Start a new convex hull for each successive point
# 2. Keep track of the largest convex hull along the way
Q = []
size = 0
maximal_hull = None
for i,p in enumerate(points):
remaining = len(points) - i + 1
new_Q = []
for lower, upper in Q: # Go through the queue
# Build upper and lower hull similtanously, slightly less efficent
while len(lower) >= 2 and cross(lower[-2], lower[-1], p) < 0:
lower.pop()
lower.append(p)
while len(upper) >= 2 and cross(upper[-2], upper[-1], p) > 0:
upper.pop()
upper.append(p)
new_size = len(lower) + len(upper)
if new_size > size: # Check for a new highest maximal convex hull
size = new_size
maximal_hull = (lower, upper)
# There is an odd bug? that when one or both if statements are removed
# xqg237.tsp (TSPLIB) gives slightly different solutions and is
# missing a number of points it should have.
# Seems like if the opposite should be true if anything since they
# were intended to be easy optimizations not required code.
if remaining + new_size >= size: # Don't bother if not enough
new_Q.append((lower, upper)) # Add an updated convex hulls
if remaining > size: # Don't bother if not enough
new_Q.append(([p], [p])) # Add a new convex hull
Q = new_Q # Update to the new queue
# Extract and merge the lower and upper to a maximium convex hull
# Only one of the last points is ommited because it is repeated
# Asserts and related variables used for testing
# Could just have "return lower[:-1] + list(reversed(upper[:-1]))[:-1]"
lower, upper = maximal_hull
max_hull_set = set(lower) | set(lower) | set(upper)
lower = lower
upper = list(reversed(upper[:-1]))[:-1]
max_convex_hull = lower + upper
assert len(lower) + len(upper) == len(max_hull_set)
assert max_hull_set == set(max_convex_hull)
return max_convex_hull
Edit: This algorithm isn't correct, but it served as inspiration for the correct algorithm and thus I'm leaving it here.