I\'ve got a large set of data points (100,000+) stored in a 2-dimensional numpy array (1st column: x coordinates, 2nd column: y coordinates). I\'ve also got several 1-dimensiona
I'm afraid I'm not familiar with the libraries you are using, but I think I have a reasonable idea for the algorithm you could use and I'll just run through how I would implement that with vanilla python and then I'm sure you can improve it and implement it using these libraries. Also, I am not claiming that this is the best way to achieve this, but I wanted to get my response in reasonably quickly so here goes.
Now, the idea comes from the use of the cross-product of two vectors in algorithms for finding the convex set of a set of points e.g. Graham's Scan. Say we have two points p1 and p2, which define the point vectors p1 and p2, from the origin (0,0) to (x1, y1) and (x2, y2) respectively. The cross product of p1 x p2 gives a third vector p3 which is perpendicular to both p1 and p2 and has magnitude given by the area of the parallelogram bounded by the vectors.
A very useful result is that the determinant of the matrix
/ x1, x2 \
\ y1, y2 /
...which is x1*y2 - x2*y1 gives the magnitude of the vector p3 and the sign indicates whether p3 is "coming out of" the plane or "going into" it. The crucial point here is that if this magnitude is positive then p2 is "to the left" of p1 and if it's negative then p2 is "to the right" of p1.
Hopefully this ascii art example will help:
. p2(4, 5)
/
/
/
/_ _ _ _ _. p1(5, 0)
x1*y2 - x2*y1 = 5*4 - 0*5 = 20 and so p2 is "to the left" of p1
Finally on to why this is useful for us! If we have a list of vertices of the polygon and a set of the other points in the graph, then for each edge of the polygon we can get the vector of that edge. We can also get the vectors joining the starting vertex to all the other points in the graph and by testing whether these lie to the left or right of the edge we can eliminate some points for each edge. All those not removed by the end of the process are those points inside the polygon. Anyway, on to some code to make some more sense of this!
Get a list of vertices of your polygon in the order you would visit them if you were drawing them in a counter-clockwise direction, for example some pentagon might be:
poly = [(1, 1), (4, 2), (5, 5), (3, 8), (0, 4)]
Get a set containing all the other points in the graph, we will gradually remove invalid points from this set until those left at the end of the process are exactly those points that are inside the polygon.
points = set(['(3, 0), (10, -2), (3,3), ...])
The main bit of code itself is actually pretty compact for how long it's taken me to write about how it works. to_right
takes two tuples representing vectors and returns True
if v2
lies to the right of v1
. The loops then just go through all the edges of the polygon and removes points from the working set if they are to the right of any of the edges.
def to_right(v1, v2):
return (v1[0]*v2[1] - v1[1]*v2[0]) < 0
for i in range(len(poly)):
v1 = poly[i-1]
v2 = poly[i]
for p in points:
if(to_right(v2-v1, p-v1)):
points.remove(p)
edit: To clarify, the fact that they are removed if they are to the right as opposed to the left is linked to the order in which the vertices of the polygon are specified. If they were in a clockwise order you would want to eliminate left-lying points instead. I don't have a particularly great solution to this issue at the moment.
Anyway, hopefully I'm right about this stuff and it is of some help to someone even if not the OP. The asymptotic complexity of this algorithm is O(mn) where n is the number of points in the graph and m is the number of vertices of the polygon as in the worst case all the points lie inside the polygon and we have to check every point for every edge, with none ever being removed.