Find closest line to each point on big dataset, possibly using shapely and rtree

☆樱花仙子☆ 提交于 2019-12-03 16:59:34

Here you have a solution using rtree library. The idea is to build boxes that contain the segments in the diagonal, and use that boxes to build the rtree. This will be the most time-expensive operation. Later, you query the rtree with a box centered in the point. You get several hits that you need to check for the minimum, but the number of hits will be (hopefuly) orders of magnitud lower than checking against all the segments.

In the solutions dict you will get, for each point, the line id, the nearest segment, the nearest point (a point of the segment), and the distance to the point.

There are some comments in the code to help you. Take into account you can serialize the rtree for later use. In fact, I would recomend to build the rtree, save it, and then use it. Because the exceptions for the adjustments of the constants MIN_SIZE and INFTY will probably raise, and you would not want to lose all the computation you did building the rtree.

A too small MIN_SIZE will mean you could have errors in the solutions because if the box around the point does not intersect a segment, it could intersect a box of a segment that is not the nearest segment (it is easy to think a case).

A too big MIN_SIZE would mean to have too much false positives, that in the extreme case would make the code to try with all the segments, and you will be in the same position as before, or worst, because you are now building an rtree you don't really use.

If the data is real data from a city, I imagine you know that any address will be intersecting a segment with a distance smaller than a few blocks. That will make the search practically logaritmic.

One more comment. I'm assuming that there are not segments that are too large. Since we are using the segments as the diagonals of the boxes in the rtree, if you have some large segments in a line, this would mean a huge box will be assigned to that segment, and all addresses boxes would intersect it. To avoid this, you can always increase artificially the resolution of the LineStrins by adding more intermediate points.

import math
from rtree import index
from shapely.geometry import Polygon, LineString

INFTY = 1000000
MIN_SIZE = .8
# MIN_SIZE should be a vaule such that if you build a box centered in each 
# point with edges of size 2*MIN_SIZE, you know a priori that at least one 
# segment is intersected with the box. Otherwise, you could get an inexact 
# solution, there is an exception checking this, though.


def distance(a, b):
    return math.sqrt( (a[0]-b[0])**2 + (a[1]-b[1])**2 ) 

def get_distance(apoint, segment):
    a = apoint
    b, c = segment
    # t = <a-b, c-b>/|c-b|**2
    # because p(a) = t*(c-b)+b is the ortogonal projection of vector a 
    # over the rectline that includes the points b and c. 
    t = (a[0]-b[0])*(c[0]-b[0]) + (a[1]-b[1])*(c[1]-b[1])
    t = t / ( (c[0]-b[0])**2 + (c[1]-b[1])**2 )
    # Only if t 0 <= t <= 1 the projection is in the interior of 
    # segment b-c, and it is the point that minimize the distance 
    # (by pitagoras theorem).
    if 0 < t < 1:
        pcoords = (t*(c[0]-b[0])+b[0], t*(c[1]-b[1])+b[1])
        dmin = distance(a, pcoords)
        return pcoords, dmin
    elif t <= 0:
        return b, distance(a, b)
    elif 1 <= t:
        return c, distance(a, c)

def get_rtree(lines):
    def generate_items():
        sindx = 0
        for lid, l in lines:
            for i in xrange(len(l)-1):
                a, b = l[i]
                c, d = l[i+1]
                segment = ((a,b), (c,d))
                box = (min(a, c), min(b,d), max(a, c), max(b,d)) 
                #box = left, bottom, right, top
                yield (sindx, box, (lid, segment))
                sindx += 1
    return index.Index(generate_items())

def get_solution(idx, points):
    result = {}
    for p in points:
        pbox = (p[0]-MIN_SIZE, p[1]-MIN_SIZE, p[0]+MIN_SIZE, p[1]+MIN_SIZE)
        hits = idx.intersection(pbox, objects='raw')    
        d = INFTY
        s = None
        for h in hits: 
            nearest_p, new_d = get_distance(p, h[1])
            if d >= new_d:
                d = new_d
                s = (h[0], h[1], nearest_p, new_d)
        result[p] = s
        print s

        #some checking you could remove after you adjust the constants
        if s == None:
            raise Exception("It seems INFTY is not big enough.")

        pboxpol = ( (pbox[0], pbox[1]), (pbox[2], pbox[1]), 
                    (pbox[2], pbox[3]), (pbox[0], pbox[3]) )
        if not Polygon(pboxpol).intersects(LineString(s[1])):  
            msg = "It seems MIN_SIZE is not big enough. "
            msg += "You could get inexact solutions if remove this exception."
            raise Exception(msg)

    return result

I tested the functions with this example.

xcoords = [i*10.0/float(1000) for i in xrange(1000)]
l1 = [(x, math.sin(x)) for x in xcoords]
l2 = [(x, math.cos(x)) for x in xcoords]
points = [(i*10.0/float(50), 0.8) for i in xrange(50)]

lines = [('l1', l1), ('l2', l2)]

idx = get_rtree(lines)

solutions = get_solution(idx, points)

And got:

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!