Remove 2D points from rtree in Python

Submitted by 假装没事ソ on 2019-12-24 02:23:41

Question


I'm trying to store 2D points in an rtree (ver. 0.8.2) and then delete them using Python. I understand that rtree works with rectangles (or boxes in 3D), but I guess points are a subset of rectangles.

I'm getting strange behavior while deleting items from the rtree. The script below shows the behavior:

from rtree import index as rtindex

def pt2rect(pt):
    return pt[0], pt[1], pt[0], pt[1]

pts = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
rt = rtindex.Index()

# Add the points (all with the same id, 0, on purpose)
for pt in pts:
    rt.add(0, pt2rect(pt))
print([r.bbox for r in rt.nearest((0, 0), 10, True)])

# Remove the same points
for pt in pts:
    rt.delete(0, pt2rect(pt))
    print(pt2rect(pt), [r.bbox for r in rt.nearest((0, 0), 10, True)])

And the output is:

True
[[0.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 1.0], [1.0, 1.0, 1.0, 1.0]]  # Whole index
(0.0, 0.0, 0.0, 0.0) [[0.0, 1.0, 0.0, 1.0], [1.0, 1.0, 1.0, 1.0]]  # <-- Ok
(1.0, 1.0, 1.0, 1.0) [[1.0, 1.0, 1.0, 1.0]]  # <-- Wrong point deleted!
(0.0, 1.0, 0.0, 1.0) [[1.0, 1.0, 1.0, 1.0]]  # <-- Ok, as it's not found.

From the docs (http://toblerity.org/rtree/class.html):

delete(id, coordinates) Deletes items from the index with the given 'id' within the specified coordinates.

Parameters:

id – long integer A long integer that is the identifier for this index entry. IDs need not be unique to be inserted into the index, and it is up to the user to ensure they are unique if this is a requirement.

coordinates – sequence or array Dimension * 2 coordinate pairs, representing the min and max coordinates in each dimension of the item to be deleted from the index. Their ordering will depend on the index’s interleaved data member. These are not the coordinates of a space containing the item, but those of the item itself. Together with the id parameter, they determine which item will be deleted. This may be an object that satisfies the numpy array protocol.

But as can be seen, a point with the given id but NOT within the given coordinates is being deleted in line 4 of the output.

The documentation also clearly specifies that ids are not required to be unique in either insertion or deletion. (The repeated id of 0 in the example is on purpose, as in my application I need duplicate ids: multiple points for the same "thing".)

Also confirmed that points can be indexed by using xmin == xmax and ymin == ymax.

Am I using the library wrong, or is libspatialindex (the binary library behind Python rtree) behaving differently from what the rtree docs state?


Answer 1:


Don't assign duplicate ids to different objects.

It's deleting the first object with matching id it finds in the leaf (check the libspatialindex source code, Leaf::deleteData if you don't trust me). Coordinates are only used to find the correct leaf to delete from. All your ids are 0, so it always deletes the first element from the leaf. The later deletions fail, because the bounding box of your tree is now [0.0,1.0,1.0,1.0], and the points with y=0.0 cannot be in this leaf.
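The behavior described above can be imitated with a toy model in plain Python. This is only a sketch and not libspatialindex code: `ToyLeaf` is a hypothetical class holding a single leaf, the coordinates are used only to decide whether the leaf is visited, and the first entry with a matching id is removed. How the leaf reorders its entries after a removal is an assumption (the last entry fills the vacated slot), chosen because it reproduces the output shown in the question.

```python
class ToyLeaf:
    """Toy model of a single R-tree leaf; NOT the libspatialindex implementation."""

    def __init__(self):
        self.entries = []  # list of (id, (xmin, ymin, xmax, ymax))

    def insert(self, id_, bbox):
        self.entries.append((id_, bbox))

    def mbr(self):
        # Minimum bounding rectangle of everything stored in the leaf.
        boxes = [b for _, b in self.entries]
        return (min(b[0] for b in boxes), min(b[1] for b in boxes),
                max(b[2] for b in boxes), max(b[3] for b in boxes))

    def delete(self, id_, bbox):
        if not self.entries:
            return False
        # Step 1: the coordinates only decide whether this leaf is visited.
        xmin, ymin, xmax, ymax = self.mbr()
        inside = (xmin <= bbox[0] and bbox[2] <= xmax and
                  ymin <= bbox[1] and bbox[3] <= ymax)
        if not inside:
            return False
        # Step 2: remove the first entry with a matching id;
        # the entry's own bbox is never compared.
        for i, (entry_id, _) in enumerate(self.entries):
            if entry_id == id_:
                # Assumption: the last entry fills the vacated slot.
                self.entries[i] = self.entries[-1]
                self.entries.pop()
                return True
        return False


leaf = ToyLeaf()
for pt in [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]:
    leaf.insert(0, (pt[0], pt[1], pt[0], pt[1]))
```

Calling `leaf.delete(0, ...)` with the three point rectangles in insertion order removes (0, 0), then (0, 1) instead of the requested (1, 1), and then finds nothing, because the remaining bounding box no longer contains (0, 1).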

Try

for i, pt in enumerate(pts):
    rt.add(i, [pt[0], pt[1], pt[0], pt[1]])

and

for i, pt in enumerate(pts):
    rt.delete(i, [pt[0], pt[1], pt[0], pt[1]])
    print([r.bbox for r in rt.nearest([0, 0], 10, True)])

Note that the documentation of the rtree module is misleading.

Deletes items from the index with the given 'id' within the specified coordinates.

Parameters:

  • id – long integer A long integer that is the identifier for this index entry. IDs need not be unique to be inserted into the index, and it is up to the user to ensure they are unique if this is a requirement.
  • coordinates – sequence or array Dimension * 2 coordinate pairs, representing the min and max coordinates in each dimension of the item to be deleted from the index. Their ordering will depend on the index’s interleaved data member. These are not the coordinates of a space containing the item, but those of the item itself. Together with the id parameter, they determine which item will be deleted. This may be an object that satisfies the numpy array protocol.

(Emphasis added.)

This does not say that the id need not be unique for deletion. It says you can insert multiple entries with the same id, but it doesn't say deletion will be predictable. ;-) Also, "determine" is vague. Judging from the libspatialindex source code, the coordinates are used to find the correct leaf, and then the first matching id in that leaf is deleted. Thus, the id must be unique for deletion to work reliably.
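If several points have to belong to the same "thing", as in the question, one option is to keep that grouping outside the index and still hand the index a unique id per point. The sketch below is a minimal illustration of that bookkeeping; `GroupedPoints` and its methods are hypothetical names, and it only assumes that the wrapped index offers `insert(id, coordinates)` and `delete(id, coordinates)`, as `rtree.index.Index` does.

```python
import itertools
from collections import defaultdict


class GroupedPoints:
    """Hypothetical helper: unique id per point, group membership kept separately."""

    def __init__(self, index):
        # `index` is any object with insert(id, bbox) and delete(id, bbox),
        # for example an rtree.index.Index instance.
        self._index = index
        self._next_id = itertools.count()
        self._entries = {}                 # unique id -> (group, bbox)
        self._by_group = defaultdict(set)  # group -> set of unique ids

    def add(self, group, pt):
        uid = next(self._next_id)
        bbox = (pt[0], pt[1], pt[0], pt[1])  # a point as a degenerate rectangle
        self._index.insert(uid, bbox)
        self._entries[uid] = (group, bbox)
        self._by_group[group].add(uid)
        return uid

    def remove(self, uid):
        # Delete with the exact id and bbox that were inserted.
        group, bbox = self._entries.pop(uid)
        self._index.delete(uid, bbox)
        self._by_group[group].discard(uid)

    def remove_group(self, group):
        # Copy the id set first, because remove() mutates it.
        for uid in list(self._by_group.get(group, ())):
            self.remove(uid)

    def group_of(self, uid):
        return self._entries[uid][0]
```

With rtree installed, this would be used as `gp = GroupedPoints(rtindex.Index())`, followed by `gp.add("thing", (0.0, 0.0))`. Ids returned by index queries can then be mapped back to their group with `gp.group_of(...)`.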



Source: https://stackoverflow.com/questions/27660298/remove-2d-points-from-rtree-in-python
