Question
I'm trying to store 2D points in an rtree (ver. 0.8.2) and then delete them using Python. I understand that rtree works with rectangles (or boxes in 3D), but I guess points are a subset of rectangles.
I'm getting strange behavior while deleting items from the rtree. The script below shows the behavior:
from rtree import index as rtindex
def pt2rect(pt):
    return pt[0], pt[1], pt[0], pt[1]
pts = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
rt = rtindex.Index()
# Add the points
[rt.add(0, pt2rect(pt)) for pt in pts]
print [r.bbox for r in list(rt.nearest((0, 0), 10, True))]
# Remove the same points
for pt in pts:
    rt.delete(0, pt2rect(pt))
    print pt2rect(pt), [r.bbox for r in list(rt.nearest((0, 0), 10, True))]
And the output is:
True
[[0.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 1.0], [1.0, 1.0, 1.0, 1.0]] # Whole index
(0.0, 0.0, 0.0, 0.0) [[0.0, 1.0, 0.0, 1.0], [1.0, 1.0, 1.0, 1.0]] # <-- Ok
(1.0, 1.0, 1.0, 1.0) [[1.0, 1.0, 1.0, 1.0]] # <-- Wrong point deleted!
(0.0, 1.0, 0.0, 1.0) [[1.0, 1.0, 1.0, 1.0]] # <-- Ok, as it's not found.
From the docs (http://toblerity.org/rtree/class.html):
delete(id, coordinates) Deletes items from the index with the given 'id' within the specified coordinates.
Parameters:
id – long integer A long integer that is the identifier for this index entry. IDs need not be unique to be inserted into the index, and it is up to the user to ensure they are unique if this is a requirement.
coordinates – sequence or array Dimension * 2 coordinate pairs, representing the min and max coordinates in each dimension of the item to be deleted from the index. Their ordering will depend on the index’s interleaved data member. These are not the coordinates of a space containing the item, but those of the item itself. Together with the id parameter, they determine which item will be deleted. This may be an object that satisfies the numpy array protocol.
But as can be seen, a point with the given id but NOT within the given coordinates is being deleted in line 4 of the output.
The documentation also clearly specifies that ids are not required to be unique in either insertion or deletion. (The repeated id == 0 in the example is on purpose, as in my application I need the duplicate ids: multiple points for the same "thing".)
I also confirmed that points can be indexed by using xmin == xmax and ymin == ymax.
Am I using the library wrong, or is libspatialindex (the binary library behind Python rtree) behaving differently from what the rtree docs state?
Answer 1:
Don't assign duplicate ids to different objects.
It deletes the first object with a matching id that it finds in the leaf (check the libspatialindex source code, Leaf::deleteData, if you don't trust me). Coordinates are only used to find the correct leaf to delete from. All your ids are 0, so it always deletes the first element from the leaf. The later deletions fail because the bounding box of your tree is now [0.0,1.0,1.0,1.0], and the points with y=0.0 cannot be in this leaf.
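To make the mechanism concrete, here is a toy model of a single leaf that behaves as described above: the coordinates only decide whether the leaf is searched, and then the first entry with a matching id is removed. This is a sketch, not libspatialindex code. ToyLeaf and run_demo are made-up names, and filling the freed slot with the last entry is an assumption, chosen only because it reproduces the output shown in the question.

```python
# Toy model of one R-tree leaf. NOT the libspatialindex implementation.
class ToyLeaf(object):
    def __init__(self):
        self.entries = []  # list of (id, (xmin, ymin, xmax, ymax))

    def mbr(self):
        """Bounding box enclosing every entry, or None if the leaf is empty."""
        if not self.entries:
            return None
        boxes = [box for _, box in self.entries]
        return (min(b[0] for b in boxes), min(b[1] for b in boxes),
                max(b[2] for b in boxes), max(b[3] for b in boxes))

    def add(self, entry_id, box):
        self.entries.append((entry_id, box))

    def delete(self, entry_id, box):
        """Return True if an entry was removed."""
        m = self.mbr()
        # The coordinates only decide whether this leaf is searched at all.
        if m is None or not (m[0] <= box[0] and m[1] <= box[1]
                             and box[2] <= m[2] and box[3] <= m[3]):
            return False
        for i, (eid, _) in enumerate(self.entries):
            if eid == entry_id:  # first matching id wins; box is not compared
                # Assumption: the freed slot is filled with the last entry.
                self.entries[i] = self.entries[-1]
                self.entries.pop()
                return True
        return False


def pt2rect(pt):
    return (pt[0], pt[1], pt[0], pt[1])


def run_demo(ids):
    """Insert the three points with the given ids, then delete them in order.

    Returns the sorted list of remaining boxes after each delete call.
    """
    pts = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
    leaf = ToyLeaf()
    for entry_id, pt in zip(ids, pts):
        leaf.add(entry_id, pt2rect(pt))
    history = []
    for entry_id, pt in zip(ids, pts):
        leaf.delete(entry_id, pt2rect(pt))
        history.append(sorted(box for _, box in leaf.entries))
    return history
```

Calling run_demo([0, 0, 0]) leaves (1.0, 1.0, 1.0, 1.0) behind after the second and third deletes, as in the question, while run_demo([0, 1, 2]) removes exactly the requested point each time and ends with an empty leaf.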
Try
[rt.add(id, [x[0], x[1], x[0], x[1]]) for id, x in enumerate(pts)]
and
for id, x in enumerate(pts):
    rt.delete(id, [x[0], x[1], x[0], x[1]])
    print [x.bbox for x in list(rt.nearest([0, 0], 10, True))]
Note that the documentation of the rtree module is misleading.
Deletes items from the index with the given 'id' within the specified coordinates.
Parameters:
- id – long integer A long integer that is the identifier for this index entry. IDs need not be unique to be inserted into the index, and it is up to the user to ensure they are unique if this is a requirement.
- coordinates – sequence or array Dimension * 2 coordinate pairs, representing the min and max coordinates in each dimension of the item to be deleted from the index. Their ordering will depend on the index’s interleaved data member. These are not the coordinates of a space containing the item, but those of the item itself. Together with the id parameter, they determine which item will be deleted. This may be an object that satisfies the numpy array protocol.
(Emphasis added.)
This does not say that the id is not required to be unique for deletion. It says you can insert multiple entries with the same id, but it doesn't say deletion will be predictable. ;-) Also, "determine" is vague. Judging from the libspatialindex source code, the coordinates are used to find the correct leaf, and then the first matching id in this leaf is deleted. Thus, the id must be unique for deletion to work reliably.
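If several points really do belong to the same "thing", one way to keep the index ids unique is to hand out a fresh id per point and keep the point-to-thing mapping outside the index. The sketch below shows only that bookkeeping. PointStore and its attributes are hypothetical names, it assumes each (thing, point) pair is added at most once, and it only relies on the index object having insert(id, coordinates) and delete(id, coordinates) methods, as rtree's Index does.

```python
import itertools


class PointStore(object):
    """Hypothetical helper: one unique index id per point, many points per thing."""

    def __init__(self, index):
        self.index = index               # e.g. an rtree index.Index() instance
        self._next_id = itertools.count()
        self._entry = {}                 # (thing, point) -> unique entry id
        self.thing_of = {}               # unique entry id -> thing

    def add(self, thing, pt):
        """Insert pt (an (x, y) tuple) under a fresh id and remember its owner."""
        entry_id = next(self._next_id)
        self._entry[(thing, pt)] = entry_id
        self.thing_of[entry_id] = thing
        self.index.insert(entry_id, (pt[0], pt[1], pt[0], pt[1]))
        return entry_id

    def remove(self, thing, pt):
        """Delete exactly the entry that was inserted for (thing, pt)."""
        entry_id = self._entry.pop((thing, pt))  # KeyError if never added
        del self.thing_of[entry_id]
        self.index.delete(entry_id, (pt[0], pt[1], pt[0], pt[1]))
```

With rtree this would be constructed as PointStore(index.Index()), and the ids returned by queries such as nearest can be translated back to their owner through thing_of.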
Source: https://stackoverflow.com/questions/27660298/remove-2d-points-from-rtree-in-python