Question
I am trying to build an independent k-d tree for image features. I have extracted the image features; each feature vector contains, say, 1000 float values.
I plan to use MapReduce to distribute the images among the nodes of the cluster according to their classification (e.g. cat, dog, guns), i.e. each node will hold a group of similar images, and then build a k-d tree over the images on each node. I am confused about how the tree can be built.
So how can I build the k-d tree using MapReduce? Each node will contain its own tree, right? What could be the logic for distributing the images? While building the k-d tree, on what basis should I add image feature vectors to the tree (i.e. as the left or the right child)?
Any help is appreciated. Thanks in advance.
Answer 1:
I don't think that a k-d tree is the right structure for your data. Here's what Wikipedia says about it:
k-d trees are not suitable for efficiently finding the nearest neighbour in high dimensional spaces. As a general rule, if the dimensionality is k, the number of points in the data, N, should be N >> 2^k. Otherwise, when k-d trees are used with high-dimensional data, most of the points in the tree will be evaluated and the efficiency is no better than exhaustive search, and approximate nearest-neighbour methods should be used instead.
Your feature vectors have dimensionality 1000, so by that rule you would need far more than 2^1000 ≈ 10^301 images, which is quite unlikely.
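To make the size of that number concrete, here is a small sketch that evaluates the N >> 2^k rule of thumb for k = 1000. The variable names are only illustrative, and the rule itself is the heuristic quoted from Wikipedia, not a hard bound.

```python
import math

# Dimensionality of the feature vectors, taken from the question.
k = 1000

# Rule of thumb from the quote: the number of points N should be much larger than 2^k.
threshold = 2 ** k

# Python integers are arbitrary precision, so the decimal digits can be counted directly.
digits = len(str(threshold))

# The same magnitude via logarithms: log10(2^k) = k * log10(2).
log10_threshold = k * math.log10(2)

print(f"2^{k} has {digits} decimal digits")
print(f"log10(2^{k}) is about {log10_threshold:.2f}")
```

Any realistic image collection is smaller than this by hundreds of orders of magnitude, which is why the k-d tree would degrade to something close to exhaustive search here.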
I suggest that you look at locality-sensitive hashing (LSH), which is one of the approximate nearest-neighbor methods for high-dimensional data mentioned in the quote.
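As a rough illustration of the idea, here is a minimal sketch of one LSH family, random-hyperplane hashing for cosine similarity. It is not taken from the lecture slides or from any particular library. The function names (`make_hyperplanes`, `signature`, `build_index`, `query`) and the parameters are hypothetical, and a practical system would typically use several hash tables and rank the candidates by true distance afterwards.

```python
import random
from collections import defaultdict


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes (Gaussian normal vectors) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(vec, hyperplanes):
    """Hash a vector to an integer: one bit per hyperplane, set by the sign of the dot product."""
    bits = 0
    for plane in hyperplanes:
        dot = sum(p * v for p, v in zip(plane, vec))
        bits = (bits << 1) | (1 if dot >= 0 else 0)
    return bits


def build_index(vectors, hyperplanes):
    """Group vector indices into buckets keyed by their signature."""
    buckets = defaultdict(list)
    for idx, vec in enumerate(vectors):
        buckets[signature(vec, hyperplanes)].append(idx)
    return buckets


def query(vec, buckets, hyperplanes):
    """Return the indices of stored vectors that share the query's bucket."""
    return buckets.get(signature(vec, hyperplanes), [])


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy data: 100 random vectors with 1000 dimensions, standing in for image features.
    data = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(100)]
    planes = make_hyperplanes(dim=1000, n_bits=8)
    index = build_index(data, planes)
    candidates = query(data[0], index, planes)
    print(f"{len(candidates)} candidate(s) share a bucket with vector 0")
```

Vectors separated by a small angle agree on most hyperplane signs, so they tend to land in the same bucket, and a query only has to compare against that bucket instead of the whole collection. In a MapReduce setting, the signature could plausibly serve as the key emitted by the mapper, though that is an assumption about the design rather than something stated in the question.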
Since Wikipedia is not always the best place to learn something complicated, I suggest you take a look at the respective lecture slides of the Data Mining course of ETH Zurich instead. It just so happens that I am taking this course in the current semester.
Source: https://stackoverflow.com/questions/10984168/building-a-k-d-tree-using-mapreduce