Question
I'd like to use the NeighborSearch class in mlpack to perform KNN classification on some vectors representing documents.
I'd like to use cosine distance, but I'm having trouble. I think the way to do this is to use the inner-product metric IPMetric with the CosineDistance kernel. This is what I have:
NeighborSearch<NearestNeighborSort, IPMetric<CosineDistance>> nn(X_train);
But I get the following compile errors:
/usr/include/mlpack/core/tree/hrectbound_impl.hpp:211:15: error: ‘Power’ is not a member of ‘mlpack::metric::IPMetric<mlpack::kernel::CosineDistance>’
sum += pow((lower + fabs(lower)) + (higher + fabs(higher)),
^
/usr/include/mlpack/core/tree/hrectbound_impl.hpp:220:3: error: ‘TakeRoot’ is not a member of ‘mlpack::metric::IPMetric<mlpack::kernel::CosineDistance>’
if (MetricType::TakeRoot)
^
I suspect the problem is that the default tree type, KDTree, does not support this distance metric. If that's the issue, is there a tree type that works with CosineDistance?
Finally, is it possible to use a brute-force search? I can't find a way to avoid using a tree at all.
Thanks!
Answer 1:
Unfortunately, as you suspected, arbitrary metric types don't work with the KDTree. The kd-tree requires a distance that can be decomposed into per-dimension terms, and that is not possible with IPMetric. (The compile errors come from HRectBound, the kd-tree's bounding box, which expects the metric to provide the Power and TakeRoot members that mlpack's LMetric defines.) Instead, why not try the cover tree? Building the tree may take somewhat longer, but query performance should be comparable:
NeighborSearch<NearestNeighborSort, IPMetric<CosineDistance>, arma::mat,
tree::StandardCoverTree> nn(X_train);
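If you want to do a brute-force search, specify the search mode in the constructor. The tree type is still a template parameter, so keep one that compiles with IPMetric, such as StandardCoverTree, even though naive mode builds no tree: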
NeighborSearch<NearestNeighborSort, IPMetric<CosineDistance>, arma::mat,
tree::StandardCoverTree> nn(X_train, NAIVE_MODE);
I hope this is helpful; let me know if I can clarify anything.
Source: https://stackoverflow.com/questions/42097957/mlpack-nearest-neighbor-with-cosine-distance