Question
How (if at all possible) can one insert any term-vector in an ElasticSearch index?
ES computes term vectors behind the scenes in order to carry out its text-mining tasks, but it would be useful to be able to enter an arbitrary list of (term, weight) pairs instead.
Why?
Well, for instance, though ES enables kNN (k-nearest-neighbors) for k=2 in the context of geographic proximity, it doesn't have any explicit k>2 functionality. If we were able to insert our own term vectors, we could hack together k>2 functionality by harnessing ES's built-in text-indexing methods.
Any pointers on this issue?
Answer 1:
As far as I know, there's no way to do that in Elasticsearch. (I'm still looking for the fastest approach to real-time KNN search, and Elasticsearch is one of the options I'm considering.)
Elasticsearch is based on an inverted index, so each term in the term vector (which may come from a sentence) is indexed in a sorted list. When a query is run, it is analyzed into a term vector, and Elasticsearch (Lucene, actually) searches the index for each term.
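To make the inverted-index idea concrete, here is a toy sketch in plain Python. It is not how Lucene is implemented; `build_inverted_index` and `search` are hypothetical names, and whitespace splitting stands in for a real analyzer. It only illustrates that a lookup can reach documents solely through terms they share with the query.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Whitespace splitting is an assumption standing in for a real analyzer.
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Return ids of documents that share at least one term with the query."""
    hits = set()
    for term in query.lower().split():
        hits.update(index.get(term, []))
    return sorted(hits)


docs = {1: "quick brown fox", 2: "lazy brown dog", 3: "sleepy cat"}
index = build_inverted_index(docs)
print(search(index, "brown bear"))  # documents 1 and 2 contain "brown"
print(search(index, "bird"))        # no document contains "bird"
```

A query whose terms appear in no document returns nothing at all, however "close" some document might be under another notion of similarity.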
But KNN requires calculating the distance between two vectors even if they don't share any terms, and the traditional inverted index is not designed for that.
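A small sketch of why this matters, assuming sparse (term, weight) vectors stored as dicts and Euclidean distance as the metric (both are my assumptions, as are the names `euclidean` and `knn`). A brute-force scan can rank vectors that share no term with the query ahead of one that does, which a term lookup alone would never surface.

```python
import heapq
import math


def euclidean(u, v):
    """Euclidean distance between two sparse vectors given as {term: weight}."""
    terms = set(u) | set(v)
    return math.sqrt(sum((u.get(t, 0.0) - v.get(t, 0.0)) ** 2 for t in terms))


def knn(query, vectors, k):
    """Brute-force k nearest neighbours: compares the query to every vector."""
    return heapq.nsmallest(k, vectors, key=lambda name: euclidean(query, vectors[name]))


vectors = {
    "a": {"fox": 0.1},              # shares no term with the query
    "b": {"dog": 5.0, "cat": 5.0},  # shares "cat" with the query
    "c": {"bird": 0.2},             # shares no term with the query
}
query = {"cat": 0.1}

# "a" and "c" are nearest even though only "b" shares a term with the query.
print(knn(query, vectors, 2))
```

The scan touches every vector, so it costs O(n) per query; that is the part an inverted index does not help with.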
As you said, Elasticsearch can do real-time KNN search for k = 2 via geo queries, but I don't think it can support k > 2.
By the way, if you have found any approach that helps implement real-time KNN search where K may be very large (100000?) on a huge data set (in number of vectors), please tell me, thanks :)
Source: https://stackoverflow.com/questions/30119265/user-defined-termvectors-in-elasticsearch