User defined termvectors in ElasticSearch

Posted by 此生再无相见时 on 2019-12-22 17:58:36

Question


How (if at all possible) can one insert any term-vector in an ElasticSearch index?

ES computes term vectors behind the scenes in order to carry out its text-mining tasks, but it would be useful to be able to supply an arbitrary list of (term, weight) pairs instead.

Why?

Well, for instance, although ES enables kNN (k-nearest-neighbors) for k=2 in the context of geographic proximity, it doesn't have any explicit k>2 functionality. If we were able to insert our own term vectors, we could hack together k>2 functionality by harnessing ES's built-in text-indexing methods.

Any indications on this issue?


Answer 1:


As far as I know, there's no way to do that with Elasticsearch. (I'm still looking for the fastest approach to real-time KNN search, and Elasticsearch is one of my candidates.)

Elasticsearch is based on an inverted index, so each term in the term vector (which may come from a sentence) is indexed in a sorted list. When we run a query, the query is analyzed into a term vector and Elasticsearch (Lucene, actually) looks up each of those terms in the index.
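To make the lookup concrete, here is a minimal sketch of an inverted index in plain Python. It is a toy model, not Lucene's actual data structure: the whitespace analyzer, the function names, and the sample documents are all assumptions made for illustration.

```python
from collections import defaultdict


def analyze(text):
    """Toy analyzer (assumption): lowercase and split on whitespace."""
    return text.lower().split()


def build_inverted_index(docs):
    """Map each term to a sorted list of the ids of documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search(index, query):
    """Return ids of documents that share at least one term with the query."""
    hits = set()
    for term in analyze(query):
        hits.update(index.get(term, []))
    return sorted(hits)


# Hypothetical sample documents.
docs = {
    1: "fast nearest neighbor search",
    2: "inverted index search",
    3: "geo distance query",
}
index = build_inverted_index(docs)

print(search(index, "neighbor search"))  # [1, 2]
print(search(index, "vector"))           # []
```

Note that a query whose terms appear in no document returns nothing at all, because the lookup only ever touches postings for the query's own terms.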

But KNN requires calculating the distance between two vectors even if they don't share a single term, and the traditional inverted index is not designed for that requirement.
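The mismatch can be illustrated with a small sketch. Sparse vectors are modeled here as dictionaries of (term, weight) pairs, and Euclidean distance is used as the metric; both choices, along with the vector names and values, are assumptions for the sake of the example.

```python
import math


def euclidean_distance(a, b):
    """Distance between two sparse vectors; a missing term counts as weight 0."""
    terms = set(a) | set(b)
    return math.sqrt(sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in terms))


def shares_term(a, b):
    """True if the two vectors have at least one term in common."""
    return bool(set(a) & set(b))


def nearest(query, candidates, k=1):
    """Brute-force kNN: rank every candidate by distance to the query."""
    ranked = sorted(candidates, key=lambda name: euclidean_distance(query, candidates[name]))
    return ranked[:k]


# Hypothetical vectors.
query = {"x": 1.0, "y": 2.0}
vectors = {
    "near": {"z": 0.5},            # no term in common with the query
    "far": {"x": 10.0, "y": 2.0},  # shares both terms with the query
}

print(nearest(query, vectors, k=1))         # ['near']
print(shares_term(query, vectors["near"]))  # False
```

The closest vector shares no term with the query, so a term-by-term lookup in an inverted index would never consider it as a candidate, while it would retrieve the much more distant vector.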

As you said, Elasticsearch can do real-time KNN search when k = 2 by using geo queries, but I don't think it can support k > 2.

By the way, if you have found any approach to real-time KNN search where k may be very large (100000?) and the data set (number of vectors) is huge, please tell me, thanks :)



Source: https://stackoverflow.com/questions/30119265/user-defined-termvectors-in-elasticsearch
