I have large database of annotations of images stored in an elasticsearch database. I want to use this database for keyword extraction. Input is text (typically a newspaper article). My basic idea for an algorithm is to go through each term from the article and use elasticsearch to discover how frequent the term is in the image annotations. Then output terms from articles which are not frequent (in order to prefer names of people or places over common English words).
I don't need something very sophisticated, these keywords are used just as suggestion for user input, but I want something faster then asking N search queries (where N is number of terms in text) to elasticsearch which can be slow on large texts. Is there some robust and fast technique for keyword extraction in elasticsearch?
You can use elastic search term aggregations for this. They can return bucketed keywords with document counts which indicate their relative frequency. Here is an example query in YML.
query:
match:
annotation:
query: text of your article
aggregations:
term_frequencies:
terms:
field: annotation
来源:https://stackoverflow.com/questions/22171211/fast-keyword-extraction-in-elasticsearch