How to find the most used phrases in Elasticsearch?

Submitted by 我是研究僧i on 2019-12-06 00:56:04

Question


I know that you can find the most used terms in an index by using facets.

For example, given the following inputs:

"A B C" 
"AA BB CC"
"A AA B BB"
"AA B"

the terms facet returns:

B:3
AA:3
A:2
BB:2
CC:1
C:1

But I'm wondering whether it is possible to list the following:

AA B:2
A B:1
BB CC:1

...etc.

Is there such a feature in Elasticsearch?


Answer 1:


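As mentioned in ramseykhalaf's comment, a shingle token filter produces tokens made of n consecutive words ("shingles"). If the field is indexed with an analyzer like the one below, each phrase becomes a term of its own, so a terms facet on that field counts phrases. Because "output_unigrams" is "true", single words are indexed as well; set it to "false" to get phrases only.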

{
  "settings": {
    "analysis": {
      "filter": {
        "shingle": {
          "type": "shingle",
          "max_shingle_size": 5,
          "min_shingle_size": 2,
          "output_unigrams": "true"
        },
        "filter_stop": {
          "type": "stop",
          "enable_position_increments": "false"
        }
      },
      "analyzer": {
        "shingle_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["standard", "lowercase", "shingle", "filter_stop"]
        }
      }
    }
  },
  "mappings": {
    "type": {
      "properties": {
        "letters": {
          "type": "string",
          "analyzer": "shingle_analyzer"
        }
      }
    }
  }
}
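To see roughly what this analyzer puts into the index, here is a small Python emulation of the whitespace tokenizer, the lowercase filter and the shingle filter (2 to 5 words), followed by a count of how many documents contain each token, which approximates what a terms facet on the "letters" field reports. This is a sketch under stated assumptions, not Lucene's ShingleFilter: it skips the "standard" and "filter_stop" steps, token order differs, and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def shingles(text: str, min_size: int = 2, max_size: int = 5,
             output_unigrams: bool = True) -> List[str]:
    """Approximate whitespace tokenizer + lowercase + shingle filter.

    Hypothetical helper: the 'standard' and stop filters are not emulated.
    """
    tokens = text.lower().split()  # whitespace tokenizer, then lowercase
    out = []
    for i in range(len(tokens)):
        if output_unigrams:
            out.append(tokens[i])  # keep the single word itself
        # Emit every n-word sequence starting at position i
        for n in range(min_size, max_size + 1):
            if i + n <= len(tokens):
                out.append(" ".join(tokens[i:i + n]))
    return out


def phrase_doc_counts(docs: Iterable[str], **kwargs) -> Counter:
    """Count the documents containing each token (each doc counted once)."""
    counts = Counter()
    for doc in docs:
        counts.update(set(shingles(doc, **kwargs)))
    return counts


docs = ["A B C", "AA BB CC", "A AA B BB", "AA B"]
counts = phrase_doc_counts(docs, output_unigrams=False)
# Sort by count (descending), then alphabetically, for a stable listing
for phrase, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{phrase}:{n}")
```

On the question's four inputs this lists "aa b:2" first and every other phrase (including "a b" and "bb cc") with a count of 1; the phrases are lowercase because of the "lowercase" filter. Calling phrase_doc_counts(docs) with unigrams left on also reproduces the single-word counts from the question (b:3, aa:3, a:2, bb:2).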

See this blog post for full details.
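Once documents are indexed with that mapping, the phrase counts come from an ordinary terms facet on the "letters" field. The sketch below builds such a request with the pre-2.0 facets syntax and, optionally, sends it with the standard library. The base URL and index name are assumptions you must supply, and the helper names are hypothetical. Facets were removed in Elasticsearch 2.0; on newer versions a terms aggregation plays the same role.

```python
import json
import urllib.request


def top_phrases_request(field: str = "letters", size: int = 10) -> dict:
    """Build a search body with a terms facet (pre-2.0 facets API)."""
    return {
        "query": {"match_all": {}},
        "facets": {
            # "phrases" is an arbitrary facet name chosen for this sketch
            "phrases": {"terms": {"field": field, "size": size}}
        },
        "size": 0,  # only the facet counts are needed, not the hits
    }


def fetch_top_phrases(base_url: str, index: str, size: int = 10) -> list:
    """POST the body to <base_url>/<index>/_search; return the facet terms.

    Assumes a reachable pre-2.0 Elasticsearch node at base_url.
    """
    body = json.dumps(top_phrases_request(size=size)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/{index}/_search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)
    # Each entry looks like {"term": "...", "count": N}
    return result["facets"]["phrases"]["terms"]
```

For example, fetch_top_phrases("http://localhost:9200", "my_index") would return the most frequent tokens in "letters", with single words mixed in unless "output_unigrams" is set to "false".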




Answer 2:


I'm not sure whether Elasticsearch lets you do this natively the way you want. But you might be interested in checking out Carrot2 (http://project.carrot2.org/index.html) to accomplish what you want (and probably more).



Source: https://stackoverflow.com/questions/18252549/how-to-find-most-used-phrases-in-elasticsearch
