Using Shingles and Stop words with Elasticsearch and Lucene 4.4

Submitted by 坚强是说给别人听的谎言 on 2019-12-04 05:23:04

Probably not the optimal solution, but the bluntest one is to add another filter to your analyzer that kills the "_" filler tokens. In the example below I called it "kill_fillers":

    "shingleAnalyzer": {
      "tokenizer": "standard",
      "filter": [
        "standard",
        "lowercase",
        "custom_stop",
        "custom_shingle",
        "custom_stemmer",
        "kill_fillers"
      ],
      ...

Then define the "kill_fillers" filter in your list of filters:

"filter": {
...
  "kill_fillers": {
    "type": "pattern_replace",
    "pattern": ".*_.*",
    "replacement": ""
  },
...
}

I'm not sure if this helps, but in Elasticsearch's definition of the shingle token filter you can use the filler_token parameter, which defaults to "_". Set it to, for example, an empty string:

$indexParams['body']['settings']['analysis']['filter']['shingle-filter']['filler_token'] = "";

https://www.elastic.co/guide/en/elasticsearch/reference/1.7/analysis-shingle-tokenfilter.html
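For context, the one-line assignment above might sit inside a fuller settings array like the following. This is only a sketch: the filter name "shingle-filter" and the filler_token parameter come from the answer, while the index name and the shingle sizes are placeholder assumptions chosen for illustration.

```php
<?php
// Sketch of index-creation parameters for the Elasticsearch PHP client.
// Assumptions: 'my_index' and the shingle sizes are hypothetical values;
// only 'shingle-filter' and 'filler_token' are taken from the answer above.
$indexParams = [
    'index' => 'my_index',
    'body'  => [
        'settings' => [
            'analysis' => [
                'filter' => [
                    'shingle-filter' => [
                        'type'             => 'shingle',
                        'min_shingle_size' => 2,
                        'max_shingle_size' => 3,
                        // Replace the default "_" filler with an empty string
                        'filler_token'     => '',
                    ],
                ],
            ],
        ],
    ],
];
```

The filter still has to be listed in an analyzer's filter chain, after the stop filter, for the setting to take effect.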
