Using Shingles and Stop words with Elasticsearch and Lucene 4.4

坚强是说给别人听的谎言 submitted on 2019-12-04 05:23:04

This is probably not the optimal solution, but the bluntest fix is to add another filter to your analyzer that kills the "_" filler tokens. In the example below I called it "kill_fillers":

   "shingleAnalyzer": {
      "tokenizer": "standard",
      "filter": [
         ...
         "kill_fillers"
      ]
   }

Then define the "kill_fillers" filter in your list of filters:

  "kill_fillers": {
    "type": "pattern_replace",
    "pattern": ".*_.*",
    "replacement": ""
  }

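To see what that pattern does to a shingle stream, here is a rough Python sketch that applies the same regex with Python's `re` module instead of Elasticsearch. The token list is hand-written for illustration (roughly what two-word shingles of "quick and fox" might look like once "and" is removed as a stop word), and the helper name `kill_fillers` is only borrowed from the filter name above.

```python
import re

# Same regex as the "kill_fillers" filter: matches any token containing "_".
FILLER_PATTERN = re.compile(r".*_.*")


def kill_fillers(tokens):
    """Replace every token that contains an underscore with an empty string."""
    return [FILLER_PATTERN.sub("", token) for token in tokens]


# Hand-written, illustrative shingles; not produced by a real analyzer.
shingles = ["quick", "quick _", "_ fox", "fox"]
print(kill_fillers(shingles))
```

Note that the sketch blanks the matching tokens rather than removing them from the list, and that a token such as `snake_case` would be blanked as well, since the pattern matches any underscore.
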
I'm not sure if this helps, but in the Elasticsearch definition of the shingle filter you can use the parameter filler_token, which defaults to "_". Set it to, for example, an empty string:

$indexParams['body']['settings']['analysis']['filter']['shingle-filter']['filler_token'] = "";
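
For readers not using the PHP client, the same setting could look roughly like this as a Python dict that is serialized to a JSON request body. This is only a sketch: the filter name `shingle-filter` comes from the line above, while the function name `build_index_settings` and the omission of any other shingle options are my assumptions.

```python
import json


def build_index_settings(filler_token=""):
    """Build an index-creation body whose shingle filter uses the given filler token."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "shingle-filter": {
                        "type": "shingle",
                        # An empty string replaces the default "_" filler.
                        "filler_token": filler_token,
                    }
                }
            }
        }
    }


body = build_index_settings()
print(json.dumps(body, indent=2))
```
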
