Elasticsearch analyzer to remove quoted sentences

Submitted by 风格不统一 on 2020-04-11 12:36:26

Question


I'm trying to create an analyzer that removes a quoted sentence from a document (or replaces it with whitespace or an empty string).

For example, given: this is my \"test document\"

I'd like the term vector to be: [this, is, my]


Answer 1:


Daniel's answer is correct, but since it leaves out the regex and the replacement, I am providing them here, together with a test against your sample text.

The index settings below define a custom analyzer that uses a pattern_replace character filter:

{
    "settings": {
        "analysis": {
            "analyzer": {
                "my_analyzer": {
                    "tokenizer": "standard",
                    "char_filter": [
                        "my_char_filter"
                    ],
                    "filter": [
                        "lowercase"
                    ]
                }
            },
            "char_filter": {
                "my_char_filter": {
                    "type": "pattern_replace",
                    "pattern": "\"(.*?)\"",
                    "replacement": ""
                }
            }
        }
    }
}

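After creating the index, call the analyze API on it to see the generated tokens. The request has to target the index that defines my_analyzer (replace <index-name> with your index), because a custom analyzer is not visible to the global _analyze endpoint:

POST <index-name>/_analyze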

{
    "text": "this is my \"test document\"",
    "analyzer" : "my_analyzer"
}

Output of the above request:

{
    "tokens": [
        {
            "token": "this",
            "start_offset": 0,
            "end_offset": 4,
            "type": "<ALPHANUM>",
            "position": 0
        },
        {
            "token": "is",
            "start_offset": 5,
            "end_offset": 7,
            "type": "<ALPHANUM>",
            "position": 1
        },
        {
            "token": "my",
            "start_offset": 8,
            "end_offset": 10,
            "type": "<ALPHANUM>",
            "position": 2
        }
    ]
}
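
To see why this pattern leaves only the unquoted words, here is a minimal sketch that imitates the char filter outside Elasticsearch. It assumes Python's re module as a stand-in for the Java regex engine that Elasticsearch uses, and the helper names strip_quoted and rough_tokens are made up for illustration. The word split is only a rough approximation of the standard tokenizer plus the lowercase filter.

```python
import re

# Same pattern as my_char_filter above. Python's `re` is only a stand-in here
# for the Java regex engine used by Elasticsearch (assumption for illustration).
QUOTED = re.compile(r'"(.*?)"')


def strip_quoted(text):
    """Remove every double-quoted span, like pattern_replace with replacement ""."""
    return QUOTED.sub("", text)


def rough_tokens(text):
    """Rough approximation of the standard tokenizer plus the lowercase filter."""
    return re.findall(r"\w+", strip_quoted(text).lower())


print(rough_tokens('this is my "test document"'))
# prints ['this', 'is', 'my']
print(rough_tokens('keep "drop one" these "drop two" words'))
# prints ['keep', 'these', 'words']
```

The non-greedy .*? matters in the second call: a greedy .* would match from the first quote to the last one and also swallow the word "these". Note that . does not match line breaks by default, so a quoted sentence that spans several lines would not be removed by this pattern as written.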



Answer 2:


You could configure a custom analyzer for this field with a Pattern Replace Character Filter that replaces everything between the escaped double quotes with nothing.
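
As a rough sketch of what "for this field" could look like, the snippet below builds an index-creation body that attaches the analyzer to a single text field. The field name content is hypothetical, the analyzer and char filter names are borrowed from answer 1, and the typeless mappings layout assumes Elasticsearch 7 or later. The printed JSON would be sent as the body of a create-index request (PUT <index-name>).

```python
import json

# "content" is a hypothetical field name used only for illustration.
# The analyzer and char filter definitions mirror the ones in answer 1.
index_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                "my_analyzer": {
                    "tokenizer": "standard",
                    "char_filter": ["my_char_filter"],
                    "filter": ["lowercase"],
                }
            },
            "char_filter": {
                "my_char_filter": {
                    "type": "pattern_replace",
                    "pattern": '"(.*?)"',  # any double-quoted span
                    "replacement": "",     # replaced with nothing
                }
            },
        }
    },
    "mappings": {
        "properties": {
            # Only this field is analyzed with the quote-stripping analyzer.
            "content": {"type": "text", "analyzer": "my_analyzer"}
        }
    },
}

# Serialize to the JSON that the create-index request expects as its body.
print(json.dumps(index_body, indent=2))
```

Setting the analyzer on the field means it is applied at index time and, unless a separate search_analyzer is configured, at search time as well.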



Source: https://stackoverflow.com/questions/60479170/elasticsearch-analyzer-to-remove-quoted-sentences
