Analyzer to autocomplete names

Submitted by 独自空忆成欢 on 2019-12-08 06:32:47

Question


I want to be able to autocomplete names.

For example, if we have the name John Smith, I want to be able to search for Jo, Sm, or John Sm and get the document back.

In addition, I do not want jo sm matching the document.

I currently have this analyzer:

return array(
    'settings' => array(
        'index' => array(
            'analysis' => array(
                'analyzer' => array(
                    'autocomplete' => array(
                        'tokenizer' => 'autocompleteEngram',
                        'filter' => array('lowercase', 'whitespace')
                    )
                ),

                'tokenizer' => array(
                    'autocompleteEngram' => array(
                        'type' => 'edgeNGram',
                        'min_gram' => 1,
                        'max_gram' => 50
                    )
                )
            )   
        )
    )
);

The problem with this is that the text is first split into words, and each word is then tokenized into edge n-grams.

This results in this: j jo joh john s sm smi smit smith

This means, if I search for john smith or john sm, nothing would be returned.
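The token list above can be reproduced outside Elasticsearch with a small plain-PHP sketch. It only mimics the behavior described here (split on whitespace, lowercase, then emit edge n-grams per word); the function name `edgeNgramsPerWord` is hypothetical and the code assumes ASCII input.

```php
<?php
// Hypothetical helper, not part of Elasticsearch: mimics
// "split on whitespace, lowercase, then edge n-grams per word".
function edgeNgramsPerWord(string $text, int $minGram = 1, int $maxGram = 50): array
{
    $tokens = [];
    // Lowercase and split the input into words.
    $words = preg_split('/\s+/', strtolower(trim($text)), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
        // strlen/substr are byte-based, so this assumes ASCII names.
        $limit = min(strlen($word), $maxGram);
        for ($len = $minGram; $len <= $limit; $len++) {
            $tokens[] = substr($word, 0, $len);
        }
    }
    return $tokens;
}

echo implode(' ', edgeNgramsPerWord('John Smith')), "\n";
// prints: j jo joh john s sm smi smit smith
```

No token spans the space between the two words, which is why a multi-word input such as john sm has no single token to match.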

So, I need to generate tokens that look like this: j jo joh john s sm smi smit smith john s john sm john smi john smit john smith.

How can I set up my analyzer so that it generates those extra tokens?


Answer 1:


I ended up not using edgengrams.

I created an analyzer with the standard tokenizer and the standard and lowercase filters. This is virtually identical to the standard analyzer, but without a stopword filter (we are searching for names, after all, and someone might be called The or An).

I then set the above analyzer as the index_analyzer and simple as the search_analyzer. Using this setup with a match_phrase_prefix query worked really well.

This is the custom analyzer I used (named autocomplete and expressed in PHP):

'autocomplete' => array(
    'tokenizer' => 'standard',
    'filter' => array('standard', 'lowercase')
),
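As a rough sketch of how the pieces described in this answer might fit together, the arrays below combine the analyzer with a field mapping and a match_phrase_prefix query body. The type name `person` and field name `name` are assumptions for illustration, and `index_analyzer` is the older mapping key the answer refers to (later Elasticsearch versions use `analyzer` for the index-time analyzer).

```php
<?php
// Sketch only: 'person' (type) and 'name' (field) are hypothetical names.
$indexBody = array(
    'settings' => array(
        'index' => array(
            'analysis' => array(
                'analyzer' => array(
                    // Standard tokenizer, no stopword filter.
                    'autocomplete' => array(
                        'tokenizer' => 'standard',
                        'filter' => array('standard', 'lowercase'),
                    ),
                ),
            ),
        ),
    ),
    'mappings' => array(
        'person' => array(
            'properties' => array(
                'name' => array(
                    'type' => 'string',
                    'index_analyzer' => 'autocomplete', // used at index time
                    'search_analyzer' => 'simple',      // used at query time
                ),
            ),
        ),
    ),
);

// Query body: every term but the last must match exactly and in order;
// the last term is treated as a prefix.
$searchBody = array(
    'query' => array(
        'match_phrase_prefix' => array(
            'name' => 'john sm',
        ),
    ),
);

echo json_encode($searchBody), "\n";
```

Under this setup, john sm is read as the exact term john followed by a term starting with sm, so jo sm should not match John Smith, while jo, sm, and john sm should.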


Source: https://stackoverflow.com/questions/17017216/analyzer-to-autocomplete-names
