Question
I want to be able to autocomplete names.
For example, if we have the name `John Smith`, I want to be able to search for `Jo`, `Sm`, and `John Sm` and get the document back.
In addition, I do not want `jo sm` to match the document.
I currently have this analyzer:
    return array(
        'settings' => array(
            'index' => array(
                'analysis' => array(
                    'analyzer' => array(
                        'autocomplete' => array(
                            'tokenizer' => 'autocompleteEngram',
                            'filter' => array('lowercase', 'whitespace')
                        )
                    ),
                    'tokenizer' => array(
                        'autocompleteEngram' => array(
                            'type' => 'edgeNGram',
                            'min_gram' => 1,
                            'max_gram' => 50
                        )
                    )
                )
            )
        )
    );
The problem is that the text is first split into words, and each word is then tokenized into edge n-grams.
This produces the following tokens:
j
jo
joh
john
s
sm
smi
smit
smith
This means that if I search for `john smith` or `john sm`, nothing is returned.
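To see why the multi-word lookups fail, here is a minimal PHP sketch of the tokenization described above: lowercase the text, split it on whitespace, then emit the edge n-grams of each word. The `edgeNgramsPerWord` helper is hypothetical and only models that behaviour; it is not Elasticsearch code, and it assumes single-byte (ASCII) names.

```php
<?php
// Hypothetical helper (not part of Elasticsearch): returns the edge n-grams
// of every whitespace-separated word in $text.
// Assumes ASCII input; multibyte names would need the mb_* string functions.
function edgeNgramsPerWord(string $text, int $minGram = 1, int $maxGram = 50): array
{
    $tokens = [];
    // Lowercase the input, then split it on runs of whitespace.
    $words = preg_split('/\s+/', strtolower(trim($text)), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
        // Never ask for a prefix longer than the word itself.
        $limit = min(strlen($word), $maxGram);
        for ($length = $minGram; $length <= $limit; $length++) {
            $tokens[] = substr($word, 0, $length);
        }
    }
    return $tokens;
}

$tokens = edgeNgramsPerWord('John Smith');
echo implode(', ', $tokens), PHP_EOL;
// prints: j, jo, joh, john, s, sm, smi, smit, smith

var_dump(in_array('john sm', $tokens, true));
// prints: bool(false), because no token spans both words
```

Every token stays inside a single word, so a search term that contains a space, such as `john sm`, has nothing to match against.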
So I need to generate tokens that look like this:
j
jo
joh
john
s
sm
smi
smit
smith
john s
john sm
john smi
john smit
john smith
How can I set up my analyzer so that it generates those extra tokens?
Answer 1:
I ended up not using edgengrams.
I created an analyzer with the `standard` tokenizer and the `standard` and `lowercase` filters. This is virtually identical to the `standard` analyser, but without any stopwords filter (we are searching for names, after all, and there might be someone called `The` or `An`, etc.).
I then set the above analyzer as the `index_analyzer` and `simple` as the `search_analyzer`. Using this setup with a `match_phrase_prefix` query worked really well.
This is the custom analyser I used (called autocomplete and expressed in PHP):
    'autocomplete' => array(
        'tokenizer' => 'standard',
        'filter' => array('standard', 'lowercase')
    ),
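For illustration, the mapping and query described in this answer might look roughly like the sketch below. The type name `person` and the field name `name` are assumptions made for the example, and `index_analyzer`, `search_analyzer` and the `string` field type reflect the Elasticsearch versions current when this was written; later releases changed these options, so check the documentation for your version.

```php
<?php
// Mapping fragment: 'person' and 'name' are hypothetical names.
$mapping = array(
    'person' => array(
        'properties' => array(
            'name' => array(
                'type' => 'string',
                // Custom analyzer from the answer, applied at index time.
                'index_analyzer' => 'autocomplete',
                // Built-in simple analyzer, applied to the search text.
                'search_analyzer' => 'simple'
            )
        )
    )
);

// Query body: every term except the last must match a whole token, in order;
// the last term is treated as a prefix.
$query = array(
    'query' => array(
        'match_phrase_prefix' => array(
            'name' => 'john sm'
        )
    )
);
```

With a phrase-prefix query, `jo`, `sm` and `john sm` should all find `John Smith`, while `jo sm` should not, because `jo` is not a complete token in the indexed name.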
Source: https://stackoverflow.com/questions/17017216/analyzer-to-autocomplete-names