Difference between StandardTokenizerFactory and KeywordTokenizerFactory in Solr?

Submitted by 跟風遠走 on 2019-11-29 13:29:43
Jayendra

StandardTokenizerFactory :-
It splits text on whitespace and punctuation, and strips the punctuation characters.

Documentation :-

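Splits words at punctuation characters, removing punctuation. However, a dot that's not followed by whitespace is considered part of a token. Splits words at hyphens, unless there's a number in the token. In that case, the whole token is interpreted as a product number and is not split. Recognizes email addresses and Internet hostnames as one token.

Note that this text comes from older Solr documentation. The hyphen/product-number rule and the email-address rule now describe ClassicTokenizerFactory. The current StandardTokenizerFactory follows the Unicode UAX #29 word-break rules: it splits at hyphens and treats "@" as a delimiter, so email addresses are not kept as one token. Hostnames such as example.com still survive as one token, because a dot between two letters does not split a word.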

Use it for fields where you want full-text search on the field's content.

e.g. -

http://example.com/I-am+example?Text=-Hello

would generate 7 tokens (separated by commas) -

http,example.com,I,am,example,Text,Hello
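To make the splitting rules concrete, here is a rough Python approximation of what StandardTokenizerFactory does to the example URL. It is a sketch under stated assumptions, not Lucene's implementation. The real tokenizer applies the full Unicode UAX #29 word-break rules, while this regex only models "split on punctuation and whitespace, but keep a dot between two word characters". The function name `standard_tokenize_approx` is hypothetical.

```python
import re

# Approximation only (assumption): Lucene's StandardTokenizer implements the
# Unicode UAX #29 word-break rules. This regex mimics just the two rules that
# matter for the example URL: split on punctuation and whitespace, but keep a
# dot that sits between two word characters (e.g. "example.com").
_WORD = re.compile(r"\w+(?:\.\w+)*")


def standard_tokenize_approx(text: str) -> list[str]:
    """Return the tokens a StandardTokenizer-like splitter would emit (hypothetical helper)."""
    return _WORD.findall(text)


if __name__ == "__main__":
    url = "http://example.com/I-am+example?Text=-Hello"
    print(",".join(standard_tokenize_approx(url)))
    # → http,example.com,I,am,example,Text,Hello
```

Like the real tokenizer, the sketch does no lowercasing. In Solr that job is normally left to a separate LowerCaseFilterFactory in the analyzer chain.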

KeywordTokenizerFactory :-

The Keyword Tokenizer does not split the input at all.
No processing is performed on the string: the whole input is treated as a single entity and returned as one term, so no actual tokenization takes place.

It is mainly used for sorting and faceting. For faceting, the exact multi-word value must match as a whole when you filter on it. For sorting, the field must not be tokenized, because sorting does not work on tokenized fields.

e.g.

http://example.com/I-am+example?Text=-Hello

would generate a single token -

http://example.com/I-am+example?Text=-Hello
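The following Python sketch contrasts the two behaviors for faceting and sorting. It is a simplified model, not Solr code. `keyword_tokenize`, `word_tokenize`, `facet_counts` and `sort_key` are hypothetical helpers, `word_tokenize` is a crude stand-in for a tokenizing analyzer, and the `cities` values are made-up sample data. Faceting on whole values keeps "New York" intact, while a tokenized field counts "New" and "York" separately. Sorting needs exactly one term per document.

```python
import re
from collections import Counter


def keyword_tokenize(text: str) -> list[str]:
    """KeywordTokenizer-like behavior: the whole input becomes one token."""
    return [text]


def word_tokenize(text: str) -> list[str]:
    """Crude stand-in for a tokenizing analyzer: split on non-word characters."""
    return re.findall(r"\w+", text)


def facet_counts(values: list[str], tokenize) -> Counter:
    """Count documents per indexed term, roughly what a field facet reports."""
    counts: Counter = Counter()
    for value in values:
        # Each document counts once per distinct term it contains.
        counts.update(set(tokenize(value)))
    return counts


def sort_key(value: str, tokenize) -> str:
    """Return the single term to sort on; several terms make the order ambiguous."""
    terms = tokenize(value)
    if len(terms) != 1:
        raise ValueError(f"cannot sort on {len(terms)} terms: {terms}")
    return terms[0]


if __name__ == "__main__":
    url = "http://example.com/I-am+example?Text=-Hello"
    print(keyword_tokenize(url))  # the whole URL as one token

    # Hypothetical field values, one per document.
    cities = ["New York", "New Delhi", "York"]
    print(sorted(facet_counts(cities, keyword_tokenize).items()))
    # → [('New Delhi', 1), ('New York', 1), ('York', 1)]
    print(sorted(facet_counts(cities, word_tokenize).items()))
    # → [('Delhi', 1), ('New', 2), ('York', 2)]

    print(sorted(cities, key=lambda v: sort_key(v, keyword_tokenize)))
    # → ['New Delhi', 'New York', 'York']
    try:
        sorted(cities, key=lambda v: sort_key(v, word_tokenize))
    except ValueError as err:
        print("tokenized field:", err)
```

Raising an error for multi-term values is only an analogy for why Solr will not sort on a tokenized field: with several terms per document there is no single value to order by.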