Question
I'm a beginner with Lucene. Here's my source:
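import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.StringField;
// Not tokenized: the whole value is indexed as a single token
FieldType ft = new FieldType(StringField.TYPE_STORED);
ft.setTokenized(false);
ft.setStored(true);
// Tokenized: the analyzer splits the value into separate tokens
FieldType ftNA = new FieldType(StringField.TYPE_STORED);
ftNA.setTokenized(true);
ftNA.setStored(true);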
Why tokenize in Lucene? For example, take the String value "my name is lee":
- tokenized: "my" "name" "is" "lee"
- not tokenized: "my name is lee"
I don't understand why one would index with tokenization. What is the difference between tokenized and not tokenized?
Answer 1:
Lucene works by finding tokens in documents which satisfy constraints expressed by a query.
If you search for lee, for instance, the query will find all documents that contain the token lee. If the field isn't tokenized, the only token indexed is the whole value "my name is lee", so you can find the document only by searching for that exact string, not by searching for just lee.
Now suppose you search for "is lee". This is a PhraseQuery, which means it'll match the token is followed by the token lee.
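To make the phrase case concrete, here is a minimal sketch in plain Java. It does not use Lucene and is not how PhraseQuery is implemented internally; the class ToyPhraseMatch and the method matchesPhrase are hypothetical names made up for illustration, and splitting on whitespace stands in for a real analyzer. It only shows the idea that a phrase matches when its tokens appear next to each other, in order.

```java
import java.util.Arrays;
import java.util.List;

// Toy illustration only: scans a token list for a phrase.
// A real search engine would use token positions stored in the index instead.
class ToyPhraseMatch {
    // Returns true if phraseTokens occur consecutively and in order inside docTokens.
    static boolean matchesPhrase(List<String> docTokens, List<String> phraseTokens) {
        if (phraseTokens.isEmpty()) {
            return false;
        }
        int n = phraseTokens.size();
        for (int start = 0; start + n <= docTokens.size(); start++) {
            if (docTokens.subList(start, start + n).equals(phraseTokens)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Whitespace splitting is an assumed, simplified stand-in for tokenization.
        List<String> doc = Arrays.asList("my name is lee".split("\\s+"));
        System.out.println(matchesPhrase(doc, Arrays.asList("is", "lee")));   // true
        System.out.println(matchesPhrase(doc, Arrays.asList("lee", "is")));   // false
        System.out.println(matchesPhrase(doc, Arrays.asList("name", "lee"))); // false
    }
}
```

Note that none of this is possible on an untokenized field: there the document has a single token, so there is no "is" followed by "lee" to find.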
Tokenization is needed because Lucene works with an inverted index, i.e., it maps tokens to the documents that contain them.
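As a rough illustration of that mapping, the following plain-Java sketch builds a toy inverted index and indexes the same value once tokenized and once as a single token. Everything here is an assumption for the sake of the example: ToyInvertedIndex, add, and search are hypothetical names, whitespace splitting replaces a real analyzer, and Lucene's actual index structures are far more elaborate.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index: maps each token to the IDs of the documents that contain it.
// Illustration only, not Lucene's implementation.
class ToyInvertedIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Index a value either as whitespace-separated tokens or as one single token.
    void add(int docId, String value, boolean tokenized) {
        String[] tokens = tokenized ? value.split("\\s+") : new String[] { value };
        for (String token : tokens) {
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
        }
    }

    // Return the IDs of documents whose indexed tokens include exactly this term.
    Set<Integer> search(String term) {
        return postings.getOrDefault(term, new TreeSet<>());
    }

    public static void main(String[] args) {
        ToyInvertedIndex tokenized = new ToyInvertedIndex();
        tokenized.add(1, "my name is lee", true);

        ToyInvertedIndex untokenized = new ToyInvertedIndex();
        untokenized.add(1, "my name is lee", false);

        System.out.println(tokenized.search("lee"));              // [1]
        System.out.println(untokenized.search("lee"));            // []
        System.out.println(untokenized.search("my name is lee")); // [1]
    }
}
```

In the tokenized index the key lee exists, so a search for it finds document 1. In the untokenized index the only key is the full string, which is why only an exact match on the whole value succeeds.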
Source: https://stackoverflow.com/questions/29457148/why-tokenize-texts-in-lucene