Question
I'm new to Lucene and trying to parse a raw string into a Query using the QueryParser.
I was wondering: why does the QueryParser.Parse() method need an Analyzer parameter at all? If analysis has something to do with querying, then an Analyzer should have to be specified when working with regular Query objects as well (TermQuery, BooleanQuery, etc.); and if not, why does QueryParser require one?
Answer 1:
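When indexing, Lucene divides the text into atomic units (tokens). Many things can happen during this phase (e.g. lowercasing, stemming, removal of stopwords). The resulting units are the terms stored in the index.
Then, at query time, QueryParser runs the query text through the Analyzer you give it. If that is the same analysis that was used at index time, the query terms come out in the same form as the indexed terms, so a term can be matched against a term.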
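To make the index-time/query-time symmetry concrete, here is a toy sketch in plain Java. It does not use Lucene at all: the class AnalysisDemo, its methods analyze, buildIndex and lookup, and the small stopword list are hypothetical, and the "analyzer" (split on non-letters, lowercase, drop a few stopwords) is a deliberately simplified stand-in for a real Lucene Analyzer. It assumes Java 9 or later.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy, Lucene-free model of analysis. None of these names are Lucene's API.
public class AnalysisDemo {
    // Hypothetical stopword list, for illustration only.
    private static final Set<String> STOPWORDS = Set.of("the", "a", "an", "and", "of");

    // Toy "analyzer": split on non-letters, lowercase, drop stopwords.
    public static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.split("[^\\p{L}]+")) {
            if (token.isEmpty()) continue; // a leading separator yields an empty token
            String term = token.toLowerCase(Locale.ROOT);
            if (!STOPWORDS.contains(term)) terms.add(term);
        }
        return terms;
    }

    // Toy inverted index: analyzed term -> ids of the documents containing it.
    public static Map<String, Set<Integer>> buildIndex(List<String> docs) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String term : analyze(docs.get(id))) {
                index.computeIfAbsent(term, k -> new HashSet<>()).add(id);
            }
        }
        return index;
    }

    // Exact lookup of a single term; no analysis happens here.
    public static Set<Integer> lookup(Map<String, Set<Integer>> index, String term) {
        return index.getOrDefault(term, Set.of());
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> index =
                buildIndex(List.of("The Quick Brown Fox", "A lazy dog"));

        // Raw user input: the index only ever stored "quick", so nothing matches.
        System.out.println(lookup(index, "Quick"));                 // prints []
        // Same analysis applied to the query: "Quick" becomes "quick" and matches doc 0.
        System.out.println(lookup(index, analyze("Quick").get(0))); // prints [0]
    }
}
```

The raw string "Quick" finds nothing because only the analyzed form "quick" was stored; running the query text through the same analysis is what closes that gap.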
Q: Why doesn't TermQuery require an analyzer?
A: A QueryParser object parses the query string and produces TermQuery objects (it can also produce other types of queries, e.g. PhraseQuery). A TermQuery already contains terms in the same form as they appear in the index. If you (as a programmer) are absolutely sure of what you are doing, you can create a TermQuery yourself, but this assumes you know the exact sequence of analysis steps and what the terms look like in the index.
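As an illustration, the toy sketch below contrasts a hand-built term query with one produced by a minimal "parser" that applies the same normalization used at index time. This is again not Lucene's API: ParserDemo, ToyTermQuery, normalize and parse are hypothetical names, and the naive rule that strips a trailing plural "s" is an assumed stand-in for real stemming. It assumes Java 16 or later (for records).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model, NOT Lucene: compares a hand-built term query with a parsed one.
public class ParserDemo {
    // Hypothetical stand-in for a term query: holds one exact, unanalyzed term.
    public record ToyTermQuery(String term) {
        public boolean matches(Set<String> indexedTerms) {
            return indexedTerms.contains(term);
        }
    }

    // Toy analysis: lowercase, then naively strip a trailing plural "s".
    public static String normalize(String token) {
        String t = token.toLowerCase(Locale.ROOT);
        return (t.length() > 3 && t.endsWith("s")) ? t.substring(0, t.length() - 1) : t;
    }

    // Toy "query parser": split on whitespace, normalize each token, emit term queries.
    public static List<ToyTermQuery> parse(String queryText) {
        List<ToyTermQuery> queries = new ArrayList<>();
        for (String token : queryText.trim().split("\\s+")) {
            if (!token.isEmpty()) queries.add(new ToyTermQuery(normalize(token)));
        }
        return queries;
    }

    public static void main(String[] args) {
        // Terms as stored at index time for the text "Running Dogs": {"running", "dog"}.
        Set<String> indexed = Set.of(normalize("Running"), normalize("Dogs"));

        ToyTermQuery handBuilt = new ToyTermQuery("Dogs"); // raw form, never indexed
        ToyTermQuery parsed = parse("Dogs").get(0);        // normalized to "dog"

        System.out.println(handBuilt.matches(indexed)); // prints false
        System.out.println(parsed.matches(indexed));    // prints true
    }
}
```

Building new ToyTermQuery("dog") by hand would also match, but only because you already know what the analysis does to "Dogs"; that knowledge is exactly what the parser's analyzer supplies.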
Q: Why doesn't BooleanQuery require an analyzer?
A: BooleanQuery just combines other queries using operators (AND/OR/MUST/SHOULD, etc.). It isn't really useful on its own, without any other queries to combine.
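The toy sketch below shows why a boolean query needs no analyzer: it never inspects text itself and only combines the yes/no answers of its sub-queries. It is not Lucene's API; BooleanDemo, ToyQuery, Term, Clause, Bool and the Occur enum are hypothetical names loosely modeled on Lucene's BooleanClause.Occur. The rule that, when there are no MUST clauses, at least one SHOULD clause has to match is intended to roughly mirror Lucene's default behavior; scoring is ignored entirely. It assumes Java 16 or later.

```java
import java.util.List;
import java.util.Set;

// Toy model, NOT Lucene: a boolean query only combines sub-query results.
public class BooleanDemo {
    // Hypothetical occurrence flags, loosely modeled on Lucene's BooleanClause.Occur.
    public enum Occur { MUST, SHOULD, MUST_NOT }

    public interface ToyQuery {
        boolean matches(Set<String> docTerms);
    }

    // Leaf query: exact lookup of an already-analyzed term.
    public record Term(String text) implements ToyQuery {
        public boolean matches(Set<String> docTerms) {
            return docTerms.contains(text);
        }
    }

    public record Clause(Occur occur, ToyQuery query) {}

    // Combinator: never touches text, just delegates to its sub-queries.
    public record Bool(List<Clause> clauses) implements ToyQuery {
        public boolean matches(Set<String> docTerms) {
            boolean hasMust = false;
            boolean anyShould = false;
            for (Clause c : clauses) {
                boolean hit = c.query().matches(docTerms);
                switch (c.occur()) {
                    case MUST -> {
                        hasMust = true;
                        if (!hit) return false; // every MUST clause must match
                    }
                    case MUST_NOT -> {
                        if (hit) return false;  // any MUST_NOT match excludes the doc
                    }
                    case SHOULD -> anyShould |= hit;
                }
            }
            // Without MUST clauses, at least one SHOULD clause has to match
            // (so a query with only MUST_NOT clauses matches nothing).
            return hasMust || anyShould;
        }
    }

    public static void main(String[] args) {
        Set<String> doc = Set.of("quick", "brown", "fox");
        ToyQuery q = new Bool(List.of(
                new Clause(Occur.MUST, new Term("fox")),
                new Clause(Occur.SHOULD, new Term("lazy")),
                new Clause(Occur.MUST_NOT, new Term("dog"))));
        System.out.println(q.matches(doc)); // prints true
    }
}
```

Because Bool accepts any ToyQuery, boolean queries can also be nested inside one another; the text handling always lives in the leaf queries.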
This is a very simplified answer. I highly recommend reading the book Introduction to Information Retrieval; it covers the theory on which Lucene (and other similar frameworks) is built. The book is available online for free.
Source: https://stackoverflow.com/questions/15226337/why-does-lucene-queryparser-needs-an-analyzer