Lucene Search Syntax

Posted by 狂风中的少年 on 2019-12-20 05:38:12

Question


I need help figuring out which query types to use in given situations.

I think I'm right in saying that if I stored the word "FORD" in a Lucene Field and wanted to find an exact match, I would use a TermQuery?

But which query type should I use if I was looking for the word "FORD" where the contents of the field were stored as:

"FORD|HONDA|SUZUKI"

And what if I wanted to search the contents of an entire page for a phrase, such as "please help me"?


Answer 1:


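If you want to find FORD in FORD|HONDA|SUZUKI, either index the field with Field.Index.ANALYZED so that the analyzer breaks the value into separate terms, or add each value as its own NOT_ANALYZED field, as below, so that you can use TermQuery: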

var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);
var fs = FSDirectory.Open("test.index");

//Index a Test Document
IndexWriter wr = new IndexWriter(fs, analyzer, true, IndexWriter.MaxFieldLength.LIMITED);
var doc = new Document();

doc.Add(new Field("Model", "FORD", Field.Store.YES, Field.Index.NOT_ANALYZED));
doc.Add(new Field("Model", "HONDA", Field.Store.YES, Field.Index.NOT_ANALYZED));
doc.Add(new Field("Model", "SUZUKI", Field.Store.YES, Field.Index.NOT_ANALYZED));

doc.Add(new Field("Text", @"What if i was to search the contents of an entire page, looking for a phrase? such as ""please help me""?", 
                    Field.Store.YES, Field.Index.ANALYZED));

wr.AddDocument(doc);
wr.Commit();

var reader = wr.GetReader();
var searcher = new IndexSearcher(reader);

//Use TermQuery for "NOT_ANALYZED" fields
var result = searcher.Search(new TermQuery(new Term("Model", "FORD")), 100);
foreach (var item in result.ScoreDocs)
{
    Console.WriteLine("1)" + reader.Document(item.Doc).GetField("Text").StringValue);
}

//Use QueryParser for "ANALYZED" fields
var qp = new QueryParser(Lucene.Net.Util.Version.LUCENE_30, "Text", analyzer);
result = searcher.Search(qp.Parse(@"""HELP ME"""), 100);
foreach (var item in result.ScoreDocs)
{
    Console.WriteLine("2)" + reader.Document(item.Doc).GetField("Text").StringValue);
}

A TermQuery matches a term exactly as it is stored in the index, and what is stored depends on how you indexed the field (NOT_ANALYZED, or ANALYZED with a particular analyzer). Its most common use is with NOT_ANALYZED fields.

You can use TermQuery with ANALYZED fields too, but then you need to know how the analyzer tokenizes your input string. Below is a sample that shows how an analyzer tokenizes your input:

var text = @"What if i was to search the contents of an entire page, looking for a phrase? such as ""please help me""?";
var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30 );
//var analyzer = new WhitespaceAnalyzer();
//var analyzer = new KeywordAnalyzer();
//var analyzer = new SimpleAnalyzer();

var ts = analyzer.TokenStream("", new StringReader(text));
var termAttr = ts.GetAttribute<ITermAttribute>();

while (ts.IncrementToken())
{
    Console.Write("[" + termAttr.Term + "] " );    
}
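
To make the point about ANALYZED fields concrete, here is a minimal sketch, assuming Lucene.Net 3.0.x and an in-memory RAMDirectory. The class name and the sample sentence are made up for illustration. StandardAnalyzer lowercases tokens, so a TermQuery has to use the lowercase form, and a PhraseQuery built by hand has to use the analyzed terms in order:

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;

// Hypothetical demo class, not part of the original answer.
class AnalyzedFieldQueries
{
    static void Main()
    {
        var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);
        var dir = new RAMDirectory(); // in-memory index, used here only for the demo

        using (var writer = new IndexWriter(dir, analyzer, true, IndexWriter.MaxFieldLength.LIMITED))
        {
            var doc = new Document();
            doc.Add(new Field("Text", "Could someone please help me?",
                              Field.Store.YES, Field.Index.ANALYZED));
            writer.AddDocument(doc);
            writer.Commit();
        }

        using (var searcher = new IndexSearcher(dir, true))
        {
            // The analyzer stored "help", not "HELP", so the uppercase term should not match.
            var upper = searcher.Search(new TermQuery(new Term("Text", "HELP")), 10);
            Console.WriteLine("TermQuery HELP: " + upper.TotalHits + " hit(s)");

            // The lowercase term is what is actually in the index.
            var lower = searcher.Search(new TermQuery(new Term("Text", "help")), 10);
            Console.WriteLine("TermQuery help: " + lower.TotalHits + " hit(s)");

            // A PhraseQuery built by hand takes the already-analyzed terms, in order.
            var phrase = new PhraseQuery();
            phrase.Add(new Term("Text", "please"));
            phrase.Add(new Term("Text", "help"));
            phrase.Add(new Term("Text", "me"));
            var phraseHits = searcher.Search(phrase, 10);
            Console.WriteLine("PhraseQuery: " + phraseHits.TotalHits + " hit(s)");
        }
    }
}
```

Going through QueryParser with the same analyzer, as in the first sample, avoids this manual step because the parser analyzes the query text for you.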



Answer 2:


I would turn the problem sideways and put each of the multiple values into the index as a separate instance of the field; this should make searching simpler. Looking at "Field Having Multiple Values" might be helpful.
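
As a rough sketch of that idea, assuming Lucene.Net 3.0.x and that the source data arrives as a pipe-delimited string, the value can be split before indexing so that each model becomes its own NOT_ANALYZED "Model" field. The BuildDocument helper and the class name are hypothetical:

```csharp
using System;
using Lucene.Net.Documents;

// Hypothetical demo class, not part of the original answer.
class MultiValuedField
{
    // Hypothetical helper: one "Model" field instance per pipe-separated value.
    static Document BuildDocument(string pipeDelimitedModels)
    {
        var doc = new Document();
        var models = pipeDelimitedModels.Split(new[] { '|' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (var model in models)
        {
            // NOT_ANALYZED keeps each value as a single exact term for TermQuery.
            doc.Add(new Field("Model", model.Trim(), Field.Store.YES, Field.Index.NOT_ANALYZED));
        }
        return doc;
    }

    static void Main()
    {
        var doc = BuildDocument("FORD|HONDA|SUZUKI");

        // GetValues returns every stored value of the named field.
        foreach (var value in doc.GetValues("Model"))
        {
            Console.WriteLine(value);
        }
    }
}
```

A document built this way can then be matched with new TermQuery(new Term("Model", "FORD")), as in the first answer.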



Source: https://stackoverflow.com/questions/25465349/lucene-search-syntax
