I am trying to write a simple program using Lucene 2.9.4 that runs a phrase query, but I am getting 0 hits.
public class HelloLucene {
public static
There are two issues with the code (and they have nothing to do with your version of Lucene):
1) The StandardAnalyzer does not index stopwords (like "in"), so the PhraseQuery will never be able to find the phrase "Lucene in".
2) As mentioned by Xodarap and Shashikant Kore, your call to create the Field needs to include Index.ANALYZED; otherwise Lucene does not run the Analyzer on this field of the Document. There's probably a nifty way to do it with Index.NOT_ANALYZED, but I'm not familiar with it.
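To make the two points above concrete, here is a rough plain-Java sketch of what ends up in the index in each case. It does not use Lucene at all: the class name, the method names and the tiny stop list are made up for illustration, and the real StandardAnalyzer tokenizes more carefully than a regex split.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class ToyAnalysis {
    // Small illustrative stop list; NOT Lucene's actual stop set.
    static final Set<String> STOP_WORDS =
            new HashSet<String>(Arrays.asList("in", "of", "the", "for", "a", "an"));

    // Roughly what an analyzed field contributes: lowercased words minus stopwords.
    static List<String> analyzed(String value) {
        List<String> terms = new ArrayList<String>();
        for (String token : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    // Roughly what a non-analyzed field contributes: the whole value as a single term.
    static List<String> notAnalyzed(String value) {
        return Collections.singletonList(value);
    }

    public static void main(String[] args) {
        System.out.println(analyzed("Lucene in Action"));    // prints [lucene, action]
        System.out.println(notAnalyzed("Lucene in Action")); // prints [Lucene in Action]
    }
}
```

In the first case there is no term "in" to match, and in the second case there is no term "lucene" at all, so a PhraseQuery built from the terms "lucene" and "in" finds nothing either way.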
For an easy fix, change your addDoc method to:
public static void addDoc(IndexWriter w, String value)throws IOException{
Document doc = new Document();
doc.add(new Field("content", value, Field.Store.YES, Field.Index.ANALYZED));
w.addDocument(doc);
}
and modify your creation of the PhraseQuery to:
PhraseQuery pq = new PhraseQuery();
pq.add(new Term("content", "computer"),0);
pq.add(new Term("content", "science"),1);
pq.setSlop(0);
This will give you the result below, since neither "computer" nor "science" is a stopword:
Found 1 hits.
1.The Art of Computer Science
If you want to find "Lucene in Action", you can increase the slop of this PhraseQuery (allowing a 'gap' between the two words where the stopword "in" was removed):
PhraseQuery pq = new PhraseQuery();
pq.add(new Term("content", "lucene"),0);
pq.add(new Term("content", "action"),1);
pq.setSlop(1);
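If the slop setting feels abstract, the following plain-Java sketch models it for a two-term phrase. It is a simplification and not Lucene's scorer: the class and method names are hypothetical, and it assumes the removed stopword leaves a gap in the token positions (marked with null here).

```java
class ToySlop {
    // The query puts `first` at position 0 and `second` at position 1.
    // A pair of occurrences matches when the second term sits at most
    // `slop` positions away from where an exact phrase would put it.
    static boolean matches(String[] docTokens, String first, String second, int slop) {
        for (int i = 0; i < docTokens.length; i++) {
            if (!first.equals(docTokens[i])) {
                continue;
            }
            for (int j = 0; j < docTokens.length; j++) {
                if (!second.equals(docTokens[j])) {
                    continue;
                }
                int distance = Math.abs((j - 1) - i); // 0 means an exact phrase
                if (distance <= slop) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // "Lucene in Action" with the stopword "in" removed; null marks the gap.
        String[] doc = {"lucene", null, "action"};
        System.out.println(matches(doc, "lucene", "action", 0)); // prints false
        System.out.println(matches(doc, "lucene", "action", 1)); // prints true
    }
}
```

With slop 0 the two terms would have to be adjacent, which they are not once "in" has left a hole between them; slop 1 tolerates that one-position gap.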
If you really want to search for the phrase "lucene in", you will need to select a different analyzer that keeps stopwords (like the SimpleAnalyzer). In Lucene 2.9, just replace your call to the StandardAnalyzer with:
SimpleAnalyzer analyzer = new SimpleAnalyzer();
Or, if you're using version 3.1 or higher, you need to add the version information:
SimpleAnalyzer analyzer = new SimpleAnalyzer(Version.LUCENE_35);
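As a rough illustration of why this analyzer helps: SimpleAnalyzer splits the text at non-letter characters and lowercases the pieces, without removing stopwords. The plain-Java sketch below imitates that behaviour with a regex; the class and method names are hypothetical and this is not the Lucene implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

class ToySimpleAnalyzer {
    // Split at anything that is not a letter and lowercase; keep every word.
    static List<String> tokenize(String value) {
        List<String> tokens = new ArrayList<String>();
        for (String token : value.split("[^\\p{L}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // True if the phrase words appear consecutively, in order (slop 0).
    static boolean containsPhrase(List<String> tokens, String... phrase) {
        return Collections.indexOfSubList(tokens, Arrays.asList(phrase)) >= 0;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("Lucene in Action");
        System.out.println(tokens);                                 // prints [lucene, in, action]
        System.out.println(containsPhrase(tokens, "lucene", "in")); // prints true
    }
}
```

Because "in" survives as its own token right after "lucene", an exact phrase query on "lucene" followed by "in" has something to match.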
Here is a helpful post on a similar issue (this will help you get going with PhraseQuery): Exact Phrase search using Lucene? -- see WhiteFang34's answer.
This is my solution for Lucene 3.5.0 (Version.LUCENE_35 in code), which is listed at http://lucene.apache.org/java/docs/releases.html. If you are using an IDE like Eclipse, you can add the .jar file to your build path; this is the direct link to the 3.5.0 jar: http://repo1.maven.org/maven2/org/apache/lucene/lucene-core/3.5.0/lucene-core-3.5.0.jar
When a new version of Lucene comes out, this solution will still apply ONLY if you keep using the 3.5.0 jar.
Now for the code:
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;
public class Index {
public static void main(String[] args) throws IOException, ParseException {
// To store the Lucene index in RAM
Directory directory = new RAMDirectory();
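// To store the Lucene index on your hard disk, you can use the following instead
// (it also needs imports for java.io.File and org.apache.lucene.store.FSDirectory):
//Directory directory = FSDirectory.open(new File("/foo/bar/testindex"));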
// Set the analyzer that you want to use for the task.
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);
// Creating Lucene Index; note, the new version demands configurations.
IndexWriterConfig config = new IndexWriterConfig(
Version.LUCENE_35, analyzer);
IndexWriter writer = new IndexWriter(directory, config);
// Note: There are other ways of initializing the IndexWriter.
// (see http://lucene.apache.org/java/3_5_0/api/all/org/apache/lucene/index/IndexWriter.html)
// The new version of Document.add in Lucene requires a Field argument,
// and there are a few ways of calling the Field constructor.
// (see http://lucene.apache.org/java/3_5_0/api/core/org/apache/lucene/document/Field.html)
// Here I just use one of the Field constructors that takes a String value.
List<Document> docs = new ArrayList<Document>();
Document doc1 = new Document();
doc1.add(new Field("content", "Lucene in Action",
Field.Store.YES, Field.Index.ANALYZED));
Document doc2 = new Document();
doc2.add(new Field("content", "Lucene for Dummies",
Field.Store.YES, Field.Index.ANALYZED));
Document doc3 = new Document();
doc3.add(new Field("content", "Managing Gigabytes",
Field.Store.YES, Field.Index.ANALYZED));
Document doc4 = new Document();
doc4.add(new Field("content", "The Art of Lucene",
Field.Store.YES, Field.Index.ANALYZED));
docs.add(doc1); docs.add(doc2); docs.add(doc3); docs.add(doc4);
writer.addDocuments(docs);
writer.close();
// To enable query/search, we need to initialize
// the IndexReader and IndexSearcher.
// Note: The IndexSearcher in Lucene 3.5.0 takes an IndexReader parameter
// instead of a Directory parameter.
IndexReader iRead = IndexReader.open(directory);
IndexSearcher iSearch = new IndexSearcher(iRead);
// Parse a simple query. The StandardAnalyzer drops the stopword "in",
// so this effectively searches for the word "lucene".
// Note: you need to specify the fieldname for the query
// (in our case it is "content").
QueryParser parser = new QueryParser(Version.LUCENE_35, "content", analyzer);
Query query = parser.parse("lucene in");
// Search the Index with the Query, with max 1000 results
ScoreDoc[] hits = iSearch.search(query, 1000).scoreDocs;
// Iterate through the search results
for (int i=0; i<hits.length;i++) {
// From the indexSearch, we retrieve the search result individually
Document hitDoc = iSearch.doc(hits[i].doc);
// Specify the Field type of the retrieved document that you want to print.
// In our case we only have 1 Field i.e. "content".
System.out.println(hitDoc.get("content"));
}
// Release the searcher, the reader and the directory.
iSearch.close();
iRead.close();
directory.close();
}
}
The field needs to be analyzed, and term vectors need to be enabled as well:
doc.add(new Field("content", value, Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.YES));
You can disable storing (Field.Store.NO) if you do not plan to retrieve that field's value from the index.