Lucene phrase query not working


I am trying to write a simple program using Lucene 2.9.4 that runs a phrase query, but I am getting 0 hits:

public class HelloLucene {

public static


                      


      
      
        
3 answers

        
                         				            
            
           
            
                              
                
              
              
                
刺人心
2021-01-15 10:49
              
            
            
                                                                       
There are two issues with the code (and they have nothing to do with your version of Lucene):

1) the StandardAnalyzer does not index stopwords (like "in"), so the PhraseQuery will never be able to find the phrase "Lucene in"

2) as mentioned by Xodarap and Shashikant Kore, your call to create a document needs to include Field.Index.ANALYZED; otherwise Lucene does not run the Analyzer on that field of the Document. There's probably a nifty way to do it with Field.Index.NOT_ANALYZED, but I'm not familiar with it.
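
To make the first issue concrete, here is a minimal plain-Java sketch of what a lowercasing, stopword-removing analyzer does to the text before it is indexed. It is a toy model and not Lucene code: the class name ToyAnalyzer, the method analyze, and the abbreviated stopword list are all assumptions made for illustration (Lucene's real English stop set is longer).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model of a lowercasing, stopword-removing analyzer (NOT Lucene's implementation).
public class ToyAnalyzer {
    // Hypothetical, abbreviated stopword list, for illustration only.
    private static final Set<String> STOPWORDS =
            new HashSet<String>(Arrays.asList("a", "an", "for", "in", "of", "the"));

    // Splits on non-letters, lowercases each token, and optionally drops stopwords.
    public static List<String> analyze(String text, boolean removeStopwords) {
        List<String> terms = new ArrayList<String>();
        for (String token : text.split("[^\\p{L}]+")) {
            if (token.isEmpty()) {
                continue;
            }
            String term = token.toLowerCase(Locale.ROOT);
            if (removeStopwords && STOPWORDS.contains(term)) {
                continue;
            }
            terms.add(term);
        }
        return terms;
    }

    public static void main(String[] args) {
        // With stopword removal, "in" never reaches the index.
        System.out.println(analyze("Lucene in Action", true));
        // Without stopword removal, "in" is kept as a term.
        System.out.println(analyze("Lucene in Action", false));
    }
}
```

With stopword removal the indexed terms are lucene and action, so a phrase that requires the term "in" has nothing to match against. Without it, "in" survives as a term of its own.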

For an easy fix, change your addDoc method to:

public static void addDoc(IndexWriter w, String value) throws IOException {
    Document doc = new Document();
    doc.add(new Field("content", value, Field.Store.YES, Field.Index.ANALYZED));
    w.addDocument(doc);
}


and modify your creation of the PhraseQuery to:

    PhraseQuery pq = new PhraseQuery();
    pq.add(new Term("content", "computer"),0);
    pq.add(new Term("content", "science"),1);
    pq.setSlop(0);


This will give you the result below since both "computer" and "science" are not stopwords:

    Found 1 hits.
    1.The Art of Computer Science


If you want to find "Lucene in Action", you can increase the slop of this PhraseQuery (increasing the 'gap' between the two words):

    PhraseQuery pq = new PhraseQuery();
    pq.add(new Term("content", "lucene"),0);
    pq.add(new Term("content", "action"),1);
    pq.setSlop(1);
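
The effect of slop can be illustrated with a small plain-Java sketch. This is a deliberately simplified model of a two-term, in-order phrase match, not Lucene's actual sloppy-phrase algorithm. The class ToySlop and the method matches are hypothetical, and the token positions assume that the analyzer leaves a gap where the stopword "in" was removed.

```java
import java.util.Arrays;
import java.util.List;

// Simplified model of a two-term phrase match with slop (NOT Lucene's algorithm).
public class ToySlop {
    // docTerms holds one entry per token position; null marks a removed stopword.
    // Returns true if 'first' is followed by 'second' with at most 'slop'
    // extra positions between them beyond strict adjacency.
    public static boolean matches(List<String> docTerms, String first, String second, int slop) {
        for (int i = 0; i < docTerms.size(); i++) {
            if (!first.equals(docTerms.get(i))) {
                continue;
            }
            for (int j = i + 1; j < docTerms.size(); j++) {
                int extraGap = j - i - 1;
                if (extraGap > slop) {
                    break;
                }
                if (second.equals(docTerms.get(j))) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // "Lucene in Action" after analysis: "in" was dropped but its position is kept.
        List<String> doc = Arrays.asList("lucene", null, "action");
        System.out.println(matches(doc, "lucene", "action", 0));
        System.out.println(matches(doc, "lucene", "action", 1));
    }
}
```

In this model the slop 0 query fails because one position separates the two terms, while slop 1 tolerates that gap and matches.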


If you really want to search for the phrase "lucene in", you will need to select a different analyzer that keeps stopwords (like the SimpleAnalyzer). In Lucene 2.9, just replace your call to the StandardAnalyzer with:

    SimpleAnalyzer analyzer = new SimpleAnalyzer();


Or, if you're using version 3.1 or higher, you need to add the version information:

    SimpleAnalyzer analyzer = new SimpleAnalyzer(Version.LUCENE_35);


Here is a helpful post on a similar issue (this will help you get going with PhraseQuery):
Exact Phrase search using Lucene? -- see WhiteFang34's answer.
                                                                        
                                                        
            
            
              
                
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
萌比男神i
2021-01-15 10:54
              
            
            
                                                                       
This is my solution using Lucene 3.5.0 (Version.LUCENE_35), available from http://lucene.apache.org/java/docs/releases.html. If you are using an IDE like Eclipse, you can add the .jar file to your build path; this is the direct link to the 3.5.0 .jar file: http://repo1.maven.org/maven2/org/apache/lucene/lucene-core/3.5.0/lucene-core-3.5.0.jar.

When a new version of Lucene comes out, this solution will still apply only if you keep using the 3.5.0 .jar.

Now for the code:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class Index {
public static void main(String[] args) throws IOException, ParseException {
    // To store the Lucene index in RAM
    Directory directory = new RAMDirectory();
    // To store the Lucene index on your hard disk, you can use
    // (FSDirectory.open takes a java.io.File, not a String):
    //Directory directory = FSDirectory.open(new java.io.File("/foo/bar/testindex"));

    // Set the analyzer that you want to use for the task.
    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);
    // Creating Lucene Index; note, the new version demands configurations.
    IndexWriterConfig config = new IndexWriterConfig(
            Version.LUCENE_35, analyzer);  
    IndexWriter writer = new IndexWriter(directory, config);
    // Note: There are other ways of initializing the IndexWriter.
    // (see http://lucene.apache.org/java/3_5_0/api/all/org/apache/lucene/index/IndexWriter.html)

    // Document.add in this version of Lucene takes a Field argument,
    //  and there are a few ways of calling the Field constructor.
    //  (see http://lucene.apache.org/java/3_5_0/api/core/org/apache/lucene/document/Field.html)
    // Here I just use one of the Field constructors that takes a String value.
    List<Document> docs = new ArrayList<Document>();
    Document doc1 = new Document();
    doc1.add(new Field("content", "Lucene in Action", 
        Field.Store.YES, Field.Index.ANALYZED));
    Document doc2 = new Document();
    doc2.add(new Field("content", "Lucene for Dummies", 
        Field.Store.YES, Field.Index.ANALYZED));
    Document doc3 = new Document();
    doc3.add(new Field("content", "Managing Gigabytes", 
        Field.Store.YES, Field.Index.ANALYZED));
    Document doc4 = new Document();
    doc4.add(new Field("content", "The Art of Lucene", 
        Field.Store.YES, Field.Index.ANALYZED));

    docs.add(doc1); docs.add(doc2); docs.add(doc3); docs.add(doc4);

    writer.addDocuments(docs);
    writer.close();

    // To enable query/search, we need to initialize 
    //  the IndexReader and IndexSearcher.
    // Note: The IndexSearcher in Lucene 3.5.0 takes an IndexReader parameter
    //  instead of a Directory parameter.
    IndexReader iRead = IndexReader.open(directory);
    IndexSearcher iSearch = new IndexSearcher(iRead);

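    // Parse a simple query. Note: the unquoted string is parsed as separate
    // terms, and StandardAnalyzer drops the stopword "in", so this effectively
    // searches for the term "lucene". For a phrase query, wrap the phrase in
    // double quotes inside the query string.
    // You also need to specify the default field name for the query
    // (in our case it is "content").
    QueryParser parser = new QueryParser(Version.LUCENE_35, "content", analyzer);
    Query query = parser.parse("lucene in");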

    // Search the Index with the Query, with max 1000 results
    ScoreDoc[] hits = iSearch.search(query, 1000).scoreDocs;

    // Iterate through the search results
    for (int i=0; i<hits.length;i++) {
        // From the indexSearch, we retrieve the search result individually
        Document hitDoc = iSearch.doc(hits[i].doc);
        // Specify the Field type of the retrieved document that you want to print.
        // In our case we only have 1 Field i.e. "content".
        System.out.println(hitDoc.get("content"));
    }
    iSearch.close(); iRead.close(); directory.close();
}   
}

                                                                        
                                                        
            
            
              
                
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
别跟我提以往
2021-01-15 11:13
              
            
            
                                                                       
The field needs to be analyzed, and term vectors need to be enabled:

doc.add(new Field("content", value, Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.YES));


You can disable storing if you do not plan to retrieve that field from the index.
                                                                        
                                                        
            
            
              
                