Lucene: get matched terms in query

Submitted by 丶灬走出姿态 on 2019-11-27 19:32:34

Question


What is the best way to find out which terms in a query matched a given document that Lucene returned as a hit?

I have tried two approaches: an awkward one built on the hit highlighting package in Lucene contrib, and one that runs a separate search for every word in the query, restricted to the top document ("docId: xy AND description: each_word_in_query").

Neither gives satisfactory results. For documents other than the first hit, hit highlighting fails to report some of the words that matched. I'm not sure whether the second approach is the best alternative.


Answer 1:


The explain method of the Searcher is a good way to see which parts of a query matched a document and how each part contributes to the overall score.

Example taken from the book Lucene in Action, 2nd Edition:

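import java.io.File;

import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Explanation;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

// Written against the Lucene 3.x API used in the book.
public class Explainer {

  public static void main(String[] args) throws Exception {

     if (args.length != 2) {
        System.err.println("Usage: Explainer <index dir> <query>");
        System.exit(1);
     }

     String indexDir = args[0];
     String queryExpression = args[1];
     Directory directory = FSDirectory.open(new File(indexDir));
     QueryParser parser = new QueryParser(Version.LUCENE_CURRENT,
                                     "contents", new SimpleAnalyzer());

     Query query = parser.parse(queryExpression);
     System.out.println("Query: " + queryExpression);
     IndexSearcher searcher = new IndexSearcher(directory);
     TopDocs topDocs = searcher.search(query, 10);
     // Iterate over the returned hits only: totalHits counts every match in
     // the index and can exceed the 10 entries held in scoreDocs.
     for (ScoreDoc match : topDocs.scoreDocs) {
        // Ask the searcher how this document's score was computed.
        Explanation explanation = searcher.explain(query, match.doc);
        System.out.println("----------");
        Document doc = searcher.doc(match.doc);
        System.out.println(doc.get("title"));
        System.out.println(explanation.toString());
     }
     searcher.close();
     directory.close();
  }
}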

This prints a score explanation for each of the top hits returned for the query.
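To get from the printed explanation to a plain list of matched terms, one rough option is to scan the explanation text for term clauses. The sketch below uses only the Java standard library and rests on an assumption: that term clauses appear in the text as weight(field:term in docId) or fieldWeight(field:term in docId). The exact wording varies between Lucene versions, so check it against your own output. The class name ExplanationTerms, the method matchedTerms, and the sample string are hypothetical and hand-written for illustration; the sample is not real Lucene output.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExplanationTerms {

    // ASSUMPTION: a matched term shows up in the explanation text as
    // "weight(field:term in docId)" or "fieldWeight(field:term in docId)".
    // Terms containing whitespace (for example phrases) are not handled.
    private static final Pattern TERM_CLAUSE =
        Pattern.compile("[wW]eight\\(([^:()\\s]+):(\\S+) in \\d+\\)");

    // Returns the distinct "field:term" pairs found, in order of first appearance.
    public static List<String> matchedTerms(String explanationText) {
        Set<String> terms = new LinkedHashSet<>();
        Matcher m = TERM_CLAUSE.matcher(explanationText);
        while (m.find()) {
            terms.add(m.group(1) + ":" + m.group(2));
        }
        return new ArrayList<>(terms);
    }

    public static void main(String[] args) {
        // Hand-written sample in the general shape of explain() output;
        // the numbers are placeholders, not real scores.
        String sample =
            "0.5 = (MATCH) sum of:\n"
          + "  0.3 = (MATCH) fieldWeight(contents:junit in 0), product of:\n"
          + "    1.0 = tf(termFreq(contents:junit)=1)\n"
          + "  0.2 = (MATCH) fieldWeight(contents:testing in 0), product of:\n"
          + "    1.0 = tf(termFreq(contents:testing)=1)\n";
        // prints [contents:junit, contents:testing]
        System.out.println(matchedTerms(sample));
    }
}
```

In the Explainer above, this would be called as matchedTerms(explanation.toString()) inside the loop. Parsing text is fragile, so treat it as a quick diagnostic rather than a stable interface.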




Answer 2:


I have not tried this yet, but have a look at the implementation of org.apache.lucene.search.highlight.QueryTermExtractor.



Source: https://stackoverflow.com/questions/2851473/lucene-get-matched-terms-in-query
