In Lucene, why do my boosted and unboosted documents get the same score?

Submitted by こ雲淡風輕ζ on 2019-12-18 08:10:42

Question


At index time, I boost certain documents like this:

if (myCondition)
{
    document.SetBoost(1.2f);
}

But at search time, documents that are otherwise identical all end up with the same score, whether or not they passed myCondition (and were therefore boosted).

And here is the search code:

BooleanQuery booleanQuery = new BooleanQuery();
booleanQuery.Add(new TermQuery(new Term(FieldNames.HAS_PHOTO, "y")), BooleanClause.Occur.MUST);
booleanQuery.Add(new TermQuery(new Term(FieldNames.AUTHOR_TYPE, AuthorTypes.BLOGGER)), BooleanClause.Occur.MUST_NOT);
TopDocs topDocs = indexSearcher.Search(booleanQuery, 10); // top 10 hits and their scores

What do I need to do so that the boosted documents get a higher score?

Many thanks!


Answer 1:


Lucene does not store index-time boosts as full floats: it folds them into each field's norm, which it encodes in a single byte (a float normally takes four bytes) using the SmallFloat#floatToByte315 method. As a consequence, converting that byte back to a float can lose a lot of precision.
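To see how coarse a one-byte float is, this sketch decodes every possible byte value and keeps those between 0.5 and 2.0. It is written in Java, the language Lucene itself is written in. The decoder is a standalone re-implementation that is assumed to mirror SmallFloat.byte315ToFloat, and the class name RepresentableBoosts is made up for this example, so compare the results against the real SmallFloat class of your Lucene version.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: enumerate the values a one-byte "315" float can decode to.
// Assumption: byte315ToFloat below mirrors Lucene's SmallFloat.byte315ToFloat.
class RepresentableBoosts {
    static float byte315ToFloat(byte b) {
        if (b == 0) {
            return 0.0f;                      // code 0 is reserved for zero
        }
        int bits = (b & 0xff) << (24 - 3);    // move the stored bits back to exponent/mantissa position
        bits += (63 - 15) << 24;              // restore the exponent offset removed when encoding
        return Float.intBitsToFloat(bits);
    }

    // All decodable values in [lo, hi], in increasing order (decoding is monotonic in the code).
    static List<Float> valuesBetween(float lo, float hi) {
        List<Float> result = new ArrayList<>();
        for (int code = 0; code < 256; code++) {
            float v = byte315ToFloat((byte) code);
            if (v >= lo && v <= hi) {
                result.add(v);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(valuesBetween(0.5f, 2.0f));
    }
}
```

Running it prints [0.5, 0.625, 0.75, 0.875, 1.0, 1.25, 1.5, 1.75, 2.0]: between 1 and 2 there are only four steps of 0.25, so 1.2 has no code of its own.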

In your case, SmallFloat.byte315ToFloat(SmallFloat.floatToByte315(1.2f)) returns 1f. The encoding keeps only three significant bits and truncates the rest, so between 1 and 2 the only representable values are 1, 1.25, 1.5 and 1.75, and 1.2 is rounded down to 1. Use a larger boost so that your documents get different scores. For example, with 1.25, SmallFloat.byte315ToFloat(SmallFloat.floatToByte315(1.25f)) returns 1.25f.
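A minimal sketch of that round trip, again in Java, with standalone copies of the encoder and decoder that are assumed to match Lucene's SmallFloat.floatToByte315 and SmallFloat.byte315ToFloat. The class BoostRoundTrip and the helper smallestSurvivingBoostAtLeast are hypothetical, not Lucene APIs; the helper returns the smallest decodable value at or above a requested boost. It looks at the boost in isolation: in Lucene 2.x/3.x the value actually encoded is the field norm (document boost times field boost times length normalization), so it is that product, not the raw boost, that gets truncated.

```java
// Sketch: round-trip boosts through a one-byte float and pick one that survives.
// Assumption: floatToByte315/byte315ToFloat mirror Lucene's SmallFloat methods of the same names.
class BoostRoundTrip {
    // Offset subtracted from the shifted float bits to get the byte; also the underflow threshold.
    static final int FZERO = (63 - 15) << 3;

    static byte floatToByte315(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - 3);      // drops the low mantissa bits, i.e. truncation
        if (smallfloat <= FZERO) {
            return (bits <= 0) ? (byte) 0 : (byte) 1; // zero/negative -> 0, tiny positive -> 1
        }
        if (smallfloat >= FZERO + 0x100) {
            return (byte) -1;                   // too large: clamp to code 0xFF
        }
        return (byte) (smallfloat - FZERO);
    }

    static float byte315ToFloat(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);      // move the stored bits back into place
        bits += (63 - 15) << 24;                // restore the exponent offset
        return Float.intBitsToFloat(bits);
    }

    static float roundTrip(float f) {
        return byte315ToFloat(floatToByte315(f));
    }

    // Hypothetical helper (not part of Lucene): smallest decodable value >= requested.
    static float smallestSurvivingBoostAtLeast(float requested) {
        for (int code = 1; code < 256; code++) {
            float v = byte315ToFloat((byte) code);
            if (v >= requested) {
                return v;
            }
        }
        return byte315ToFloat((byte) 0xFF);     // requested exceeds the largest code
    }

    public static void main(String[] args) {
        for (float boost : new float[] {1.0f, 1.2f, 1.25f, 1.5f}) {
            System.out.println(boost + " -> " + roundTrip(boost));
        }
        System.out.println("smallest boost >= 1.2 that survives: " + smallestSurvivingBoostAtLeast(1.2f));
    }
}
```

With these definitions, roundTrip(1.2f) gives 1.0f, roundTrip(1.25f) gives 1.25f, and smallestSurvivingBoostAtLeast(1.2f) gives 1.25f, which matches the advice to use 1.25 instead of 1.2.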




Answer 2:


Here is the requested test program that was too long to post in a comment.

using System;
using System.Text;
using Lucene.Net.Analysis;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;

class Program
{
    static void Main(string[] args)
    {
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new WhitespaceAnalyzer());

        const string FIELD = "name";

        for (int i = 0; i < 10; i++)
        {
            StringBuilder notes = new StringBuilder();
            notes.AppendLine("This is a note 123 - " + i);

            string text = notes.ToString();

            Document doc = new Document();
            // NOT_ANALYZED: the whole text is indexed as a single term
            var field = new Field(FIELD, text, Field.Store.YES, Field.Index.NOT_ANALYZED);

            // Even documents get a high boost, odd ones a low boost
            // (both the field boost and the document boost are set)
            if (i % 2 == 0)
            {
                field.SetBoost(1.5f);
                doc.SetBoost(1.5f);
            }
            else
            {
                field.SetBoost(0.1f);
                doc.SetBoost(0.1f);
            }

            doc.Add(field);
            writer.AddDocument(doc);
        }

        writer.Commit();

        //string TERM = QueryParser.Escape("*+*");
        string TERM = "T";

        IndexSearcher searcher = new IndexSearcher(dir);
        // Matches every indexed term starting with "T", i.e. all ten documents
        Query query = new PrefixQuery(new Term(FIELD, TERM));
        var hits = searcher.Search(query);
        int count = hits.Length();

        Console.WriteLine("Hits - {0}", count);

        for (int i = 0; i < count; i++)
        {
            var doc = hits.Doc(i);
            Console.WriteLine(doc.ToString());

            // Explain expects a document id, not the rank of the hit
            var explain = searcher.Explain(query, hits.Id(i));
            Console.WriteLine(explain.ToString());
        }
    }
}


Source: https://stackoverflow.com/questions/7899031/in-lucene-why-do-my-boosted-and-unboosted-documents-get-the-same-score
