Using a Combination of Wildcards and Stemming

闹比i 2020-12-30 09:38

I'm using a snowball analyzer to stem the titles of multiple documents. Everything works well, but there are some quirks.

Example:

A search for "valv", \

4 answers
  •  说谎 (original poster)
    2020-12-30 10:26

    I don't think there is an easy (and correct) way to do this.

    My solution would be to write a custom query parser that finds the longest prefix of your search term that is also a prefix of some term in the index.

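    using Lucene.Net.Analysis;
    using Lucene.Net.Index;
    using Lucene.Net.Search;

    class MyQueryParser : Lucene.Net.QueryParsers.QueryParser
    {
        IndexReader _reader;
        Analyzer _analyzer;

        public MyQueryParser(string field, Analyzer analyzer, IndexReader indexReader) : base(field, analyzer)
        {
            _analyzer = analyzer;
            _reader = indexReader;
        }

        // Called for queries such as "valves*"; termStr is the text before the '*'.
        public override Query GetPrefixQuery(string field, string termStr)
        {
            // Drop one trailing character at a time, stopping at 3 characters.
            for (string longestStr = termStr; longestStr.Length > 2; longestStr = longestStr.Substring(0, longestStr.Length - 1))
            {
                // Terms(t) positions the enumerator at the first indexed term >= t.
                TermEnum te = _reader.Terms(new Term(field, longestStr));
                Term term = te.Term();
                te.Close();
                // Use this prefix if that term is in the same field and starts with it.
                if (term != null && term.Field() == field && term.Text().StartsWith(longestStr))
                {
                    return base.GetPrefixQuery(field, longestStr);
                }
            }

            // No indexed term shares a prefix of 3+ characters; fall back to the raw text.
            return base.GetPrefixQuery(field, termStr);
        }
    }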
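
    To see the back-off logic in isolation, here is a minimal sketch that does not depend on Lucene. It assumes a sorted string array stands in for the index's term dictionary; the class name `PrefixBackoff`, the method `LongestIndexedPrefix`, and the sample terms are hypothetical and only illustrate the loop in the parser above.

```csharp
using System;

static class PrefixBackoff
{
    // Returns the longest prefix of `typed` (at least `minLength` characters) that
    // some term in `sortedTerms` starts with, or `typed` unchanged if none does.
    // Assumption: `sortedTerms` is sorted in ordinal order, like a term dictionary.
    public static string LongestIndexedPrefix(string[] sortedTerms, string typed, int minLength = 3)
    {
        for (string candidate = typed; candidate.Length >= minLength;
             candidate = candidate.Substring(0, candidate.Length - 1))
        {
            // BinarySearch returns the index of an exact match, or the bitwise
            // complement of the index of the first element greater than the value.
            int pos = Array.BinarySearch(sortedTerms, candidate, StringComparer.Ordinal);
            if (pos < 0) pos = ~pos;

            // The first term >= candidate is the only one that needs checking.
            if (pos < sortedTerms.Length &&
                sortedTerms[pos].StartsWith(candidate, StringComparison.Ordinal))
            {
                return candidate;
            }
        }
        return typed;
    }
}
```

    With the hypothetical stemmed terms `{ "pipe", "pump", "valv" }`, calling `PrefixBackoff.LongestIndexedPrefix(terms, "valves")` tries "valves", then "valve", then "valv", and returns "valv" because that is the first candidate some indexed term starts with.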
    

    You can also try calling your analyzer inside GetPrefixQuery, since the analyzer is not otherwise applied to prefix queries:

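    // Run the raw prefix text through the analyzer (e.g. the snowball stemmer).
    TokenStream ts = _analyzer.TokenStream(field, new StringReader(termStr));
    Lucene.Net.Analysis.Token token = ts.Next();
    // Fall back to the raw text if the analyzer produced no token (e.g. a stop word).
    var termstring = token != null ? token.TermText() : termStr;
    ts.Close();
    return base.GetPrefixQuery(field, termstring);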
    

    Be aware, though, that you can always find a case where the returned results are incorrect. This is why Lucene doesn't apply analyzers to wildcard queries.
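
    One such case, sketched under assumptions: suppose the index holds the hypothetical stemmed terms "valu" and "valv", and the user types "valet*". The prefix-shortening parser above would back off from "valet" to "vale" to "val", because "val" is the first candidate some indexed term starts with, and the resulting prefix query matches both terms even though neither relates to the search. The helper `TermsMatching` below is made up for illustration; it only lists which terms a given prefix would hit.

```csharp
using System;
using System.Linq;

static class OverMatchDemo
{
    // Lists every term a prefix query for `prefix` would match.
    public static string[] TermsMatching(string[] terms, string prefix) =>
        terms.Where(t => t.StartsWith(prefix, StringComparison.Ordinal)).ToArray();
}
```

    Here `OverMatchDemo.TermsMatching(new[] { "valu", "valv" }, "val")` returns both terms, while the prefix the user actually typed, "valet", matches none of them.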
