IRavenQueryable<T> Search can't find &-sign (or other special characters)

Submitted by 孤人 on 2019-12-13 04:47:46

Question


In our own RavenQueryableExtensions class we have the following method:

public static IRavenQueryable<T> SearchMultiple<T>(this IRavenQueryable<T> self,
    Expression<Func<T, object>> fieldSelector, string queries,
    decimal boost = 1, SearchOptions options = SearchOptions.Or)
{
    if (String.IsNullOrEmpty(queries)) throw new ArgumentNullException("queries");

    // Runs of two or more whitespace characters are replaced with a single space
    var newQueries = Regex.Replace(queries, @"\s{2,}", " ");
    // not important for this question:
    //newQueries = SyncShared.ReplacePostcode(newQueries);
    // Splits the search-string into separate search-terms
    var searchValues = newQueries.Split(' ');

    return self.SearchMultiple(fieldSelector, searchValues, boost, options);
}

public static IRavenQueryable<T> SearchMultiple<T>(this IRavenQueryable<T> self,
    Expression<Func<T, object>> fieldSelector, IEnumerable<string> queries,
    decimal boost = 1, SearchOptions options = SearchOptions.Or)
{
    if (queries == null) throw new ArgumentNullException("queries");

    return queries.Aggregate(self, (current, query) => current.Search(fieldSelector, query + "* ", boost, options, EscapeQueryOptions.AllowPostfixWildcard));
}

This builds a search query from all the separate search terms in the searchValues array. However, it does not seem to recognize special characters such as "&" or ".". For example:

  • We have a list with Companies. One of the companies has the name "A & A something more".
  • When I enter the search "A & A" or just "&" it doesn't find this company.
  • When I debug, it generates the following query: {Query:(A*) AND Query:(\&*) AND Query:(A*)}
  • The same goes when I enter any other special character like "." or "`".
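
To make the example above concrete, here is a minimal sketch of what the two SearchMultiple overloads do to the input string before each term is handed to Search. It uses only the .NET base class library; the class SearchTermSketch and the method ToWildcardTerms are hypothetical names introduced for illustration and are not part of RavenDB. Escaping (such as the `\&` seen in the generated query) happens later inside Search and is not modelled here.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not part of RavenDB.
public static class SearchTermSketch
{
    // Mirrors the string overload of SearchMultiple: collapse whitespace runs,
    // split on single spaces, then append the "* " suffix to each term,
    // as the IEnumerable<string> overload does before calling Search.
    public static string[] ToWildcardTerms(string queries)
    {
        if (string.IsNullOrEmpty(queries)) throw new ArgumentNullException("queries");

        var collapsed = Regex.Replace(queries, @"\s{2,}", " ");
        return collapsed.Split(' ')
                        .Select(term => term + "* ")
                        .ToArray();
    }
}
```

Calling SearchTermSketch.ToWildcardTerms("A & A") yields three terms, and the second one consists of the "&" character plus the wildcard suffix. The special character therefore does reach the query, which suggests that the mismatch lies in how the index was analyzed rather than in the splitting code.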

Does anyone know how to alter the Search-method so it correctly formats these special characters?

Also, I don't know whether it's relevant to our issue, but we also use an AsciiFoldingAnalyser class (see below). This class lets us find companies whose names contain characters like "é" or "ü" when we just enter "e" or "u".

using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Util;

namespace NatWa.MidOffice.RavenDb.ServerGoodies
{
    public class AsciiFoldingAnalyser : StandardAnalyzer
    {
        public AsciiFoldingAnalyser(Version matchVersion)
            : base(matchVersion)
        {
        }

        public AsciiFoldingAnalyser(Version matchVersion, ISet<string> stopWords)
            : base(matchVersion, stopWords)
        {
        }

        public AsciiFoldingAnalyser(Version matchVersion, FileInfo stopwords)
            : base(matchVersion, stopwords)
        {
        }

        public AsciiFoldingAnalyser(Version matchVersion, TextReader stopwords)
            : base(matchVersion, stopwords)
        {
        }

        public override TokenStream TokenStream(string fieldName, TextReader reader)
        {
            return new LowerCaseFilter(new ASCIIFoldingFilter(base.TokenStream(fieldName, reader)));
        }
    }
}

And we use it in our Mappings like so:

public class UserLijst : AbstractIndexCreationTask<UserState, UserLijstResult>
{
    public UserLijst()
    {
        Map = states => from state in states
                        select new UserLijstResult
                        {
                            Id = (UserId)state.AggregateId,
                            Naam = state.Naam,
                            Query = new object[]
                            {
                                state.Naam
                            }
                        };

        Reduce = results => from result in results
                            group result by new { result.Id } into g
                            select new UserLijstResult
                            {
                                Id = g.Key.Id,
                                Naam = g.First().Naam,
                                Query = g.First().Query
                            };

        Index("Query", FieldIndexing.Analyzed);
        Analyze(result => result.Query, typeof(AsciiFoldingAnalyser).AssemblyQualifiedName);
    }
}

Answer 1:


OK, it turned out to be pretty easy. Our analyser was using the tokenizer of its base class (via base.TokenStream), which filters out all special characters and terms with a length of 1. When we replaced

public override TokenStream TokenStream(string fieldName, TextReader reader)
{
    return new LowerCaseFilter(new ASCIIFoldingFilter(base.TokenStream(fieldName, reader)));
}

in our AsciiFoldingAnalyser with:

public override TokenStream TokenStream(string fieldName, TextReader reader)
{
    return new LowerCaseFilter(new ASCIIFoldingFilter(new WhitespaceTokenizer(reader)));
}

it worked: we can now search for special characters.
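
The effect of the change can be illustrated without Lucene. The sketch below is a simulation under stated assumptions, not the Lucene.Net implementation: SplitOnWhitespace stands in for a whitespace-only tokenizer followed by lowercasing, and SplitOnNonAlphanumeric is a deliberately simplified stand-in for a tokenizer that keeps only letters and digits. The real StandardTokenizer has more elaborate rules, and StandardAnalyzer also applies a stop-word filter, neither of which is modelled here. All names in the block are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical simulation for illustration only; this is not Lucene.Net code.
public static class TokenizerSketch
{
    // Splits on whitespace only and lowercases, so punctuation such as "&"
    // survives as a token of its own.
    public static List<string> SplitOnWhitespace(string text)
    {
        return text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
                   .Select(token => token.ToLowerInvariant())
                   .ToList();
    }

    // Keeps only runs of letters and digits (lowercased), so a standalone "&"
    // never becomes a token and cannot be matched by a query.
    public static List<string> SplitOnNonAlphanumeric(string text)
    {
        var tokens = new List<string>();
        var current = new StringBuilder();
        foreach (var c in text)
        {
            if (char.IsLetterOrDigit(c))
            {
                current.Append(char.ToLowerInvariant(c));
            }
            else if (current.Length > 0)
            {
                tokens.Add(current.ToString());
                current.Clear();
            }
        }
        if (current.Length > 0) tokens.Add(current.ToString());
        return tokens;
    }
}
```

For the company name "A & A something more", SplitOnWhitespace produces five tokens including "&", while SplitOnNonAlphanumeric produces four tokens and drops the "&". A query term for "&" can only match an index built from the first kind of token stream.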

We now get a lot of results for a search like "A & A", since it matches every occurrence of "a" and "&" in all indexed fields. We may need to change a few more things to narrow this down, but at least this solves the problem I asked about.



Source: https://stackoverflow.com/questions/28338943/iravenqueryablet-search-cant-find-sign-or-other-special-characters
