In Umbraco, I use Examine to search in the website but the content is in french. Everything works fine except when I search for "Français" it's not the same result as "Francais". Is there a way to ignore those french characters? I try to find a FrenchAnalyser for Leucene/Examine but did not found anything. I use Fuzzy so it return results even if the words is not the same.
Here's the code of my search :
public static ISearchResults Search(string searchTerm)
{
var provider = ExamineManager.Instance.SearchProviderCollection["ExternalSearcher"];
var criteria = provider.CreateSearchCriteria(BooleanOperation.Or);
var crawl = criteria.GroupedOr(BoostedSearchableFields, searchTerm.Boost(15))
.Or().GroupedOr(BoostedSearchableFields, searchTerm.Fuzzy(Fuzziness))
.Or().GroupedOr(SearchableFields, searchTerm.Fuzzy(Fuzziness))
.Not().Field("umbracoNavHide", "1");
return provider.Search(crawl.Compile());
}
We ended up using a custom analyer based on the SnowballAnalyzer
public class CustomAnalyzer : SnowballAnalyzer
{
public CustomAnalyzer() : base("French") { }
public override TokenStream TokenStream(string fieldName, TextReader reader)
{
TokenStream result = base.TokenStream(fieldName, reader);
result = new ISOLatin1AccentFilter(result);
return result;
}
}
Try using Regex like this below:
var strInput ="Français";
var strToReplace = string.Empty;
var sNewString = Regex.Replace(strInput, "[^A-Za-z0-9]", strToReplace);
I've used this pattern "[^A-Za-z0-9]" to replace all non-alphanumeric string with a blank.
Hope it helps.
来源:https://stackoverflow.com/questions/21312632/ignore-special-characters-in-examine