How to do case insensitive sorting of Norwegian characters (Æ, Ø, and Å) using Hibernate Lucene Search?

说谎 2021-01-18 09:01

Æ, Ø, and Å are the last three letters of the Norwegian alphabet:

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z Æ Ø Å

When we try to sort it usin

2 answers
  • 2021-01-18 09:46

    You can use the org.apache.lucene.collation.CollationKeyFilter class with Hibernate Search 4.3.0.Final. Create your own collation filter factory:

    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.collation.CollationKeyFilter;
    import org.apache.solr.analysis.BaseTokenFilterFactory;
    
    import java.text.Collator;
    import java.util.Locale;
    
    public final class NorwegianCollationFactory extends BaseTokenFilterFactory {
    
        @Override
        public TokenStream create(TokenStream input) {
            Collator norwegianCollator = Collator.getInstance(new Locale("no", "NO"));
            return new CollationKeyFilter(input, norwegianCollator);
        }
    
    }
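
    To illustrate what the collation key filter relies on, here is a minimal plain-Java sketch with no Lucene or Hibernate Search involved. It builds java.text.CollationKey objects with the same Norwegian Collator and compares only their raw bytes, which is roughly the idea behind indexing collation keys instead of the original text. The class name CollationKeyDemo and its helper methods are made up for this illustration, and the letters are written as Unicode escapes only to keep the source ASCII-safe.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical demo class, not part of Lucene or Hibernate Search.
final class CollationKeyDemo {

    // Same collator configuration as in the factory above.
    private static final Collator NORWEGIAN = Collator.getInstance(new Locale("no", "NO"));

    // Returns the binary collation key for the given text.
    static byte[] sortKey(String text) {
        return NORWEGIAN.getCollationKey(text).toByteArray();
    }

    // Compares two strings only through the unsigned bytes of their collation keys.
    static int compareByKey(String left, String right) {
        return Arrays.compareUnsigned(sortKey(left), sortKey(right));
    }

    public static void main(String[] args) {
        String ae = "\u00E6rlig";    // starts with the ae ligature
        String oStroke = "\u00F8l";  // starts with o-with-stroke
        String aRing = "\u00E5l";    // starts with a-with-ring

        // Each line prints whether the left value sorts before the right one by key.
        System.out.println(compareByKey("zebra", ae) < 0);
        System.out.println(compareByKey(ae, oStroke) < 0);
        System.out.println(compareByKey(oStroke, aRing) < 0);
    }
}
```

    Comparing the same pairs with String.compareTo would instead follow code point order, where å (U+00E5) comes before æ (U+00E6) and ø (U+00F8).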
    

    Then use this collation factory in your AnalyzerDef:

    @AnalyzerDef(name = "myOwnAnalyzer",
    tokenizer = @TokenizerDef(factory = KeywordTokenizerFactory.class),
    filters = {
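        // Caution: ASCII folding rewrites Æ/Ø/Å to AE/O/A before the collation filter runs;
        // consider dropping this filter if those letters must sort after Z.
        @TokenFilterDef(factory = ASCIIFoldingFilterFactory.class),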
        @TokenFilterDef(factory = LowerCaseFilterFactory.class),
        @TokenFilterDef(factory = PatternReplaceFilterFactory.class, params = {
            @Parameter(name = "pattern", value = "('-&\\.,\\(\\))"),
            @Parameter(name = "replacement", value = " "),
            @Parameter(name = "replace", value = "all")
        }),
        @TokenFilterDef(factory = PatternReplaceFilterFactory.class, params = {
            @Parameter(name = "pattern", value = "([^0-9\\p{L} ])"),
            @Parameter(name = "replacement", value = ""),
            @Parameter(name = "replace", value = "all")
        }),
        @TokenFilterDef(factory = TrimFilterFactory.class),
        @TokenFilterDef(factory = NorwegianCollationFactory.class)
    }
    )
    public class KikaPaya implements Serializable {
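
    The case-insensitive part comes from lowercasing values before they are collated. As a rough plain-Java analogy (not Hibernate Search code), the sketch below lowercases each value for comparison only and then orders the values with the Norwegian Collator. The class name NorwegianSortDemo, the method sortIgnoringCase, and the sample words are assumptions made for this illustration.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Hypothetical demo class; it only mimics the lowercase-then-collate idea in plain Java.
final class NorwegianSortDemo {

    private static final Locale NORWEGIAN = new Locale("no", "NO");

    // Returns a sorted copy: values are lowercased for comparison only,
    // then ordered by the JDK's Norwegian collation rules.
    static List<String> sortIgnoringCase(List<String> values) {
        Collator collator = Collator.getInstance(NORWEGIAN);
        Comparator<String> byLowercase =
                Comparator.comparing((String s) -> s.toLowerCase(NORWEGIAN), collator);
        List<String> sorted = new ArrayList<>(values);
        sorted.sort(byLowercase);
        return sorted;
    }

    public static void main(String[] args) {
        // Unicode escapes: \u00C5 = A-with-ring, \u00F8 = o-with-stroke, \u00C6 = AE ligature.
        List<String> names = List.of("\u00C5l", "zebra", "\u00F8l", "Apple", "\u00C6rlig", "banan");
        System.out.println(sortIgnoringCase(names));
    }
}
```

    With the JDK's Norwegian rules, the sample list should come out as Apple, banan, zebra, Ærlig, øl, Ål, regardless of the case of the first letter.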
    

    For more information about using this collation filter with Hibernate Search 5, see https://stackoverflow.com/a/60738067/7179509

  • 2021-01-18 09:51

    I must admit this is not a common requirement. As far as I can see, there is a Lucene module that uses ICU for locale-dependent sorting.

    See the lucene-icu artifact, especially ICUCollationKeyFilter and ICUCollationKeyAnalyzer (the analyzer is a KeywordTokenizer combined with the filter). You will need to write the factory required to use it with Hibernate Search, but that should be quite easy.

    Can't really promise it will work but it's probably your best bet.
