ElasticSearch - define custom letter order for sorting

守給你的承諾、 submitted on 2019-12-22 13:49:07

Question


I'm using ElasticSearch 2.4.2 (via HibernateSearch 5.7.1.Final from Java).

I have a problem with string sorting. The language of my application has diacritics, which have a specific alphabetic ordering. For example Ł goes directly after L, Ó goes after O, etc. So you are supposed to sort the strings like this:

 Dla
 Dła
 Doa
 Dóa
 Dza
 Eza
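
To make the expected order concrete, here is a minimal plain-Java sketch that sorts by position in a custom alphabet. It has nothing to do with Elasticsearch itself. The class name PolishOrderDemo, the method sortPolish, and the exact alphabet string (the Polish letters a ą b c ć d e ę ... z ź ż, plus q, v, x) are assumptions made only for this illustration. Unicode escapes are used so the source stays ASCII-only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper for illustration; not part of Elasticsearch or Hibernate Search.
final class PolishOrderDemo {

    // Assumed alphabet: a letter's position in this string is its sort rank.
    // Escapes: 0105 a-ogonek, 0107 c-acute, 0119 e-ogonek, 0142 l-stroke, 0144 n-acute,
    // 00f3 o-acute, 015b s-acute, 017a z-acute, 017c z-dot.
    private static final String ALPHABET =
            "a\u0105bc\u0107de\u0119fghijkl\u0142mn\u0144o\u00f3pqrs\u015btuvwxyz\u017a\u017c";

    // Rank of a character, ignoring case; anything outside the alphabet sorts after all letters.
    private static int rank(char c) {
        int index = ALPHABET.indexOf(Character.toLowerCase(c));
        return index >= 0 ? index : ALPHABET.length() + c;
    }

    // Compares letter by letter using the custom ranks; a shorter prefix sorts first.
    static final Comparator<String> POLISH_ORDER = (a, b) -> {
        int common = Math.min(a.length(), b.length());
        for (int i = 0; i < common; i++) {
            int diff = Integer.compare(rank(a.charAt(i)), rank(b.charAt(i)));
            if (diff != 0) {
                return diff;
            }
        }
        return Integer.compare(a.length(), b.length());
    };

    // Returns a sorted copy and leaves the input list untouched.
    static List<String> sortPolish(List<String> words) {
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(POLISH_ORDER);
        return sorted;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Eza", "D\u00f3a", "Dza", "D\u0142a", "Doa", "Dla");
        System.out.println(sortPolish(words));
    }
}
```

For the six words above, sortPolish returns them in the order Dla, Dła, Doa, Dóa, Dza, Eza, because Ł ranks directly after L and Ó directly after O.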

ElasticSearch sorts the basic Latin letters first and moves all letters with diacritics to the end:

 Dla
 Doa
 Dza
 Dła
 Dóa
 Eza

Can I add a custom letter ordering for ElasticSearch? Maybe there are some plugins for this? Do I need to write my own plugin? How do I start?

I found a Polish language plugin for ElasticSearch, but as I understand it, it is meant for analysis. Analysis is not a solution in my case, because it would ignore diacritics and leave words with L and Ł mixed together:

 Dla
 Dłb
 Dlc

This would sometimes be acceptable, but it is not acceptable in my specific use case.

I will be grateful for any remarks on this.


Answer 1:


I've never used it, but there is a plugin that could fit your needs: the ICU collation plugin.

You will have to use the icu_collation token filter, which turns the tokens into collation keys. Because the indexed value is then a collation key rather than the original text, you will need to use a separate @Field (e.g. myField_sort) in Hibernate Search.

You can assign a specific analyzer to your field with @Field(name = "myField_sort", analyzer = @Analyzer(definition = "myCollationAnalyzer")), and define this analyzer (type, parameters) with something like this on one of your entities:

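import javax.persistence.Entity;
import org.hibernate.search.annotations.AnalyzerDef;
import org.hibernate.search.annotations.Indexed;
import org.hibernate.search.annotations.Parameter;
import org.hibernate.search.annotations.TokenFilterDef;
import org.hibernate.search.annotations.TokenizerDef;
import org.hibernate.search.elasticsearch.analyzer.ElasticsearchTokenFilterFactory;
import org.hibernate.search.elasticsearch.analyzer.ElasticsearchTokenizerFactory;

@Entity
@Indexed
@AnalyzerDef(
    name = "myCollationAnalyzer",
    // Assumption, not in the original answer: a keyword tokenizer (hypothetical name) keeps the whole value as one token.
    tokenizer = @TokenizerDef(
        name = "sort_keyword", factory = ElasticsearchTokenizerFactory.class,
        params = @Parameter(name = "type", value = "'keyword'")
    ),
    filters = {
        @TokenFilterDef(
            name = "polish_collation",
            factory = ElasticsearchTokenFilterFactory.class,
            params = {
                @Parameter(name = "type", value = "'icu_collation'"),
                @Parameter(name = "language", value = "'pl'")
            }
        )
    }
)
public class MyEntity {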

See the documentation for more information: https://docs.jboss.org/hibernate/stable/search/reference/en-US/html_single/#_custom_analyzers

It's admittedly a bit clumsy right now, but analyzer configuration will get a bit cleaner in the next Hibernate Search version with normalizers and analyzer definition providers.

Note: as usual, your field will need to be declared as sortable (@SortableField(forField = "myField_sort")).
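
Putting the pieces together, the field mapping and a sorted query might look roughly like the sketch below. It is untested and assumes Hibernate Search 5.7 with JPA. The id field, the plain myField field, and the SortedSearch helper are made up for illustration. The analyzer definition named "myCollationAnalyzer" is assumed to be declared as described above.

```java
import java.util.List;
import javax.persistence.Entity;
import javax.persistence.EntityManager;
import javax.persistence.Id;
import org.apache.lucene.search.Sort;
import org.hibernate.search.annotations.Analyzer;
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Fields;
import org.hibernate.search.annotations.Indexed;
import org.hibernate.search.annotations.SortableField;
import org.hibernate.search.jpa.FullTextEntityManager;
import org.hibernate.search.jpa.FullTextQuery;
import org.hibernate.search.jpa.Search;
import org.hibernate.search.query.dsl.QueryBuilder;

@Entity
@Indexed
// Assumes the @AnalyzerDef named "myCollationAnalyzer" is declared on this or another entity.
public class MyEntity {

    @Id
    private Long id; // hypothetical identifier

    @Fields({
        // Regular field, used for searching.
        @Field(name = "myField"),
        // Separate field holding collation keys, used only for sorting.
        @Field(name = "myField_sort", analyzer = @Analyzer(definition = "myCollationAnalyzer"))
    })
    @SortableField(forField = "myField_sort")
    private String myField;
}

// Hypothetical helper showing how the sort field could be used in a query.
class SortedSearch {

    @SuppressWarnings("unchecked")
    static List<MyEntity> allSortedByMyField(EntityManager entityManager) {
        FullTextEntityManager ftem = Search.getFullTextEntityManager(entityManager);
        QueryBuilder qb = ftem.getSearchFactory()
                .buildQueryBuilder().forEntity(MyEntity.class).get();
        // Sort on the collation-key field, not on the field used for searching.
        Sort sort = qb.sort().byField("myField_sort").createSort();
        FullTextQuery query = ftem.createFullTextQuery(qb.all().createQuery(), MyEntity.class);
        query.setSort(sort);
        return query.getResultList();
    }
}
```

Keeping the searchable field and the sort field separate means full-text queries still see the normal tokens, while sorting relies only on the collation keys.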



Source: https://stackoverflow.com/questions/45598424/elasticsearch-define-custom-letter-order-for-sorting
