Question
I'm using ElasticSearch 2.4.2 (via HibernateSearch 5.7.1.Final from Java).
I have a problem with string sorting.
The language of my application has diacritics, which have a specific alphabetic
ordering. For example, Ł goes directly after L, Ó goes after O, etc.
So you are supposed to sort the strings like this:
Dla
Dła
Doa
Dóa
Dza
Eza
ElasticSearch sorts by typical letters first, and moves all the strange letters to the end:
Dla
Doa
Dza
Dła
Dóa
Eza
Can I add a custom letter ordering for ElasticSearch? Maybe there are some plugins for this? Do I need to write my own plugin? How do I start?
I found a plugin for the Polish language for ElasticSearch, but as I understand it, it is for analysis, and analysis is not a solution in my case, because it would ignore diacritics and leave words with L and Ł mixed:
Dla
Dłb
Dlc
This would sometimes be acceptable, but it is not acceptable in my specific use case.
I will be grateful for any remarks on this.
Answer 1:
I've never used it, but there is a plugin that could fit your needs: the ICU collation plugin.
You will have to use the icu_collation token filter, which turns the tokens into collation keys. For that reason you will need to use a separate @Field (e.g. myField_sort) in Hibernate Search.
You can assign a specific analyzer to your field with @Field(name = "myField_sort", analyzer = @Analyzer(definition = "myCollationAnalyzer")), and define this analyzer (type, parameters) with something like this on one of your entities:
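@Entity
@Indexed
@AnalyzerDef(
    name = "myCollationAnalyzer",
    // Assumption: @AnalyzerDef needs a tokenizer; the keyword tokenizer
    // (org.apache.lucene.analysis.core.KeywordTokenizerFactory) keeps the
    // whole value as one token, so it yields a single collation key.
    tokenizer = @TokenizerDef(factory = KeywordTokenizerFactory.class),
    filters = {
        @TokenFilterDef(
            name = "polish_collation",
            factory = ElasticsearchTokenFilterFactory.class,
            params = {
                // Values are JSON fragments, hence the inner quotes.
                @Parameter(name = "type", value = "'icu_collation'"),
                @Parameter(name = "language", value = "'pl'")
            }
        )
    }
)
public class MyEntity {
    // ...
}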
See the documentation for more information: https://docs.jboss.org/hibernate/stable/search/reference/en-US/html_single/#_custom_analyzers
It's admittedly a bit clumsy right now, but analyzer configuration will get a bit cleaner in the next Hibernate Search version with normalizers and analyzer definition providers.
Note: as usual, your field will need to be declared as sortable (@SortableField(forField = "myField_sort")).
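To get a feel for what "turning tokens into collation keys" means, here is a minimal sketch that uses the JDK's own java.text.Collator rather than ICU or the Elasticsearch plugin, so it only illustrates the idea and is not what the icu_collation filter runs internally. The class name PolishCollationDemo and the method sortByCollationKey are made up for this example, and it assumes the JDK in use ships collation rules for the "pl" locale. The words are the ones from the question, with Ł and Ó written as Unicode escapes (U+0142 and U+00F3) to avoid source-encoding problems.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class PolishCollationDemo {

    // Sorts a copy of the input by precomputed collation keys, similar in
    // spirit to sorting on an index field that stores keys instead of text.
    static List<String> sortByCollationKey(List<String> words) {
        // Assumption: the JDK provides collation rules for Polish ("pl").
        Collator collator = Collator.getInstance(Locale.forLanguageTag("pl"));

        List<CollationKey> keys = new ArrayList<>();
        for (String word : words) {
            // A collation key is an opaque value whose ordering matches the
            // collator's ordering of the original strings.
            keys.add(collator.getCollationKey(word));
        }
        Collections.sort(keys); // CollationKey implements Comparable

        List<String> sorted = new ArrayList<>();
        for (CollationKey key : keys) {
            sorted.add(key.getSourceString());
        }
        return sorted;
    }

    public static void main(String[] args) {
        // "D\u0142a" is D + U+0142 + a, "D\u00f3a" is D + U+00F3 + a.
        List<String> words = Arrays.asList(
                "Dza", "D\u00f3a", "Eza", "Dla", "Doa", "D\u0142a");

        // Plain String ordering compares UTF-16 code units, so letters
        // with diacritics end up after z.
        List<String> plain = new ArrayList<>(words);
        Collections.sort(plain);

        System.out.println("plain:    " + plain);
        System.out.println("collated: " + sortByCollationKey(words));
    }
}
```

The keys themselves are not human-readable, which is why the answer suggests a separate field such as myField_sort: the original field keeps the readable text for searching, and the collated field is used only for sorting.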
Source: https://stackoverflow.com/questions/45598424/elasticsearch-define-custom-letter-order-for-sorting