German 'ue' -> 'u' conversion in Lucene

Submitted by 北战南征 on 2019-12-24 11:18:05

Question


I have two questions regarding handling German umlauts in Lucene:

  1. I'm trying to find a way to convert German umlauts written as 'ue', 'ae', etc. to the folded forms 'u', 'a', and so on. GermanAnalyzer does this (through the German2StemFilter it uses), but unfortunately it also stems, which I do not want in my case. Is there another filter that does only the 'ue' -> 'u' conversion?

  2. Is there a filter that converts 'ü' -> 'ue' (NOT to 'u', as ASCIIFoldingFilter does)? What I'm trying to achieve is that the word "über" is found in the index whenever the user searches for "über" or "ueber", but NOT for "uber".


Answer 1:


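Use GermanNormalizationFilter: it normalizes German characters with the heuristics of the German2 (Snowball) algorithm, but without the stemming: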

https://lucene.apache.org/core/4_0_0/analyzers-common/org/apache/lucene/analysis/de/GermanNormalizationFilter.html
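To make the behaviour concrete, here is a plain-Java sketch of the folding rules listed in the GermanNormalizationFilter javadoc: 'ß' becomes 'ss'; 'ä', 'ö', 'ü' become 'a', 'o', 'u'; 'ae' and 'oe' become 'a' and 'o'; and 'ue' becomes 'u' unless it follows a vowel or 'q'. It is a standalone reimplementation for illustration, not Lucene's source, and the class name `GermanNormalizationSketch` is hypothetical. It only inspects lowercase letters, so lowercase the token first; in a real analyzer you would add `GermanNormalizationFilter` to the token stream after a lowercase filter instead.

```java
// Hypothetical standalone sketch of the German2-style folding rules (no stemming).
// Non-ASCII characters are written as Unicode escapes so the file compiles under any source encoding.
final class GermanNormalizationSketch {
    private static final int NORMAL = 0; // previous char cannot start ae/oe/ue
    private static final int BLOCK = 1;  // previous char was a vowel or 'q', so a following 'u' cannot start "ue"
    private static final int UMLAUT = 2; // previous char was a/o (or u after a consonant); a following 'e' is dropped

    /** Folds umlauts and their ae/oe/ue spellings in an already-lowercased token. */
    static String normalize(String token) {
        StringBuilder out = new StringBuilder(token.length() + 2);
        int state = NORMAL;
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            switch (c) {
                case 'a':
                case 'o':
                    out.append(c);
                    state = UMLAUT;
                    break;
                case 'u':
                    out.append(c);
                    // "ue" is folded only when this 'u' does not follow a vowel or 'q'
                    state = (state == NORMAL) ? UMLAUT : BLOCK;
                    break;
                case 'e':
                    if (state != UMLAUT) {
                        out.append(c); // keep 'e' unless it completes ae/oe/ue
                    }
                    state = BLOCK;
                    break;
                case 'i':
                case 'q':
                case 'y':
                    out.append(c);
                    state = BLOCK;
                    break;
                case '\u00e4': // a-umlaut
                    out.append('a');
                    state = BLOCK;
                    break;
                case '\u00f6': // o-umlaut
                    out.append('o');
                    state = BLOCK;
                    break;
                case '\u00fc': // u-umlaut
                    out.append('u');
                    state = BLOCK;
                    break;
                case '\u00df': // sharp s
                    out.append("ss");
                    state = NORMAL;
                    break;
                default:
                    out.append(c);
                    state = NORMAL;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] words = {"\u00fcber", "ueber", "uber", "m\u00e4dchen", "maedchen", "stra\u00dfe", "neue"};
        for (String w : words) {
            System.out.println(w + " -> " + normalize(w));
        }
    }
}
```

With these rules "über", "ueber" and "uber" all end up as "uber", while words such as "neue" or "quelle" keep their 'ue'. That covers question 1, but it also means this filter cannot give the distinction question 2 asks for.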




Answer 2:


You can use MappingCharFilterFactory and provide your own mapping file, in which you define whatever replacements you want, such as 'ü' -> 'ue'.
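A mapping char filter rewrites the raw text before the tokenizer sees it. The plain-Java sketch below imitates that step for single characters so its effect on question 2 can be checked without Lucene; the class name `UmlautExpansionSketch` and its mapping table are hypothetical, and this is not Lucene's MappingCharFilter implementation. In a Solr-style mapping file the equivalent rules would look like `"ü" => "ue"`, one per line. Because the same rewrite runs at index and query time, "über" and "ueber" both become the term "ueber", while "uber" stays "uber".

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a char-level mapping applied before tokenization.
final class UmlautExpansionSketch {
    // Uppercase entries are included because a char filter runs before any lowercase filter.
    private static final Map<Character, String> MAPPINGS = new HashMap<>();

    static {
        MAPPINGS.put('\u00e4', "ae"); // a-umlaut
        MAPPINGS.put('\u00f6', "oe"); // o-umlaut
        MAPPINGS.put('\u00fc', "ue"); // u-umlaut
        MAPPINGS.put('\u00c4', "Ae"); // A-umlaut
        MAPPINGS.put('\u00d6', "Oe"); // O-umlaut
        MAPPINGS.put('\u00dc', "Ue"); // U-umlaut
        MAPPINGS.put('\u00df', "ss"); // sharp s
    }

    /** Expands umlauts in raw text; every other character passes through unchanged. */
    static String expand(String text) {
        StringBuilder out = new StringBuilder(text.length() + 8);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            String replacement = MAPPINGS.get(c);
            if (replacement != null) {
                out.append(replacement);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String indexed = expand("\u00fcber"); // the term stored in the index
        for (String query : new String[] {"\u00fcber", "ueber", "uber"}) {
            boolean matches = expand(query).equals(indexed);
            System.out.println(query + " -> " + expand(query) + ", matches indexed term: " + matches);
        }
    }
}
```

Keep GermanNormalizationFilter out of this analysis chain: it would fold the expanded "ueber" back to "uber" and undo the distinction.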



Source: https://stackoverflow.com/questions/13451276/german-ue-u-conversion-in-lucene
