Don't split on underscore with solr.StandardTokenizerFactory

Anonymous (unverified), submitted 2019-12-03 00:56:02

Question:

I'm using Solr with StandardTokenizerFactory on a text field, but I don't want it to split on underscores. Do I have to use another tokenizer, such as PatternTokenizerFactory, or can I do this with StandardTokenizerFactory? I need the same behavior as StandardTokenizerFactory, just without splitting on underscores.

Answer 1:

I don't think you can do this with StandardTokenizerFactory alone. One solution is to first replace underscores with a string that StandardTokenizerFactory won't split on and that your documents won't otherwise contain. For example, you can replace _ with QQ everywhere using PatternReplaceCharFilterFactory, pass the text through StandardTokenizerFactory, and then turn QQ back into _ using PatternReplaceFilterFactory. Here is the fieldType definition that does this:

<fieldType name="text_std_prot" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
        <charFilter class="solr.PatternReplaceCharFilterFactory"
                    pattern="_"
                    replacement="QQ"/>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.PatternReplaceFilterFactory"
                pattern="QQ"
                replacement="_"/>
        ...
    </analyzer>
</fieldType>

And here is a screenshot of what happens:



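One point worth spelling out about the placeholder approach: the restore filter matches the literal, case-sensitive pattern QQ, so any filter that changes case must run after it. The sketch below shows the first answer's chain with a LowerCaseFilterFactory appended purely as an illustration (it is an added filter, not something taken from the answer or from its "..."), plus a field declaration whose name `code` is hypothetical.

```xml
<!-- Sketch only: the answer's chain, plus an ASSUMED LowerCaseFilterFactory
     added to illustrate filter ordering. The field name "code" is hypothetical. -->
<fieldType name="text_std_prot" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- 1. On the raw text: replace every "_" with the placeholder "QQ" -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="_"
                replacement="QQ"/>
    <!-- 2. Tokenize; "fooQQbar" is all letters, so it stays one token -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- 3. On each token: turn the placeholder back into "_" -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="QQ"
            replacement="_"/>
    <!-- 4. Lower-case only after step 3; running it earlier would turn
         "QQ" into "qq", and the case-sensitive pattern above would miss it -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field using the type -->
<field name="code" type="text_std_prot" indexed="true" stored="true"/>
```

Because there is a single `<analyzer>` element, the same chain runs at index and query time, so a query for foo_bar is rewritten the same way as the indexed text.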
Answer 2:

Adding just the following seems to do the trick for StandardTokenizerFactory, since StandardTokenizerFactory splits at the hyphen "-":

<charFilter class="solr.PatternReplaceCharFilterFactory"
            pattern="_"
            replacement="-"/>
<tokenizer class="solr.StandardTokenizerFactory"/>
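The fragment above has to sit inside an `<analyzer>` element, with the charFilter declared before the tokenizer because char filters operate on the raw text before tokenization. A minimal sketch of the complete fieldType, where the name `text_std_hyphen` is hypothetical:

```xml
<!-- Sketch only: answer 2's fragment wrapped in a full fieldType.
     The name "text_std_hyphen" is hypothetical. -->
<fieldType name="text_std_hyphen" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Runs on the raw text: every "_" becomes "-" before tokenizing -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="_"
                replacement="-"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
</fieldType>
```

Unlike the first answer, this chain has no step that restores "_", so no indexed token will contain an underscore.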

