Rails sunspot-solr - words with hyphen

谁说胖子不能爱 submitted on 2019-12-06 04:51:11
polmiro

The problem is that solr.StandardTokenizerFactory splits words on hyphens, so "e-tron" produces the tokens "e" and "tron". Presumably "e" is then lost because a filter in your solr.TextField field type discards tokens shorter than 2 characters.
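To make the diagnosis concrete, here is a rough Ruby approximation of that behavior. It is only a sketch under two assumptions: the tokenizer is reduced to "split on anything that is not a letter or digit" (the real StandardTokenizer follows Unicode word-break rules), and the minimum length of 2 is the presumed filter mentioned above, not something confirmed from your schema. The method names are made up for illustration.

```ruby
# Rough approximation for illustration only (assumption, not Solr's code):
# StandardTokenizer really follows Unicode word-break rules; here it is
# reduced to "split on any run of characters that are not letters or digits".
MIN_TOKEN_LENGTH = 2 # presumed minimum token size, not confirmed from your schema

def standard_like_tokens(text)
  # Hyphens (and other punctuation) act as separators; drop empty strings
  # left over from leading separators.
  text.split(/[^[:alnum:]]+/).reject(&:empty?)
end

def analyze(text)
  # Discard tokens shorter than the presumed minimum length.
  standard_like_tokens(text).select { |token| token.length >= MIN_TOKEN_LENGTH }
end

p standard_like_tokens("e-tron") # => ["e", "tron"]
p analyze("e-tron")              # => ["tron"]
```

Under these assumptions nothing in the index represents the whole word "e-tron", which matches the symptom described above.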

Here is one example field type that should solve your specific problem:

<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
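
  1. solr.WhitespaceTokenizerFactory splits only on whitespace, so the hyphenated word stays whole: ["e-tron"]
  2. solr.WordDelimiterFilterFactory splits on hyphens but, because of preserveOriginal="1", also keeps the original word: ["e", "tron", "e-tron"]
  3. solr.LowerCaseFilterFactory lowercases every token (no change in this example).
  4. At index time only, solr.EdgeNGramFilterFactory emits front n-grams of 2 to 15 characters, e.g. "e-tron" becomes "e-", "e-t", "e-tr", "e-tro", "e-tron". "e" is shorter than minGramSize and produces no n-grams, but "e-tron" and "tron" remain searchable.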
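The chain above can be simulated in plain Ruby to see why "e-tron" now matches. This is a hypothetical, simplified re-implementation for illustration, not Solr code: the method names are made up, token positions and offsets are ignored, and the word-delimiter step only splits on punctuation (the real filter also splits on case changes and letter/digit boundaries by default).

```ruby
# Hypothetical, simplified re-implementation of the fieldType above, for
# illustration only. Real Solr filters also track token positions/offsets,
# and WordDelimiterFilterFactory splits on more than punctuation.
MIN_GRAM = 2   # minGramSize="2"
MAX_GRAM = 15  # maxGramSize="15"

# solr.WhitespaceTokenizerFactory: split on whitespace only.
def whitespace_tokenize(text)
  text.split
end

# solr.WordDelimiterFilterFactory with preserveOriginal="1" (simplified):
# emit the sub-words split on non-alphanumerics, plus the original token.
def word_delimiter(tokens)
  tokens.flat_map do |token|
    parts = token.split(/[^[:alnum:]]+/).reject(&:empty?)
    (parts + [token]).uniq
  end
end

# solr.LowerCaseFilterFactory.
def lowercase(tokens)
  tokens.map(&:downcase)
end

# solr.EdgeNGramFilterFactory, side="front": prefixes of MIN_GRAM..MAX_GRAM
# characters; tokens shorter than MIN_GRAM yield nothing.
def edge_ngrams(tokens)
  tokens.flat_map do |token|
    upper = [MAX_GRAM, token.length].min
    (MIN_GRAM..upper).map { |n| token[0, n] }
  end
end

# Index analyzer: all four steps.
def index_terms(text)
  edge_ngrams(lowercase(word_delimiter(whitespace_tokenize(text))))
end

# Query analyzer: no edge n-grams.
def query_terms(text)
  lowercase(word_delimiter(whitespace_tokenize(text)))
end

index = index_terms("e-tron")
query = query_terms("e-tron")
p index          # => ["tr", "tro", "tron", "e-", "e-t", "e-tr", "e-tro", "e-tron"]
p query          # => ["e", "tron", "e-tron"]
p query & index  # => ["tron", "e-tron"]
```

In this sketch, because only the index side produces prefixes, a partial query such as "e-tr" would still find the indexed terms "e-tr" and "tr", while the query side stays small.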