Rails sunspot-solr - words with hyphen

爷,独闯天下 提交于 2019-12-07 13:59:57

问题


I'm using the sunspot_rails gem and everything is working perfect so far but: I'm not getting any search results for words with a hyphen.

Example: The string "tron" returns a lot of results(the word mentioned in all articles is e-tron)

The string "e-tron" returns 0 results even though this is the correct word mentioned in all my articles.

My current schema.xml config:

    <fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

What I want: The behaviour for the search string tron is okay of course, but I also want to have the correct matches for the search string e-tron.


回答1:


The problem is that solr.StandardTokenizerFactory is splitting words by hyphens so "e-tron" generates the tokens "e", "tron". Presumably "e" is lost as solr.TextField filters with a minimum token size of 2.

This is one example that would show your specific problem.

<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
  1. solr.WhitespaceTokenizerFactory will generate tokens on whitespace. ["e-tron"]
  2. solr.WordDelimiterFilterFactory will split on hyphens but also preserve the original word. ["e", "tron", "e-tron"]


来源:https://stackoverflow.com/questions/17225344/rails-sunspot-solr-words-with-hyphen

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!