Solr - synonyms containing multiple words

佛祖请我去吃肉 2020-12-30 07:52

Quick question: I don't know how to deal with synonyms that contain a space. I have the following config:

The SOLR config file



        
4 Answers
  • 2020-12-30 08:22

    You are doing explicit mapping with =>.

    The Solr documentation says

    Explicit mappings match any token sequence on the LHS of "=>" and replace with all alternatives on the RHS. These types of mappings ignore the expand parameter in the schema.

    So I am guessing that if you search for NYC you get nothing back, since it got replaced with New York at index time.

    Instead, try declaring them as equivalent synonyms, i.e. NYC, New York instead of NYC => New York.

    Then I believe you can search for either of them and the result will be the same.
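The difference between the two rule styles in the answer above can be sketched in a few lines of Python. This is only a toy model of how a synonyms.txt line is read, not Solr's implementation; the function name `parse_rule` is hypothetical, and lowercasing stands in for ignoreCase="true".

```python
def parse_rule(line, expand=True):
    """Turn one synonyms.txt-style line into {input phrase: [output phrases]}.

    Toy model only (hypothetical helper, not part of Solr).
    """
    if "=>" in line:
        # Explicit mapping: every LHS phrase is replaced by the RHS phrases.
        lhs, rhs = line.split("=>", 1)
        sources = [s.strip().lower() for s in lhs.split(",")]
        targets = [t.strip().lower() for t in rhs.split(",")]
        return {s: targets for s in sources}
    # Equivalent synonyms: with expand=True each phrase maps to all phrases,
    # with expand=False each phrase maps to the first one only.
    phrases = [p.strip().lower() for p in line.split(",")]
    targets = phrases if expand else phrases[:1]
    return {p: targets for p in phrases}


explicit = parse_rule("NYC => New York")
equivalent = parse_rule("NYC, New York")
print(explicit)
print(equivalent)
```

With the explicit rule, "nyc" is only an input and never an output, which matches the guess above that the term NYC itself is no longer present after replacement. With the equivalent rule, both phrases map to both phrases.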

  • 2020-12-30 08:30

    About

    st., st => saint
    

    I think you should do it that way :

    st. => saint
    st => saint
    

    About

    NY => New York
    

    I'm facing a similar issue and concluded that it happens because parsing (tokenizing) is done BEFORE synonym replacement, which likely causes problems with multi-word synonyms. I found that it is possible to pass a tokenizer to SynonymFilterFactory:

    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" tokenizerFactory="solr.KeywordTokenizerFactory" /> 
    

    I just tested it and got much better results, though not yet the expected ones. Strangely enough, adding KeywordTokenizerFactory seems to have a positive impact, while adding WhitespaceTokenizerFactory or StandardTokenizerFactory doesn't seem to change anything.

    BTW, if you are not using shingles, this should already be fine.
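A rough way to picture what the tokenizerFactory attribute changes: it selects the tokenizer used to read the entries of synonyms.txt. The sketch below is a toy model under that assumption; the helper names (`whitespace_tokenize`, `keyword_tokenize`, `rule_tokens`) and the sample line are made up for illustration and are not Solr APIs.

```python
def whitespace_tokenize(text):
    # Stand-in for a tokenizer that splits on whitespace.
    return text.split()


def keyword_tokenize(text):
    # Stand-in for a keyword tokenizer: the whole entry stays one token.
    return [text.strip()]


def rule_tokens(line, tokenize):
    """Tokenize each comma-separated entry of an equivalent-synonym line."""
    return [tokenize(entry.strip()) for entry in line.split(",")]


line = "NY, New York"
split_entries = rule_tokens(line, whitespace_tokenize)
whole_entries = rule_tokens(line, keyword_tokenize)
print(split_entries)
print(whole_entries)
```

In the first case the entry "New York" is stored as two tokens, in the second as a single token containing a space. Whether the single-token form can match anything depends on the token stream the field's own analyzer produces, which may be why the effect differs between setups.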

  • 2020-12-30 08:38

    Building on Pr Shadoko's answer:

    Look at how your analyzer works, e.g. with

    http://localhost/solr/analysis/field?analysis.fieldvalue=EXAMPLE-KEYWORDS&q=EXAMPLE-KEYWORD%203&analysis.fieldname=EXAMPLEFIELD&analysis.showmatch=true
    

    analysis/field is an out-of-the-box request handler (configured in solrconfig.xml). Here you find its parameter list. ("analysis.query" doesn't work for me, so I had to use "q".)

    Because the SynonymFilter parses (splits) the incoming text BEFORE matching any synonym, multi-word synonyms won't get a hit. The trick is to give the SynonymFilter a tokenizer that doesn't actually tokenize: the KeywordTokenizer

    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" tokenizerFactory="solr.KeywordTokenizerFactory" />
    

    Anyhow, this approach feels like a hack and I can't estimate the side-effects (scalability, ...) - so be careful!
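The analysis request shown in the answer above can be assembled programmatically, which avoids encoding mistakes in the query string. This is a minimal sketch that only builds the URL; the function name `analysis_url` is hypothetical, and the host, field name, and values are the placeholders from the answer, not real settings.

```python
from urllib.parse import quote, urlencode


def analysis_url(base, field_name, field_value, query):
    """Build a field-analysis URL like the one in the answer above."""
    params = {
        "analysis.fieldvalue": field_value,  # text analyzed as index-time input
        "q": query,                          # text analyzed as query-time input
        "analysis.fieldname": field_name,    # field whose analyzer is used
        "analysis.showmatch": "true",
    }
    # quote_via=quote encodes spaces as %20 instead of "+".
    return f"{base}/analysis/field?{urlencode(params, quote_via=quote)}"


url = analysis_url(
    "http://localhost/solr",  # placeholder base URL from the answer
    "EXAMPLEFIELD",
    "EXAMPLE-KEYWORDS",
    "EXAMPLE-KEYWORD 3",
)
print(url)
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return the analysis output, assuming the handler is enabled at that path in your installation.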

  • 2020-12-30 08:44

    The problem is that solr synonyms tend to cause issues when the number of words in the first phrase is less than the number of words in the second phrase. When this happens, tokens overflow into the positions of other tokens.

    I have a workaround for this problem, but it requires two uses of solr.SynonymFilterFactory at index and query time.

    Like this :

    <filter class="solr.SynonymFilterFactory" synonyms="multi_word_conversion.txt" 
    ignoreCase="true" expand="true" />
    
    <filter class="solr.SynonymFilterFactory" synonyms="layor_two_syns.txt" 
    ignoreCase="true" expand="true"/>
    

    In the first filter you will have: New York => New_York

    In the second filter: NYC => New_York

    Now a search for New York will return results containing NYC and vice versa.

    On a final note: this method will not work unless it is applied at both index and query time.
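The two-filter idea above can be illustrated with a small Python model. It is not how Solr applies synonym filters internally; it only shows that running both rule sets in order sends "New York" and "NYC" to the same single token. The names `first_filter_rules`, `second_filter_rules`, `apply_rules`, and `analyze` are hypothetical, and lowercasing stands in for ignoreCase="true".

```python
import re

# Rules taken from the answer above, one dict per filter.
first_filter_rules = {"New York": "New_York"}   # multi-word phrase -> one token
second_filter_rules = {"NYC": "New_York"}       # abbreviation -> the same token


def apply_rules(text, rules):
    """Replace each phrase (whole words, case-insensitive) with its target."""
    for phrase, target in rules.items():
        pattern = r"\b" + re.escape(phrase) + r"\b"
        text = re.sub(pattern, target, text, flags=re.IGNORECASE)
    return text


def analyze(text):
    """Run both rule sets in order, then lowercase and split into tokens."""
    text = apply_rules(text, first_filter_rules)
    text = apply_rules(text, second_filter_rules)
    return text.lower().split()


indexed = analyze("hotels in New York")
queried = analyze("NYC hotels")
print(indexed)
print(queried)
```

Because the same `analyze` function is used for both inputs, the document and the query share the token new_york. If the rules were applied to only one side, the tokens would no longer line up, which is the point of the final note above.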
