Remove Open and Close Parentheses from Sphinx Index

Submitted by 冷眼眸甩不掉的悲伤 on 2019-12-11 16:49:14

Question


I have a regexp_filter that looks for a pattern in my documents, e.g.

regexp_filter=Bob Smith=>Robert Smith

However, I've found this does not work when the pattern text is inside parentheses, e.g.

he and my boss (Bob Smith) were due to..

I have tried a few things to get rid of the (:

  1. Added ( to the Stopwords
  2. Added a custom charset that does NOT include parens

But regardless, patterns are not matched when they are inside parentheses.

Is there any way to do this correctly?

Update: precisely the same thing happens with hyphens, even if I explicitly remove them in Stopwords or Charset, or even add a regexp to remove them:

regexp_filter=-=>

They still get indexed and break any regexps, especially ones with word boundaries.

So:

regexp_filter=\bBob\b=>Robert

fails in text like 'Recipient: Bob-Mark-John'.


Answer 1:


If you add parentheses to charset_table, they become valid word characters just like 'a', so after filtering, (Bob Smith) is indexed as (Robert Smith) with the parentheses attached. A search for 'Robert Smith' will then not match '(Robert Smith)'. You can get a match only if you enable infixing and do a wildcard search (like '*Robert Smith*').

You should add special characters to charset_table only if you know for sure that you need them as valid characters used to construct words.
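
To make the effect concrete, here is a minimal Python sketch of the idea. It is a simplified model and not Sphinx's actual tokenizer: `tokenize` and `phrase_matches` are hypothetical helpers, and the `extra_word_chars` argument only stands in for characters you might add to charset_table.

```python
import re

def tokenize(text, extra_word_chars=""):
    """Split text into lowercase tokens. Any character outside the
    allowed set acts as a separator (a rough stand-in for charset_table)."""
    allowed = "a-z0-9_" + re.escape(extra_word_chars)
    return re.findall(f"[{allowed}]+", text.lower())

def phrase_matches(query, document, extra_word_chars=""):
    """Return True if the query tokens occur consecutively in the document tokens."""
    q = tokenize(query, extra_word_chars)
    d = tokenize(document, extra_word_chars)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))

doc = "he and my boss (Robert Smith) were due to"

# Parentheses treated as word characters: they stick to the tokens.
print(tokenize(doc, extra_word_chars="()"))
# ['he', 'and', 'my', 'boss', '(robert', 'smith)', 'were', 'due', 'to']
print(phrase_matches("Robert Smith", doc, extra_word_chars="()"))  # False

# Parentheses left out of the allowed set: they act as separators.
print(phrase_matches("Robert Smith", doc))  # True
```

In this model the phrase matches only when the parentheses are not valid word characters, which is the behavior described above.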



Source: https://stackoverflow.com/questions/50204475/remove-open-and-close-parentheses-from-sphinx-index
