Question
I have a regexp_filter that looks for a pattern in my documents, e.g.:
regexp_filter=Bob Smith=>Robert Smith
However, I've found this does not work when the pattern text is inside parentheses, e.g.:
he and my boss (Bob Smith) were due to..
I have tried a few things to get rid of the ( character:
- Added ( to the stopwords
- Added a custom charset that does NOT include parentheses
Regardless, patterns are not matched when they are inside parentheses.
Is there any way to do this correctly?
Update: precisely the same thing happens with hyphens, even if I explicitly remove them via the stopwords or the charset, or even add a regexp to remove them:
regexp_filter=-=>
They still get indexed and break any regexps, especially ones that use word boundaries.
For example:
regexp_filter=\bBob\b=>Robert
fails on text like 'Recipient: Bob-Mark-John'.
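To tell a regex problem apart from an indexing problem, the two patterns can be checked outside Sphinx. The sketch below uses Python's `re` as a stand-in, assuming it behaves like the RE2 engine behind Sphinx's regexp_filter for these simple patterns; it tests only the substitution, not how the result is tokenized and indexed afterwards.

```python
import re

# Assumption: Python's re matches these simple patterns the same way RE2 does.
# "Bob Smith" surrounded by parentheses: the literal pattern still matches.
paren_result = re.sub(r"Bob Smith", "Robert Smith",
                      "he and my boss (Bob Smith) were due to")

# "\bBob\b" followed by a hyphen: '-' is a non-word character, so \b matches.
hyphen_result = re.sub(r"\bBob\b", "Robert", "Recipient: Bob-Mark-John")

print(paren_result)   # he and my boss (Robert Smith) were due to
print(hyphen_result)  # Recipient: Robert-Mark-John
```

Both substitutions succeed here. If the same holds in RE2, the mismatch is more likely to come from how the rewritten text is tokenized or queried than from the patterns themselves.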
Answer 1:
If you add parentheses to charset_table (meaning they are valid characters, just like 'a'), then regexp_filter still turns (Bob Smith) into (Robert Smith), but the parentheses are indexed as part of the words. A search for 'Robert Smith' will not match '(Robert Smith)'. You can only get this to match if you enable infixing and do a wildcard search (like '*Robert Smith*').
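The effect of putting parentheses into the charset can be illustrated with a toy tokenizer. This is a hypothetical, heavily simplified model (the names `tokenize`, `phrase_match`, `DEFAULT` and `WITH_PARENS` are made up for illustration), not Sphinx's actual tokenizer: characters in the charset are kept as part of a word, and every other character ends the current word.

```python
import string

# Hypothetical, simplified model of charset-based tokenization: characters in
# `charset` are kept (lowercased) as part of a word; any other character ends
# the current word. Real Sphinx tokenization does much more than this.
def tokenize(text, charset):
    tokens, current = [], []
    for ch in text.lower():
        if ch in charset:
            current.append(ch)
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens

def phrase_match(doc_tokens, query_tokens):
    """True if query_tokens appear as a contiguous run inside doc_tokens."""
    n = len(query_tokens)
    return any(doc_tokens[i:i + n] == query_tokens
               for i in range(len(doc_tokens) - n + 1))

DEFAULT = set(string.ascii_lowercase + string.digits)
WITH_PARENS = DEFAULT | {"(", ")"}

# Text as it looks after regexp_filter has rewritten "Bob Smith".
doc = "he and my boss (Robert Smith) were due to"

for name, charset in [("default", DEFAULT), ("with parens", WITH_PARENS)]:
    doc_tokens = tokenize(doc, charset)
    query_tokens = tokenize("Robert Smith", charset)
    print(name, doc_tokens, phrase_match(doc_tokens, query_tokens))
```

With parentheses in the charset, the document words become '(robert' and 'smith)', so the plain phrase 'robert smith' no longer matches; with the default charset the parentheses act as separators and the phrase matches.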
You should add special characters to charset_table only if you know for sure that you need them as valid characters for building words.
Source: https://stackoverflow.com/questions/50204475/remove-open-and-close-parentheses-from-sphinx-index