Regex word boundary expressions

时光说笑 2020-11-28 11:44

Say, for example, I have the following string "one two(three) (three) four five" and I want to replace "(three)" with "(four)" but ...

4 answers
  • 2020-11-28 12:23

    Here is a simple snippet you may be interested in:

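        // Escape the search text so characters like ( and ) are taken literally.
        string pattern = @"\b" + Regex.Escape(find) + @"\b";
        // Regex.Replace returns the new string; it does not modify stringToSearch.
        string result = Regex.Replace(stringToSearch, pattern, replace, RegexOptions.IgnoreCase);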
    

    Source code: snip2code - C#: Replace an exact word in a sentence

  • Your problem stems from a misunderstanding of what \b actually means. Admittedly, it is not obvious.

    The reason \b\(three\)\b doesn’t match the threes in your input string is the following:

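    • \b matches a position, not a character: the boundary between a word character and a non-word character. The start and end of the string count as the non-word side.
    • Letters (e.g. a-z), digits and the underscore are considered word characters (this is the \w class).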
    • Punctuation marks such as ( are considered non-word characters.

    Here is your input string again, stretched out a bit, and I’ve marked the places where \b matches:

     o n e   t w o ( t h r e e )   ( t h r e e )   f o u r   f i v e
    ↑     ↑ ↑     ↑ ↑         ↑     ↑         ↑   ↑       ↑ ↑       ↑
    

    As you can see here, there is a \b between “two” and “(three)”, but not before the second “(three)”.
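
The marked positions can be checked mechanically. Here is a minimal sketch in JavaScript (not the C# of the question; for plain ASCII input like this one, \b should behave the same way in both engines). The variable names are only for illustration.

```javascript
// The input string from the question.
const input = "one two(three) (three) four five";

// \b is zero-width, so every match is an empty string at some index.
// Collecting the indices gives the positions marked with arrows above.
const boundaries = [...input.matchAll(/\b/g)].map((m) => m.index);
console.log(boundaries.join(" "));

// The pattern from the question: it needs a boundary before "(" and after ")".
const naive = /\b\(three\)\b/;
console.log(naive.test(input)); // false
```

Twelve positions come back, one per arrow. Index 7 (between "two" and the first "(") is in the list, but index 14 (right after the first ")") and index 15 (right before the second "(") are not, which is why the pattern never matches.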

    The moral of the story? “Whole-word search” doesn’t really make much sense if what you’re searching for is not just a word (a string of letters). Since you have punctuation characters (parentheses) in your search string, it is not as such a “word”. If you searched for a word consisting only of word characters, then \b would do what you expect.

    You can, of course, use a different regex to match the string only if it is surrounded by whitespace or occurs at the beginning or end of the string:

    (^|\s)\(three\)(\s|$)
    

    However, the problem with this is, of course, that if you search for “three” (without the parentheses), it won’t find the one in “(three)” because it doesn’t have spaces around it, even though it is actually a whole word.
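
One more thing to keep in mind when replacing with that pattern: it also matches the whitespace around the search string, so the replacement has to put it back. A rough sketch in JavaScript, together with a lookaround variant that consumes nothing extra. The lookaround version is my own suggestion, not part of the pattern above, and it assumes an engine with lookbehind support (.NET and current JavaScript engines have it).

```javascript
const input = "one two(three) (three) four five";

// Pattern from the answer: the groups capture the surrounding whitespace
// (or the string start/end), so $1 and $2 restore it in the replacement.
const withGroups = input.replace(/(^|\s)\(three\)(\s|$)/g, "$1(four)$2");

// Lookaround variant: "not preceded by a non-whitespace character" and
// "not followed by a non-whitespace character". Only "(three)" is consumed.
const withLookarounds = input.replace(/(?<!\S)\(three\)(?!\S)/g, "(four)");

console.log(withGroups);      // one two(three) (four) four five
console.log(withLookarounds); // one two(three) (four) four five

// Two occurrences separated by a single space share that space:
const adjacent = "(three) (three)";
const adjacentGroups = adjacent.replace(/(^|\s)\(three\)(\s|$)/g, "$1(four)$2");
const adjacentLookarounds = adjacent.replace(/(?<!\S)\(three\)(?!\S)/g, "(four)");
```

On the question's input both give the same result. They differ on back-to-back occurrences: the group version uses up the shared space with the first match and skips the second, while the lookaround version replaces both.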

    I think most text editors (including Visual Studio) will use \b only if your search string actually starts and/or ends with a word character:

    var pattern = Regex.Escape(searchString);
    if (Regex.IsMatch(searchString, @"^\w"))
        pattern = @"\b" + pattern;
    if (Regex.IsMatch(searchString, @"\w$"))
        pattern = pattern + @"\b";
    

    That way they will find “(three)” even if you select “whole words only”.
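
For comparison, the same idea sketched in JavaScript. escapeRegExp and wholeWordRegex are hypothetical helper names; escapeRegExp stands in for .NET's Regex.Escape, for which older JavaScript engines have no built-in equivalent.

```javascript
// Escape regex metacharacters so the search string is matched literally.
// (Hypothetical helper standing in for Regex.Escape.)
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Add \b only on the sides where the search string starts or ends
// with a word character.
function wholeWordRegex(searchString) {
  let pattern = escapeRegExp(searchString);
  if (/^\w/.test(searchString)) pattern = "\\b" + pattern;
  if (/\w$/.test(searchString)) pattern = pattern + "\\b";
  return new RegExp(pattern, "g");
}

const input = "one two(three) (three) four five";
const count = (s) => (input.match(wholeWordRegex(s)) || []).length;

console.log(count("three"));   // 2
console.log(count("(three)")); // 2
console.log(count("thre"));    // 0
```

Note that "(three)" gets no \b on either side here, so it is found twice, including the occurrence in "two(three)". If that one should be excluded, this approach alone is not enough.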

  • 2020-11-28 12:31

    I recently came across a similar issue in JavaScript, trying to match terms with a leading '$' character only as separate words, e.g. if $hot = 'FUZZ', then:

    "some $hot $hotel bird$hot pellets" ---> "some FUZZ $hotel bird$hot pellets"
    

    The regex /\b\$hot\b/g (my first guess) did not work, for the same reason the parentheses did not match in the original question: '$' is a non-word character, so when it is preceded by whitespace or the start of the string there is no word/non-word boundary in front of it.

    However, the regex /\B\$hot\b/g does match, which shows that the positions not marked in @timwi's excellent example are exactly where \B matches. This was not intuitive to me, because ") (" contains no regex word characters at all. But \B is simply the negation of \b: it matches wherever there is no word/non-word boundary, so it does not need word characters on either side, only the absence of a boundary :)
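
A small sketch of both attempts side by side, in case anyone wants to run it. The variable names are only for illustration.

```javascript
const text = "some $hot $hotel bird$hot pellets";

// First guess: \b before "$" needs a word character on the left,
// so it only fires inside "bird$hot", the one place it should not.
const firstGuess = text.replace(/\b\$hot\b/g, "FUZZ");

// \B before "$" needs a non-word character (or the string start) on the left.
// The trailing \b still rejects "$hotel", because "t" is followed by "e".
const fixed = text.replace(/\B\$hot\b/g, "FUZZ");

console.log(firstGuess); // some $hot $hotel birdFUZZ pellets
console.log(fixed);      // some FUZZ $hotel bird$hot pellets
```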

  • 2020-11-28 12:38

    As Gopi said, but (theoretically) catching only (three) not two(three):

    string input = "one two(three) (three) four five";
    
    string output = input.Replace(" (three) ", " (four) ");
    

    When I test that, I get "one two(three) (four) four five". Just remember that whitespace is a character, too, so it can also be replaced. If I did this:

    //use same input
    string output = input.Replace(" ", ";");
    

    I'd get "one;two(three);(three);four;five"
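
One caveat with including the spaces in the search text: an occurrence at the very start or end of the string has no space on one side and is left alone. A quick sketch in JavaScript; replaceSpaced is a made-up name, and split/join is used here to mimic C#'s string.Replace, which replaces every occurrence.

```javascript
// Literal replacement that includes the surrounding spaces.
// split/join replaces all occurrences, like string.Replace in C#.
const replaceSpaced = (s) => s.split(" (three) ").join(" (four) ");

const middle = replaceSpaced("one two(three) (three) four five");
const atStart = replaceSpaced("(three) at the start");
const atEnd = replaceSpaced("at the end (three)");

console.log(middle);  // one two(three) (four) four five
console.log(atStart); // (three) at the start
console.log(atEnd);   // at the end (three)
```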
