Regular expression: match start or whitespace

无人共我 2020-11-29 06:32

Can a regular expression match whitespace or the start of a string?

I'm trying to replace the currency abbreviation GBP with a £ symbol. I

8 Answers
  • 2020-11-29 06:51

    You can always trim leading and trailing whitespace from the token before you search if it's not a matching/grouping situation that requires the full line.
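
    A minimal sketch of that idea, assuming each token is handled on its own. The helper name `to_pound` is hypothetical; once the token is stripped, a plain `^` anchor covers the "start or whitespace" case:

```python
import re

def to_pound(token: str) -> str:
    """Strip surrounding whitespace, then replace a leading GBP with a pound sign."""
    token = token.strip()
    # After strip(), GBP can only sit at the very start of the token,
    # so ^ stands in for the "start of string or whitespace" check.
    return re.sub(r'^GBP(?=[\W\d])', '£', token)

print(to_pound('  GBP75.00 '))
```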

  • 2020-11-29 06:56

    A left-hand whitespace boundary - a position in the string that is either a string start or right after a whitespace character - can be expressed with

    (?<!\S)   # A negative lookbehind requiring no non-whitespace char immediately to the left of the current position
    (?<=\s|^) # A positive lookbehind requiring a whitespace or start of string immediately to the left of the current position
    (?:\s|^)  # A non-capturing group matching either a whitespace or start of string 
    (\s|^)    # A capturing group matching either a whitespace or start of string
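
    The practical difference between these forms is whether the whitespace becomes part of the match. The sketch below compares a lookbehind with a consuming group on a made-up sample string. It also probes whether the engine accepts the variable-width lookbehind; as far as I know, Python's built-in `re` rejects it because the two alternatives differ in width, so the code checks rather than assumes:

```python
import re

text = 'a GBP5'

# Zero-width lookbehind: the match is just 'GBP'.
lookbehind = re.search(r'(?<!\S)GBP', text)
print(repr(lookbehind.group()))

# Consuming group: the preceding space becomes part of the match.
group = re.search(r'(?:\s|^)GBP', text)
print(repr(group.group()))

# Variable-width lookbehind: probe whether this regex engine accepts it.
try:
    re.compile(r'(?<=\s|^)GBP')
    variable_width_ok = True
except re.error:
    variable_width_ok = False
print(variable_width_ok)
```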
    

    Python 3 demo:

    import re
    rx = r'(?<!\S)GBP([\W\d])'
    text = 'GBP 5 Off when you spend GBP75.00'
    print( re.sub(rx, r'£\1', text) )
    # => £ 5 Off when you spend £75.00
    

    Note that \1 is enough in the replacement pattern here. The unambiguous form \g<1> is only needed when the backreference is immediately followed by a digit.
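
    To illustrate when the unambiguous form matters, here is a small sketch with a made-up pattern that appends a literal 0 after group 1. Written as `\10`, the replacement is read as a reference to group 10, which does not exist in this pattern:

```python
import re

text = 'GBP5'

# \g<1> followed by a literal 0: group 1, then the character '0'.
result = re.sub(r'GBP(\d)', r'£\g<1>0', text)
print(result)

# \10 is parsed as a reference to group 10, not group 1 plus '0'.
try:
    re.sub(r'GBP(\d)', r'£\10', text)
    ambiguous_ok = True
except re.error:
    ambiguous_ok = False
print(ambiguous_ok)
```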

    BONUS: A right-hand whitespace boundary can be expressed with the following patterns:

    (?!\S)   # A negative lookahead requiring no non-whitespace char immediately to the right of the current position
    (?=\s|$) # A positive lookahead requiring a whitespace or end of string immediately to the right of the current position
    (?:\s|$)  # A non-capturing group matching either a whitespace or end of string 
    (\s|$)    # A capturing group matching either a whitespace or end of string
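
    Combining the left-hand and right-hand lookarounds gives a whole-token match, where GBP must be delimited by whitespace or the string edges on both sides. A short sketch on a made-up sample string:

```python
import re

# GBP only when delimited by whitespace (or string start/end) on both sides.
rx = r'(?<!\S)GBP(?!\S)'
text = 'GBP GBPX XGBP (GBP) GBP'

result = re.sub(rx, '£', text)
print(result)
# => £ GBPX XGBP (GBP) £
```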
    
  • 2020-11-29 06:59

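    It works in Perl (this example substitutes $ rather than £):

    my $text = 'GBP 5 off when you spend GBP75';
    # (\W|^) captures the preceding non-word character, or nothing at the start, so it can be put back as $1
    $text =~ s/(\W|^)GBP([\W\d])/$1\$$2/g;
    print "$text\n";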
    

    The output is:

    $ 5 off when you spend $75
    

    Note the /g modifier: it makes the substitution global, so every occurrence is replaced.

  • 2020-11-29 07:01

    I think you're looking for '(^|\W)GBP([\W\d])'
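
    One thing to keep in mind with this form is that both groups consume the characters around GBP, so they have to be written back in the replacement. A Python sketch, with the lookbehind pattern `(?<!\S)GBP([\W\d])` included only for comparison. When two occurrences share a single separator, the consuming version can miss the second one:

```python
import re

group_rx = r'(^|\W)GBP([\W\d])'
lookbehind_rx = r'(?<!\S)GBP([\W\d])'

text = 'GBP 5 Off when you spend GBP75.00'
# Both captured neighbours are restored around the pound sign.
basic = re.sub(group_rx, r'\1£\2', text)
print(basic)

adjacent = 'GBP GBP 5'
# The space after the first GBP is consumed by group 2, so it is no longer
# available as the \W that must precede the second GBP.
with_groups = re.sub(group_rx, r'\1£\2', adjacent)
with_lookbehind = re.sub(lookbehind_rx, r'£\1', adjacent)
print(with_groups)
print(with_lookbehind)
```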

  • 2020-11-29 07:02

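    \b is a word boundary: a zero-width position between a word character and a non-word character such as whitespace or punctuation, or the start or end of the string (\bGBP\b). Note that \bGBP\b does not match the GBP in GBP75, because there is no word boundary between P and 7.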
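
    A quick way to see where \b does and does not apply is to test a few made-up strings:

```python
import re

rx = re.compile(r'\bGBP\b')

spaced = bool(rx.search('GBP 5'))      # boundary at the start and before the space
bracketed = bool(rx.search('(GBP)'))   # punctuation also forms a boundary
glued = bool(rx.search('GBP75'))       # P and 7 are both word characters
prefixed = bool(rx.search('XGBP 5'))   # X and G are both word characters

print(spaced, bracketed, glued, prefixed)
# => True True False False
```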

  • 2020-11-29 07:02

    This replaces GBP if it's preceded by the start of a string or a word boundary (which the start of a string already is), and after GBP comes a numeric value or a word boundary:

    re.sub(r'\bGBP(?=\b|\d)', u'£', text)
    

    This removes the need for any unnecessary backreferencing by using a lookahead. Inclusive enough?
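
    For this pattern to work in Python, the \b has to reach the regex engine as a backslash followed by b, which a raw string literal guarantees. In an ordinary string literal, '\b' is a backspace character. A small sketch of both points, with sample strings of my own; the last case shows that a word boundary is more permissive than a whitespace boundary:

```python
import re

# In a non-raw string literal, '\b' is a single backspace character (0x08).
is_backspace = '\b' == '\x08'
raw_length = len(r'\b')   # backslash + 'b'
print(is_backspace, raw_length)

rx = r'\bGBP(?=\b|\d)'
text = 'GBP 5 Off when you spend GBP75.00'
result = re.sub(rx, '£', text)
print(result)

# \b also matches after punctuation, not only after whitespace.
after_hyphen = re.sub(rx, '£', '-GBP5')
print(after_hyphen)
```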
