Regular expression: match word not between quotes

不知归路 2021-01-15 00:25

I would like a Python regular expression that matches a given word only when it is not between single quotes. I've tried using a negative lookahead, (?! ...), but without success.

5 answers
  • 2021-01-15 00:33

    The regex solution below works in most cases, but it might break if unbalanced single quotes appear outside of string literals, e.g. in comments.

    A common regex trick for matching text in context is to match and capture what you need to keep (here, the quoted strings) and merely match what you need to replace.

    Here is a sample Python demo:

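    import re

    # Group 1 captures a whole single-quoted literal (backslash escapes allowed);
    # the second alternative matches the target word anywhere else.
    rx = r"('[^'\\]*(?:\\.[^'\\]*)*')|\b{0}\b"
    s = r"""
        var foe = 10;
        foe = "";
        dark_vador = 'bad guy'
        foe = ' I\'m your father, foe ! '
        bar = thingy + foe"""
    toReplace = "foe"
    # Keep quoted literals as they are; replace the word everywhere else.
    res = re.sub(rx.format(re.escape(toReplace)), lambda m: m.group(1) if m.group(1) else 'NEWORD', s)
    print(res)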
    


    The regex will look like

    ('[^'\\]*(?:\\.[^'\\]*)*')|\bfoe\b
    


    The ('[^'\\]*(?:\\.[^'\\]*)*') part captures single-quoted string literals into Group 1; when it matches, the literal is put back into the result unchanged. \bfoe\b matches the whole word foe in any other context, and that match is replaced with the new word.

    NOTE: To also match double quoted string literals, use r"('[^'\\]*(?:\\.[^'\\]*)*'|\"[^\"\\]*(?:\\.[^\"\\]*)*\")".
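
    The note above can be sketched as a small helper. The names replace_outside_quotes, SINGLE, DOUBLE and the sample string are illustrative and not part of the answer; re.escape is added on the assumption that the word might contain regex metacharacters.

```python
import re

# Quoted-literal patterns (backslash escapes allowed), one per quote style.
SINGLE = r"'[^'\\]*(?:\\.[^'\\]*)*'"
DOUBLE = r'"[^"\\]*(?:\\.[^"\\]*)*"'

def replace_outside_quotes(text, word, new_word):
    """Replace whole-word `word` with `new_word`, leaving quoted literals alone."""
    pattern = re.compile(
        "(" + SINGLE + "|" + DOUBLE + ")" + r"|\b" + re.escape(word) + r"\b"
    )
    # Group 1 is set only when a quoted literal matched: put it back unchanged.
    return pattern.sub(
        lambda m: m.group(1) if m.group(1) is not None else new_word, text
    )

sample = 'foe = "foe"; x = \'foe\' + foe'
print(replace_outside_quotes(sample, "foe", "NEWORD"))
```

    Splitting the two quote styles into separate raw strings avoids escaping either quote character inside the pattern.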

  • 2021-01-15 00:33

    Capture group 1 of the following regular expression holds the match of foe when it lies outside single quotes.

    r"^(?:[^'\n]|\\')*(?:(?<!\\)'(?:[^'\n]|\\')*(?:(?<!\\)')(?:[^'\n]|\\')*)*\b(foe)\b"
    


    Python's regex engine performs the following operations.

    ^           : assert beginning of string
    (?:         : begin non-capture group
      [^'\n]    : match any char other than single quote and line terminator
      |         : or
      \\'       : match '\' then a single quote
    )           : end non-capture group   
    *           : execute non-capture group 0+ times
    (?:         : begin non-capture group
      (?<!\\)   : next char is not preceded by '\' (negative lookbehind)
      '         : match single quote
      (?:       : begin non-capture group
        [^'\n]  : match any char other than single quote and line terminator
        |       : or
        \\'     : match '\' then a single quote
      )         : end non-capture group   
      *         : execute non-capture group 0+ times
      (?:       : begin non-capture group
        (?<!\\) : next char is not preceded by '\' (negative lookbehind)
        '       : match single quote
      )         : end non-capture group
      (?:       : begin non-capture group
        [^'\n]  : match any char other than single quote and line terminator
        |       : or
        \\'     : match '\' then a single quote
      )         : end non-capture group   
      *         : execute non-capture group 0+ times
    )           : end non-capture group
    *           : execute non-capture group 0+ times
    \b(foe)\b   : match 'foe' in capture group 1
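
    The breakdown above can be tried out with a short script. This is a sketch under two assumptions that are not in the answer: re.MULTILINE is added so that ^ anchors at the start of every line, and the helper lines_with_word and the sample text are made up for illustration. Because the pattern is anchored with ^, each line can yield at most one match.

```python
import re

# Same pattern as in the answer, delimited with double quotes so that the
# single quotes inside it do not end the string literal.
# Assumption: re.MULTILINE lets ^ anchor at the start of each line.
OUTSIDE_QUOTES = re.compile(
    r"^(?:[^'\n]|\\')*(?:(?<!\\)'(?:[^'\n]|\\')*(?:(?<!\\)')(?:[^'\n]|\\')*)*\b(foe)\b",
    re.MULTILINE,
)

def lines_with_word(text):
    """Return 1-based line numbers where group 1 matched outside quotes."""
    return [
        text.count("\n", 0, m.start(1)) + 1
        for m in OUTSIDE_QUOTES.finditer(text)
    ]

sample = "var foe = 10;\ndark_vador = 'foe guy'\nbar = thingy + foe"
print(lines_with_word(sample))
```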
    
  • 2021-01-15 00:35

    How about this regular expression:

    >>> import re
    >>> s = '''var foe = 10;
    foe = "";
    dark_vador = 'bad guy'
    ' I\'m your father, foe ! '
    bar = thingy + foe'''
    >>>
    >>> re.findall(r'(?!\'.*)foe(?!.*\')', s)
    ['foe', 'foe', 'foe']
    

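    The trick here is the trailing negative lookahead, (?!.*\'): it rejects any foe that has a single quote somewhere after it on the same line, with .* accounting for the characters in between. Note that the leading (?!\'.*) never rejects anything, because the character at that position is always the f of foe. As a result, a foe that sits outside quotes but ahead of a quoted string on the same line is skipped as well.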
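
    A few single-line inputs make the behaviour of this pattern easier to see. The test lines below are made up for illustration; the pattern is the one from the answer.

```python
import re

LOOKAHEAD = re.compile(r"(?!\'.*)foe(?!.*\')")

# Hypothetical inputs, one per situation of interest.
cases = [
    "bar = thingy + foe",      # no quotes on the line
    "' your father, foe ! '",  # foe inside a quoted string
    "foe = 'bad guy'",         # foe outside quotes, quoted string later on the line
]

for line in cases:
    print(repr(line), LOOKAHEAD.findall(line))
```

    The last case is the one to watch: the word is outside the quotes, yet the lookahead sees a quote later on the line and rejects it.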

  • 2021-01-15 00:36

    ((?!\'[\w\s]*[\\']*[\w\s]*)foe(?![\w\s]*[\\']*[\w\s]*\'))
    
  • 2021-01-15 00:58

    You can try this:

    ((?!\'[\w\s]*)foe(?![\w\s]*\'))
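
    One way to evaluate the two lookahead-only patterns above is to run them against a few lines. This is a minimal sketch: the test lines and the names PATTERNS, CASES and run_cases are made up for illustration. It suggests that both patterns skip foe when only word characters and whitespace separate it from the closing quote, but still match it when punctuation such as ! comes first.

```python
import re

# The two lookahead-only patterns, copied from the answers as posted.
PATTERNS = {
    "with_escapes": re.compile(
        r"((?!\'[\w\s]*[\\']*[\w\s]*)foe(?![\w\s]*[\\']*[\w\s]*\'))"
    ),
    "plain": re.compile(r"((?!\'[\w\s]*)foe(?![\w\s]*\'))"),
}

# Hypothetical test lines, not taken from the answers.
CASES = [
    "bar = thingy + foe",      # outside quotes
    "dark_vador = 'foe guy'",  # inside quotes, only word characters follow
    "' your father, foe ! '",  # inside quotes, punctuation before the quote
]

def run_cases():
    """Return {pattern_name: [matches per test line]} for inspection."""
    return {
        name: [rx.findall(line) for line in CASES]
        for name, rx in PATTERNS.items()
    }

for name, results in run_cases().items():
    print(name, results)
```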
