Conditional construct not working in Python regex

执念已碎 2021-01-29 06:08

I am a newbie in Python and I want to use my regex in re.sub. I tried it on regex101 and it works. Somehow, when I tried to use it in Python (version 3.6), it does

2 Answers
  • 2021-01-29 06:44

    The problem is that you cannot use lookarounds in conditional constructs in Python re. The condition can only be a capturing group ID or name, which tests whether that group has already matched.

    (?(id/name)yes-pattern|no-pattern)
    Will try to match with yes-pattern if the group with given id or name exists, and with no-pattern if it doesn’t. no-pattern is optional and can be omitted.
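
    As a rough illustration of the supported form, here is a minimal sketch. The e-mail pattern is adapted from the example in the Python re documentation, and the test strings and the variable names email and lookahead_error are made up for this sketch. It also tries to compile the pattern from the question to show that Python rejects a lookahead used as the condition.

```python
import re

# Supported form: the condition is a group number (or name).
# The closing ">" is required only if group 1 (the opening "<") matched;
# otherwise the address must run to the end of the string.
email = re.compile(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)')

print(bool(email.match('<user@host.com>')))  # both brackets present
print(bool(email.match('user@host.com')))    # no brackets at all
print(bool(email.match('<user@host.com')))   # opening bracket only

# Unsupported form: a lookahead as the condition (the pattern from the question).
# Python's re reads "?=[^\t]*" as a group name and refuses to compile it.
try:
    re.compile(r'(?(?=[^\t]*)([\t]+))')
    lookahead_error = None
except re.error as exc:
    lookahead_error = exc

print(lookahead_error)
```

    The first two strings match and the third does not, while compiling the question's pattern raises re.error instead of silently misbehaving.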

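    The (?(?=[^\t]*)([\t]+)) regex is meant to check whether there are 0+ chars other than tabs at the current location and, if so, to match and capture 1 or more tabs. Even in engines that support lookaround conditions this makes little sense, because [^\t]* can match an empty string, so the lookahead always succeeds. If you want to replace the first occurrence of 1 or more tabs, you may use re.sub with a mere "\t+" pattern and the count=1 argument.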

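    import re

    reg = r"\t+"  # one or more consecutive tabs
    # Tabs are written as \t escapes so they survive copy and paste
    s = 'a\t\tbold, italic,\t\tteletype'
    # count=1 limits the substitution to the first run of tabs
    result = re.sub(reg, ',', s, count=1)
    print(result)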
    

    See the Python demo

  • 2021-01-29 07:04

    I suppose you could do this:

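    import re

    # Start of string, the first word, then one or more tabs
    regex = r'(^\w*?[\t]+)'
    # Tabs are written as \t escapes so they survive copy and paste
    s = 'a\t\tbold, italic,\t\tteletype'

    def repl(match):
        # Strip the trailing tabs from the matched prefix and append ", "
        prefix = match.group(0)
        return prefix.rstrip() + ', '

    print(re.sub(regex, repl, s))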
    

    Output:

    a, bold, italic,            teletype
    

    Here we are capturing the beginning of the string through any tabs that may occur after the first word, and passing the match to a callable. The callable removes trailing tabs with rstrip and adds a trailing comma.

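    Note: the pattern only matches when the tabs come directly after the first word at the start of the string. If the first tab occurs later (for example, when a space follows the first word and the tab only appears further along), nothing is replaced and the string is left unchanged. Is that what you want?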
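
    To make the note above concrete, here is a small sketch comparing the anchored pattern with a plain "\t+" and count=1. The function names replace_leading and replace_first_run and both sample strings are invented for this illustration, and ', ' is used as the replacement in both so the results are comparable.

```python
import re

def replace_leading(text):
    # Anchored approach: only tabs that directly follow the first word match.
    return re.sub(r'^\w*?\t+', lambda m: m.group(0).rstrip() + ', ', text)

def replace_first_run(text):
    # Unanchored approach: the first run of tabs anywhere in the string.
    return re.sub(r'\t+', ', ', text, count=1)

tab_after_first_word = 'a\t\tbold\titalic'
tab_later = 'a bold\titalic'

for sample in (tab_after_first_word, tab_later):
    print(repr(replace_leading(sample)), repr(replace_first_run(sample)))
```

    Both functions agree on the first sample. On the second, only replace_first_run changes anything, because the anchored pattern needs the tabs right after the first word.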
