Do Python regexes support something like Perl's \G?

遥遥无期 2020-12-02 02:00

I have a Perl regular expression (shown here, though hopefully understanding the whole thing isn't necessary to answer this question) that contains the \G metacharacter.

5 Answers
  • 2020-12-02 02:42

    I know I'm a little late, but here's an alternative to the \G approach:

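    import re
    
    def replace(match):
        # Commented lines (the only matches starting with '/') are returned
        # unchanged; URLs are wrapped in angle brackets.
        if match.group(0).startswith('/'):
            return match.group(0)
        return '<' + match.group(0) + '>'
    
    source = '''http://a.com http://b.com
    //http://etc.'''
    
    # One pattern that matches either a whole commented line or a URL.
    pattern = re.compile(r'(?m)^//.*$|http://\S+')
    result = pattern.sub(replace, source)
    print(result)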
    

    output (via Ideone):

    <http://a.com> <http://b.com>
    //http://etc.
    

    The idea is to use a regex that matches both kinds of string: a URL or a commented line. Then you use a callback (delegate, closure, embedded code, etc.) to find out which one you matched and return the appropriate replacement string.

    As a matter of fact, this is my preferred approach even in flavors that do support \G, including Java, where I have to write a bunch of boilerplate code to implement the callback.

    (I'm not a Python guy, so forgive me if the code is terribly un-pythonic.)
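A possible variant of the same callback technique, sketched under the assumption that comments are whole lines starting with '//' as in the sample above: named groups let the callback ask which alternative matched (via match.lastgroup) instead of inspecting the first character. The group names comment and url are my own choice, not anything from the question.

```python
import re

# Hypothetical group names: 'comment' for a whole commented line, 'url' for a URL.
PATTERN = re.compile(r'(?m)(?P<comment>^//.*$)|(?P<url>http://\S+)')

def replace(match):
    # Only one alternative can match, so lastgroup names the one that did.
    if match.lastgroup == 'comment':
        return match.group(0)          # leave commented lines untouched
    return '<' + match.group(0) + '>'  # wrap URLs in angle brackets

source = 'http://a.com http://b.com\n//http://etc.'
result = PATTERN.sub(replace, source)
print(result)
```

Dispatching on group names scales better than character checks if more alternatives are added later.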

  • 2020-12-02 02:47

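    You can use the match() method of a compiled pattern to match anchored patterns: it matches only at the beginning (position 0) of the text, or at the position you pass as its pos argument. (The module-level re.match() function has no pos argument.)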

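    import re
    
    def match_sequence(pattern, text, pos=0):
        # Yield consecutive matches, each anchored where the previous one ended.
        pat = re.compile(pattern)
        match = pat.match(text, pos)
        while match:
            yield match
            if match.end() == pos:
                break  # zero-length match: stop, or we would loop forever
            pos = match.end()
            match = pat.match(text, pos)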
    

    This matches the pattern only at the given position, and after that only where the previous match ended, so consecutive matches must follow each other with 0 characters in between.

    >>> for match in match_sequence(r'[^\W\d]+|\d+', "he11o world!"):
    ...     print(match.group())
    ...
    he
    11
    o
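To see why this relies on the compiled pattern's match() with pos rather than search() or a '^' anchor, here is a small sketch; the pattern and the sample string are made up for illustration.

```python
import re

pat = re.compile(r'\d+')
text = 'ab12cd'

# Pattern.match(text, pos) is anchored at pos, which is what \G-style scanning needs.
at_2 = pat.match(text, 2)    # succeeds: '12' starts exactly at index 2
at_1 = pat.match(text, 1)    # None: index 1 holds 'b', not a digit

# Pattern.search(text, pos) scans forward from pos, so it is not anchored.
found = pat.search(text, 0)  # finds '12' even though it starts at index 2

# '^' does not help: it matches at the real start of the string
# (or after a newline in MULTILINE mode), not at the pos argument.
caret_at_2 = re.compile(r'^\d+').match(text, 2)  # None

print(at_2.group(), at_1, found.start(), caret_at_2)
```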
    
  • 2020-12-02 02:49

    Python's regexes have no /g modifier, and so they have no \G token either. A pity, really.
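One way to check the claim for the standard library re module, as a minimal sketch (the helper name re_accepts_G is hypothetical): since Python 3.6, an unknown escape made of a backslash and an ASCII letter is a compile error, so a pattern containing \G is rejected when it is compiled.

```python
import re

def re_accepts_G():
    """Return True if the standard re module compiles a pattern containing \\G."""
    try:
        re.compile(r'\G')
    except re.error:
        # re raises an error for the unknown escape \G
        return False
    return True

print(re_accepts_G())
```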

  • 2020-12-02 02:51

    Don't try to put everything into one expression: it becomes very hard to read, translate (as you have seen for yourself), and maintain.

    import re
    # Wrap URLs in <...>, leaving lines that start with '//' unchanged.
    lines = [line if line.startswith('//') else re.sub(r'http://\S+', r'<\g<0>>', line)
             for line in text_block.splitlines()]
    print('\n'.join(lines))
    

    Python usually doesn't work best when you translate literally from Perl; it has its own programming patterns.
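A self-contained sketch of that line-by-line idea, assuming (as in the first answer's sample) that a comment is a whole line starting with '//'; the function name wrap_urls and the sample input are illustrative only. Commented lines are passed through untouched.

```python
import re

URL = re.compile(r'http://\S+')

def wrap_urls(text_block):
    """Wrap each URL in <...>, skipping lines that start with '//'."""
    out = []
    for line in text_block.splitlines():
        if line.startswith('//'):
            out.append(line)  # commented line: keep as-is
        else:
            out.append(URL.sub(r'<\g<0>>', line))
    return '\n'.join(out)

print(wrap_urls('http://a.com http://b.com\n//http://etc.'))
```

Splitting the work this way keeps the regex trivial; the "is this a comment?" decision lives in ordinary Python code.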

  • 2020-12-02 02:56

    Try these functions from the re module: re.sub(), re.findall(), and re.finditer().

    For example:

    # Finds all words of length 3 or 4
    s = "the quick brown fox jumped over the lazy dogs."
    print(re.findall(r'\b\w{3,4}\b', s))
    
    # prints ['the', 'fox', 'over', 'the', 'lazy', 'dogs']
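The answer only demonstrates findall; here is a similar sketch for the other two functions it lists, with the Perl constructs they roughly correspond to noted in comments. The sample string is made up.

```python
import re

text = 'a1 b22 c333'

# Like Perl's s/\d+/#/g: re.sub replaces every match by default.
all_replaced = re.sub(r'\d+', '#', text)
# Like s/\d+/#/ without /g: count=1 limits it to the first match.
first_replaced = re.sub(r'\d+', '#', text, count=1)

# Like while ($text =~ /(\d+)/g) { ... }: finditer yields one match
# object per match, left to right, each carrying its position.
spans = []
for m in re.finditer(r'\d+', text):
    spans.append((m.group(), m.start(), m.end()))

print(all_replaced)
print(first_replaced)
print(spans)
```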
    