I have a Perl regular expression (shown here, though hopefully understanding the whole thing isn't necessary to answer this question) that contains the \G metacharacter.
I know I'm a little late, but here's an alternative to the \G approach:
import re
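def replace(match):
    # Commented lines start with '/': return them unchanged.
    if match.group(0)[0] == '/':
        return match.group(0)
    # Otherwise the match is a URL: wrap it in angle brackets.
    return '<' + match.group(0) + '>'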
source = '''http://a.com http://b.com
//http://etc.'''
pattern = re.compile(r'(?m)^//.*$|http://\S+')
result = re.sub(pattern, replace, source)
print(result)
Output (via Ideone):
<http://a.com> <http://b.com>
//http://etc.
The idea is to use a regex that matches both kinds of string: a URL or a commented line. Then you use a callback (delegate, closure, embedded code, etc.) to find out which one you matched and return the appropriate replacement string.
As a matter of fact, this is my preferred approach even in flavors that do support \G, and even in Java, where I have to write a bunch of boilerplate code to implement the callback.
(I'm not a Python guy, so forgive me if the code is terribly un-pythonic.)
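As a variation on the callback idea, the check for which alternative matched can be keyed on named groups instead of inspecting the first character. This is only a sketch: the group names comment and url are made up for illustration, and the pattern is otherwise the same as the one above.

```python
import re

# Hypothetical group names 'comment' and 'url' label the two alternatives.
PATTERN = re.compile(r'(?m)(?P<comment>^//.*$)|(?P<url>http://\S+)')

def replace(match):
    # lastgroup holds the name of the alternative that matched.
    if match.lastgroup == 'comment':
        return match.group(0)          # leave commented lines as they are
    return '<' + match.group(0) + '>'  # wrap bare URLs in angle brackets

source = 'http://a.com http://b.com\n//http://etc.'
result = PATTERN.sub(replace, source)
print(result)
```

Naming the alternatives keeps the callback readable if more kinds of match are added to the pattern later.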
You can use re.match to match anchored patterns. re.match only matches at the beginning (position 0) of the text; the match method of a compiled pattern also accepts a pos argument, so you can anchor the match wherever you specify.
import re

def match_sequence(pattern, text, pos=0):
    pat = re.compile(pattern)
    match = pat.match(text, pos)
    while match:
        yield match
        if match.end() == pos:
            break  # infinite loop otherwise
        pos = match.end()
        match = pat.match(text, pos)
This matches the pattern only at the given position, and then yields any further matches that begin exactly where the previous one ended (0 characters after it).
>>> for match in match_sequence(r'[^\W\d]+|\d+',"he11o world!"):
...     print(match.group())
...
he
11
o
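To make the anchoring behaviour concrete, here is a small sketch comparing match and search on a compiled pattern. The sample text and variable names are invented for illustration.

```python
import re

pat = re.compile(r'\d+')
text = 'ab12 cd34'

# match() is anchored: it succeeds only if the pattern matches exactly at pos.
anchored_miss = pat.match(text, 0)   # 'a' is not a digit, so this is None
anchored_hit = pat.match(text, 2)    # the digits start at index 2

# search() scans forward from pos, so it is not anchored.
scanned = pat.search(text, 0)

print(anchored_miss, anchored_hit.group(), scanned.start())
```

The anchored call at a chosen position is what lets match_sequence stop as soon as the next token does not start where the last one ended.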
Python does not have the /g modifier for its regexes, and so does not have the \G regex token. A pity, really.
Don't try to put everything into one expression, as it becomes very hard to read, translate (as you have seen for yourself), and maintain.
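import re
# text_block is assumed to hold the input text as a single string.
# Lines starting with '//' are filtered out, so they do not appear in the result.
lines = [re.sub(r'http://[^\s]+', r'<\g<0>>', line) for line in text_block.splitlines() if not line.startswith('//')]
print('\n'.join(lines))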
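The one-liner above skips lines that start with //, which also leaves them out of the printed result. If the commented lines should be passed through unchanged instead, a line-by-line loop along these lines might do. The function name bracket_urls and the sample text are hypothetical.

```python
import re

def bracket_urls(text_block):
    """Wrap URLs in <...> on every line that is not a // comment."""
    out = []
    for line in text_block.splitlines():
        if line.startswith('//'):
            out.append(line)  # keep commented lines untouched
        else:
            out.append(re.sub(r'http://\S+', r'<\g<0>>', line))
    return '\n'.join(out)

sample = 'http://a.com http://b.com\n//http://etc.'
print(bracket_urls(sample))
```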
Python is usually not at its best when you translate literally from Perl; it has its own programming patterns.
Try these:
import re
re.sub()
re.findall()
re.finditer()
For example:
# Finds all words of length 3 or 4
s = "the quick brown fox jumped over the lazy dogs."
print(re.findall(r'\b\w{3,4}\b', s))
# prints ['the', 'fox', 'over', 'the', 'lazy', 'dogs']
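The other two functions in the list work on the same pattern. As a rough sketch that reuses the sentence from the example, re.sub accepts a callback for the replacement, and re.finditer yields match objects rather than plain strings:

```python
import re

s = "the quick brown fox jumped over the lazy dogs."

# re.sub: replace every 3- or 4-letter word with its upper-case form.
shouted = re.sub(r'\b\w{3,4}\b', lambda m: m.group(0).upper(), s)

# re.finditer: like findall, but each result is a match object,
# so the position of every word is available as well.
spans = [(m.group(0), m.start()) for m in re.finditer(r'\b\w{3,4}\b', s)]

print(shouted)
print(spans)
```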