Do Python regexes support something like Perl's \G?


I have a Perl regular expression (shown here, though understanding the whole thing hopefully isn't necessary to answering this question) that contains the \G metacharacter


5 answers

野的像风

2020-12-02 02:42
I know I'm a little late, but here's an alternative to the \G approach:
```
import re

def replace(match):
    if match.group(0)[0] == '/': return match.group(0)
    else: return '<' + match.group(0) + '>'

source = '''http://a.com http://b.com
//http://etc.'''

pattern = re.compile(r'(?m)^//.*$|http://\S+')
result = re.sub(pattern, replace, source)
print(result)
```
Output (via Ideone):
```
<http://a.com> <http://b.com>
//http://etc.
```
The idea is to use a regex that matches both kinds of string: a URL or a commented line. Then you use a callback (delegate, closure, embedded code, etc.) to find out which one you matched and return the appropriate replacement string.

As a matter of fact, this is my preferred approach even in flavors that do support \G. Even in Java, where I have to write a bunch of boilerplate code to implement the callback.

(I'm not a Python guy, so forgive me if the code is terribly un-pythonic.)
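
For what it's worth, here is a minimal sketch of the same callback idea that uses named groups instead of inspecting the first character of the match. The group names comment and url and the function name wrap_urls are my own illustrative choices, not part of the code above; match.lastgroup is the standard attribute that reports which named alternative matched.

```python
import re

# Each alternative gets a (hypothetical) name so the callback can tell them apart.
PATTERN = re.compile(r'(?m)(?P<comment>^//.*$)|(?P<url>http://\S+)')

def wrap_urls(match):
    # lastgroup holds the name of the alternative that actually matched.
    if match.lastgroup == 'comment':
        return match.group(0)           # leave commented lines untouched
    return '<' + match.group(0) + '>'   # wrap bare URLs in angle brackets

source = 'http://a.com http://b.com\n//http://etc.'
result = PATTERN.sub(wrap_urls, source)
print(result)
```

With the sample input this should give the same result as the version above; the only difference is that the callback no longer depends on what a comment line happens to start with.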

半阙折子戏

2020-12-02 02:47

You can use match to match anchored patterns. The match method of a compiled pattern will only match at the beginning (position 0) of the text, or at the position you specify.

```
import re

def match_sequence(pattern, text, pos=0):
    pat = re.compile(pattern)
    match = pat.match(text, pos)      # anchored at pos, like \G
    while match:
        yield match
        if match.end() == pos:
            break                     # empty match: infinite loop otherwise
        pos = match.end()             # next attempt starts where this one ended
        match = pat.match(text, pos)
```

This only matches the pattern starting at the given position, and each following match must begin exactly where the previous one ended (0 characters after it).

```
>>> for match in match_sequence(r'[^\W\d]+|\d+', "he11o world!"):
...     print(match.group())
...
he
11
o
```
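
To make the difference from an ordinary search concrete, here is a small sketch that runs the same pattern both ways. The names TOKEN and contiguous_tokens are made up for this illustration; finditer and the pos argument of a compiled pattern's match method are standard re features.

```python
import re

TOKEN = re.compile(r'[^\W\d]+|\d+')
text = "he11o world!"

# finditer searches forward, so it silently skips the space before "world".
searched = [m.group() for m in TOKEN.finditer(text)]

def contiguous_tokens(pattern, text, pos=0):
    """Collect matches that each start exactly where the previous one ended."""
    tokens = []
    while True:
        m = pattern.match(text, pos)
        if m is None or m.end() == pos:   # no match here, or an empty match
            break
        tokens.append(m.group())
        pos = m.end()
    return tokens

anchored = contiguous_tokens(TOKEN, text)
print(searched)
print(anchored)
```

The anchored version stops at the first gap (the space), which is the \G-like behaviour; the searching version carries on and also picks up "world".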


滥情空心

2020-12-02 02:49

Python does not have the /g modifier for its regexes, and so does not have the \G regex token. A pity, really.
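
A quick way to check this yourself, as a sketch: on recent Python 3 versions the standard re module should refuse to compile a pattern containing \G, since unknown escapes made of a backslash and an ASCII letter are treated as errors. The variable name supports_search_anchor is just for illustration.

```python
import re

try:
    re.compile(r'\Gfoo')
    supports_search_anchor = True
except re.error as exc:
    # The standard re module does not recognise \G as an escape.
    supports_search_anchor = False
    print('re rejects \\G:', exc)
```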

北恋

2020-12-02 02:51
Don't try to put everything into one expression, as it becomes very hard to read, translate (as you see for yourself) and maintain.
```
import re
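
# text_block is assumed to hold the input text.
# Wrap each URL in <...>; lines starting with '//' are skipped (and left out of the output).
lines = [re.sub(r'http://[^\s]+', r'<\g<0>>', line) for line in text_block.splitlines() if not line.startswith('//')]
print('\n'.join(lines))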
```
Python is usually not at its best when you translate literally from Perl; it has its own programming patterns.
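
If the commented lines should stay in the output rather than be filtered out, the same line-by-line idea could be sketched like this. The names URL, wrap_urls_by_line and sample are hypothetical, and the sample text is borrowed from another answer in this thread.

```python
import re

URL = re.compile(r'http://\S+')

def wrap_urls_by_line(text_block):
    out = []
    for line in text_block.splitlines():
        if line.startswith('//'):
            out.append(line)                        # keep comment lines as they are
        else:
            out.append(URL.sub(r'<\g<0>>', line))   # wrap each URL in <...>
    return '\n'.join(out)

sample = 'http://a.com http://b.com\n//http://etc.'
print(wrap_urls_by_line(sample))
```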

你的背包

2020-12-02 02:56

Try these functions from the re module:

re.sub()
re.findall()
re.finditer()

For example:

import re

# Finds all words of length 3 or 4
s = "the quick brown fox jumped over the lazy dogs."
print(re.findall(r'\b\w{3,4}\b', s))

# prints ['the', 'fox', 'over', 'the', 'lazy', 'dogs']
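
re.finditer works the same way but hands back match objects, so you also get positions. A small sketch on the same sentence (the variable name hits is just for illustration):

```python
import re

s = "the quick brown fox jumped over the lazy dogs."

# Each match object knows where it starts, which findall does not report.
hits = [(m.start(), m.group()) for m in re.finditer(r'\b\w{3,4}\b', s)]

for start, word in hits:
    print(start, word)
```

Having the positions is what lets you check by hand whether one match begins right where the previous one ended, which is the part \G would otherwise do for you.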
