Reuse part of a Regex pattern


Consider this (very simplified) example string:

1aw2,5cx7

As you can see, it consists of two digit/letter/letter/digit values separated by a comma.


6 answers

醉话见心

2020-11-28 14:40
Try using a backreference. For the string
```
1aw2,5cx7
```
you could try
```
(\d\w\w\d),\1
```
Note, however, that \1 repeats the exact text captured by group 1, not its pattern, so this matches 1aw2,1aw2 but not 1aw2,5cx7.
See http://www.regular-expressions.info/backref.html for reference.
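A quick check with the standard re module makes the text-versus-pattern difference concrete. This is a minimal sketch; the variable name backref is only illustrative.

```python
import re

# \1 must repeat the exact text captured by group 1, not its pattern
backref = re.compile(r"(\d\w\w\d),\1")

print(bool(backref.fullmatch("1aw2,5cx7")))  # False: "5cx7" is not "1aw2"
print(bool(backref.fullmatch("1aw2,1aw2")))  # True: both halves are identical
```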
余生分开走

2020-11-28 14:44
No, when using the standard library re module, regular expression patterns cannot be 'symbolized'.

You can always do so by re-using Python variables, of course:
```
digit_letter_letter_digit = r'\d\w\w\d'
```
then use string formatting to build the larger pattern:
```
re.match(r"{0},{0}".format(digit_letter_letter_digit), inputtext)
```
or, using Python 3.6+ f-strings:
```
dlld = r'\d\w\w\d'
re.match(fr"{dlld},{dlld}", inputtext)
```
I often use this technique to compose larger, more complex patterns from reusable sub-patterns.
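As a sketch of composing a bigger pattern from small pieces (the names DIGIT, LETTERS, DLLD and PAIR are illustrative, not from the question), one practical point is worth showing: a literal regex quantifier such as {2} written directly inside an f-string or str.format template has to be doubled to {{2}}. Keeping such quantifiers inside the sub-pattern variables sidesteps that.

```python
import re

# Illustrative sub-patterns (hypothetical names, not from the question)
DIGIT = r"\d"
LETTERS = r"[a-zA-Z]{2}"  # the quantifier braces live in a plain string

# Compose larger patterns with raw f-strings; only {DIGIT}, {LETTERS}, ...
# are format fields here, so no brace escaping is needed
DLLD = rf"{DIGIT}{LETTERS}{DIGIT}"
PAIR = rf"{DLLD},{DLLD}"

# Writing the quantifier inline instead requires doubling the braces
assert rf"{DIGIT}[a-zA-Z]{{2}}{DIGIT}" == DLLD

print(PAIR)  # \d[a-zA-Z]{2}\d,\d[a-zA-Z]{2}\d
print(bool(re.fullmatch(PAIR, "1aw2,5cx7")))  # True
print(bool(re.fullmatch(PAIR, "1a22,5cx7")))  # False: "2" is not a letter
```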

If you are prepared to install an external library, the third-party regex module from PyPI can solve this problem with a regex subroutine call. The syntax (?N), where N is a group number, re-uses the pattern of an already defined (implicitly numbered) capturing group:
```
(\d\w\w\d),(?1)
^........^ ^..^
|           \
|             re-use pattern of capturing group 1  
\
  capturing group 1
```
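A small sketch of the numbered subroutine call, assuming the third-party regex package is installed (pip install regex); it prints a message and skips that part if the package is missing. It also checks that the standard re module refuses to compile the (?1) syntax.

```python
import re

try:
    import regex  # third-party module from PyPI: pip install regex
except ImportError:
    regex = None

PATTERN = r"(\d\w\w\d),(?1)"  # (?1) re-runs the *pattern* of group 1

# The standard library cannot parse the subroutine call
try:
    re.compile(PATTERN)
    re_accepts_subroutine = True
except re.error:
    re_accepts_subroutine = False
print("re accepts (?1):", re_accepts_subroutine)  # False

if regex is None:
    print("regex is not installed; skipping the subroutine demo")
else:
    # Unlike the backreference \1, (?1) accepts a different second value
    print(bool(regex.fullmatch(PATTERN, "1aw2,5cx7")))  # True
    print(bool(regex.fullmatch(PATTERN, "1aw2,5c7")))   # False: too short
```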
You can do the same with named capturing groups: (?<groupname>...) defines the named group groupname, and (?&groupname), (?P&groupname) or (?P>groupname) re-use that group's pattern, not the text it matched (the latter two forms are alternatives for compatibility with other engines).
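The named forms can be sketched the same way, again assuming the third-party regex module is installed; only the (?&name) and (?P>name) spellings are exercised, and the group name dlld is arbitrary. The backreference (?P=dlld) is included for contrast, since it repeats the matched text rather than the pattern.

```python
try:
    import regex  # third-party module from PyPI: pip install regex
except ImportError:
    regex = None

TEXT = "1aw2,5cx7"
SUBROUTINE_FORMS = [
    r"(?<dlld>\d\w\w\d),(?&dlld)",    # (?<name>...) group + (?&name) call
    r"(?P<dlld>\d\w\w\d),(?P>dlld)",  # (?P<name>...) group + (?P>name) call
]
BACKREFERENCE = r"(?P<dlld>\d\w\w\d),(?P=dlld)"  # repeats the matched text

if regex is not None:
    for pattern in SUBROUTINE_FORMS:
        print(pattern, bool(regex.fullmatch(pattern, TEXT)))  # True for both
    print(BACKREFERENCE, bool(regex.fullmatch(BACKREFERENCE, TEXT)))  # False
```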

And finally, regex supports the (?(DEFINE)...) block to define subroutine patterns without them actually matching anything at that point. You can put multiple (...) and (?<name>...) capturing groups in that construct and then refer to them later in the actual pattern:
```
(?(DEFINE)(?<dlld>\d\w\w\d))(?&dlld),(?&dlld)
          ^...............^ ^......^ ^......^
          |                    \       /          
 creates 'dlld' pattern      uses 'dlld' pattern twice
```
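A sketch with more than one definition inside the DEFINE block, assuming the regex module is installed; the group names digit and letters are illustrative. Because the DEFINE block itself consumes no characters, fullmatch still has to account for the whole input with the calls that follow it.

```python
try:
    import regex  # third-party module from PyPI: pip install regex
except ImportError:
    regex = None

PATTERN = (
    r"(?(DEFINE)(?<digit>\d)(?<letters>[a-zA-Z]{2}))"  # definitions only
    r"(?&digit)(?&letters)(?&digit)"                   # first value
    r","
    r"(?&digit)(?&letters)(?&digit)"                   # second value
)

if regex is not None:
    print(bool(regex.fullmatch(PATTERN, "1aw2,5cx7")))  # True
    print(bool(regex.fullmatch(PATTERN, "1a22,5cx7")))  # False: "2" is not a letter
```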
Just to be explicit: the standard library re module does not support subroutine patterns.
南方客

2020-11-28 14:44
I ran into the same problem and wrote this snippet:
```
import nre
my_regex = nre.from_string(r'''
a=\d\w\w\d
b={{a}},{{a}}
c=(?P<id>{{a}}),(?P=id)
''')
my_regex["b"].match("1aw2,5cx7")
```
For lack of more descriptive names, I named the partial regexes a, b and c.

Referencing one from another is as easy as writing {{a}}.
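The nre package's API is not documented here, so the following is only a standard-library stand-in for the same {{name}} idea. build_patterns is a hypothetical helper written for this sketch, not part of nre or re. Each reference is wrapped in a non-capturing group so that quantifiers or alternation inside a sub-pattern stay local.

```python
import re


def build_patterns(spec: str) -> dict[str, re.Pattern]:
    """Hypothetical helper: parse 'name=pattern' lines and expand {{name}}
    references to earlier definitions, mimicking the snippet above."""
    raw: dict[str, str] = {}
    for line in spec.strip().splitlines():
        name, _, body = line.partition("=")  # split at the first "=" only
        # Replace each {{ref}} with the already expanded text of ref;
        # an undefined ref raises KeyError
        expanded = re.sub(
            r"\{\{(\w+)\}\}", lambda m: f"(?:{raw[m.group(1)]})", body
        )
        raw[name.strip()] = expanded
    return {name: re.compile(text) for name, text in raw.items()}


patterns = build_patterns(r"""
a=\d\w\w\d
b={{a}},{{a}}
c=(?P<id>{{a}}),(?P=id)
""")
print(bool(patterns["b"].match("1aw2,5cx7")))  # True
print(bool(patterns["c"].match("1aw2,5cx7")))  # False: (?P=id) repeats the text
print(bool(patterns["c"].match("1aw2,1aw2")))  # True
```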
天命终不由人

2020-11-28 14:56
Note: this works with the PyPI regex module, not with the re module.

You could use the notation (?N), where N is the group number; in your case:
```
(\d\w\w\d),(?1)
```
which matches the same strings as:
```
(\d\w\w\d),(\d\w\w\d)
```
Be aware that \w also matches digits (and the underscore). If the two middle characters must be letters, the regex becomes:
```
(\d[a-zA-Z]{2}\d),(?1)
```
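The \w caveat applies to the standard re module as well, so it can be checked without installing regex. A minimal sketch comparing the loose and strict sub-patterns on a few sample strings:

```python
import re

loose = re.compile(r"\d\w\w\d")          # \w also accepts digits and "_"
strict = re.compile(r"\d[a-zA-Z]{2}\d")  # only ASCII letters in the middle

for sample in ["1aw2", "1232", "1_x2"]:
    print(sample, bool(loose.fullmatch(sample)), bool(strict.fullmatch(sample)))
# 1aw2 True True
# 1232 True False
# 1_x2 True False
```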
隐瞒了意图╮

2020-11-28 15:04
Since you're already using re, why not use string processing to manage the pattern repetition as well:
```
import re

pattern = "P,P".replace("P", r"\d\w\w\d")
re.match(pattern, "1aw2,5cx7")
```
Or:
```
P = r"\d\w\w\d"

re.match(f"{P},{P}", "1aw2,5cx7")
```
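One thing to watch with the str.replace variant: it substitutes every "P" in the template, including the one in Python's own (?P<name>...) named-group syntax. A minimal sketch of that pitfall, with an explicit format field as the safer placeholder (the group name first is illustrative):

```python
import re

DLLD = r"\d\w\w\d"

# str.replace also rewrites the "P" inside "(?P<first>...)", mangling it
template = "(?P<first>P),P"
mangled = template.replace("P", DLLD)
try:
    re.compile(mangled)
    compiled_ok = True
except re.error:
    compiled_ok = False
print(compiled_ok)  # False

# A named format field is unambiguous, so the named group survives
pattern = "(?P<first>{dlld}),{dlld}".format(dlld=DLLD)
m = re.fullmatch(pattern, "1aw2,5cx7")
print(m.group("first"))  # 1aw2
```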

情歌与酒

2020-11-28 15:07

```
import re
# Compile the sub-pattern once so it can be reused later
digit_letter_letter_digit = re.compile(r"\d\w\w\d")
# findall returns the matched strings; finditer would yield match objects instead
all_finds = digit_letter_letter_digit.findall("1aw2,5cx7")
for value in all_finds:
    print(digit_letter_letter_digit.match(value))
```
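The finditer variant mentioned in the comment yields match objects directly, so positions come for free and no second match call is needed. A short sketch:

```python
import re

dlld = re.compile(r"\d\w\w\d")  # compiled once, reused below

# finditer yields match objects, so the matched text and its span are both available
for m in dlld.finditer("1aw2,5cx7"):
    print(m.group(), m.span())
# 1aw2 (0, 4)
# 5cx7 (5, 9)
```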
