Capturing named groups in regex with re.findall

While trying to answer the question "regex to split %ages and values in python", I noticed that I had to re-order the groups in the result of findall. For example:

3 Answers

无人及你

2021-01-02 20:32
As you've identified in your second example, re.findall returns the groups in the original order.

The problem is that, before Python 3.7, the standard dict type did not guarantee any key order. The manual for Python 2.x makes this explicit: https://docs.python.org/2/library/stdtypes.html#dict.items. Since Python 3.7, dict preserves insertion order as a language guarantee (CPython 3.6 already did so as an implementation detail), so the issue only arises on older interpreters.

On those older versions, use collections.OrderedDict instead:
```
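import re
from collections import OrderedDict as odict

data = """34% passed 23% failed 46% deferred"""
# findall yields (value, key) tuples; swap them so the word becomes the key
result = odict((key, value) for value, key in re.findall(r'(\w+)%\s(\w+)', data))
print(result)
# e.g. OrderedDict([('passed', '34'), ('failed', '23'), ('deferred', '46')]); the exact repr varies by Python version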
```
Notice that you must pass the pairs to the constructor directly (odict((k, v) for k, v in ...)) rather than building a dict comprehension ({k: v for k, v in ...}) first. A dict comprehension always produces a plain dict, and on interpreters where plain dicts are unordered the key order is already lost before OrderedDict ever sees it... which is of course what you are trying to preserve in the first place.
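As a quick check of the ordering point, here is a minimal sketch that builds the mapping both ways. It assumes Python 3.7 or later, where a plain dict preserves insertion order as a language guarantee, so both forms end up in the same order there; on older interpreters only the OrderedDict form is reliable. The names pairs, ordered, and plain are illustrative.

```python
import re
from collections import OrderedDict

data = "34% passed 23% failed 46% deferred"

# findall returns (value, key) tuples in match order
pairs = re.findall(r'(\w+)%\s(\w+)', data)

# Generator of (key, value) pairs fed straight to OrderedDict
ordered = OrderedDict((key, value) for value, key in pairs)

# Dict comprehension: a plain dict (insertion-ordered only on Python 3.7+)
plain = {key: value for value, key in pairs}

print(list(ordered))
print(list(plain))
```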
傲寒

2021-01-02 20:35
Take 3, based on a further clarification of the OP's intent in this comment.

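Ashwin is correct that findall does not preserve named capture groups (e.g. (?P<name>regex)): it returns plain strings or tuples of strings, so the names are lost. finditer to the rescue! It yields the individual match objects one by one, and each match object gives access to its groups by name. Simple example: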
```
import re

data = """34% passed 23% failed 46% deferred"""
# Each m is a match object, so groups can be looked up by name
for m in re.finditer(r'(?P<percentage>\w+)%\s(?P<word>\w+)', data):
    print(m.group('percentage'), m.group('word'))
```
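If the goal is to keep the group names attached to the values, each match object also exposes groupdict(), which returns a dict keyed by group name. A minimal sketch, reusing the pattern above; the variable names pattern and rows are illustrative:

```python
import re

data = "34% passed 23% failed 46% deferred"
pattern = re.compile(r'(?P<percentage>\w+)%\s(?P<word>\w+)')

# One dict per match, keyed by the named groups
rows = [m.groupdict() for m in pattern.finditer(data)]

for row in rows:
    print(row['word'], row['percentage'])
```

This keeps the names in the result itself, so the position of each group in the pattern no longer matters.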
刺人心

2021-01-02 20:38
Per the OP's comment on my first answer: If you are simply trying to reorder a list of 2-tuples like this:
```
[('34', 'passed'), ('23', 'failed'), ('46', 'deferred')]
```
... to look like this, with individual elements reversed:
```
[('passed', '34'), ('failed', '23'), ('deferred', '46')]
```
There's an easy solution: use a list comprehension with the slicing syntax sequence[::-1] to reverse the order of the elements of the individual tuples:
```
a = [('34', 'passed'), ('23', 'failed'), ('46', 'deferred')]
b = [x[::-1] for x in a]
print(b)  # [('passed', '34'), ('failed', '23'), ('deferred', '46')]
```
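The slice works for tuples of any length. For 2-tuples specifically, unpacking each pair by name is an equivalent alternative that some readers may find more explicit. A small sketch, where value and word are illustrative names:

```python
a = [('34', 'passed'), ('23', 'failed'), ('46', 'deferred')]

# Unpack each pair and emit it in swapped order
b = [(word, value) for value, word in a]

# Same result as reversing each tuple with a slice
assert b == [x[::-1] for x in a]
print(b)
```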