Python: re..find longest sequence


I have a string that is randomly generated:

polymer_str = "diol diNCO diamine diNCO diamine diNCO diamine diNCO diol diNCO diamine"

I'd

5 Answers

花落未央

2021-01-06 07:57

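import re
# Split on every run of non-pipe characters, so only the pipe runs remain.
pat = re.compile("[^|]+")
# Collapse each "diNCO diamine" unit to "|" and drop spaces: a run of units becomes a run of pipes.
p = "diol diNCO diamine diNCO diamine diNCO diamine diNCO diol diNCO diamine".replace("diNCO diamine", "|").replace(" ", "")
print(max(map(len, pat.split(p))))  # length of the longest pipe run; prints 3 for this string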


悲哀的现实

2021-01-06 07:59

One way is to use findall:

import re
polymer_str = "diol diNCO diamine diNCO diamine diNCO diamine diNCO diol diNCO diamine"
len(re.findall("diNCO diamine", polymer_str)) # returns 4: every occurrence, contiguous or not


傲寒

2021-01-06 08:00
Expanding on Ealdwulf's answer:

See the Python documentation for re.findall.
```
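import re

def getLongestSequenceSize(search_str, polymer_str):
    # Each match is one contiguous run of search_str; re.escape guards against regex metacharacters.
    matches = re.findall(r'(?:\b%s\b\s?)+' % re.escape(search_str), polymer_str)
    # key=len picks the longest run (plain max() compares strings alphabetically); default='' covers no matches.
    longest_match = max(matches, key=len, default='')
    return longest_match.count(search_str)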
```
This could be written as one line, but it becomes less readable in that form.
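
For reference, a one-line form might look like the sketch below. The function name `longest_run_count` is hypothetical, and the use of `re.escape` and `default=0` are assumptions added here for safety rather than part of the answer above.

```python
import re

def longest_run_count(search_str, polymer_str):
    # Hypothetical one-line variant: find every contiguous run, then take the largest repeat count.
    return max((run.count(search_str) for run in re.findall(r'(?:\b%s\b\s?)+' % re.escape(search_str), polymer_str)), default=0)

polymer_str = "diol diNCO diamine diNCO diamine diNCO diamine diNCO diol diNCO diamine"
print(longest_run_count("diNCO diamine", polymer_str))
```

The single expression does the same work as the multi-line version, but the nesting makes it harder to see where the pattern ends and the counting begins.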

Alternative:

If polymer_str is huge, it will be more memory efficient to use re.finditer. Here's how you might go about it:
```
def getLongestSequenceSize(search_str, polymer_str):
    longest_match = ''
    for match in re.finditer(r'(?:\b%s\b\s?)+' % search_str, polymer_str):
        if len(match.group(0)) > len(longest_match):
            longest_match = match.group(0)
    return longest_match.count(search_str)
```
The main difference between findall and finditer is that findall returns a list (here, of matched strings), while finditer returns an iterator that yields Match objects one at a time. The finditer version may also be somewhat slower.
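
To make the difference concrete, here is a small sketch that runs both calls on the sample string from the question. The variable names `pattern`, `runs`, and `spans` are illustrative only.

```python
import re

polymer_str = "diol diNCO diamine diNCO diamine diNCO diamine diNCO diol diNCO diamine"
pattern = r'(?:\bdiNCO diamine\b\s?)+'

# findall builds the whole list up front; the items are plain strings
# because the pattern contains no capturing groups.
runs = re.findall(pattern, polymer_str)
print(runs)

# finditer yields Match objects lazily; each one also exposes its position.
spans = [m.span() for m in re.finditer(pattern, polymer_str)]
print(spans)
```

Because each Match object carries its start and end offsets, the finditer form is also handy when you need to know where the longest run sits, not just how long it is.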

囚心锁ツ

2021-01-06 08:07

Using re:

 m = re.search(r"(?:\bdiNCO diamine\b\s?)+", polymer_str)
 m.group(0).count("diNCO diamine")  # repeats in the first run only; re.search stops at the first match


太阳男子

2021-01-06 08:10

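I think the OP wants the longest contiguous sequence. You can get all contiguous sequences with:

seqs = re.findall("(?:diNCO diamine ?)+", polymer_str)

and then find the longest, e.g. with max(seqs, key=len). Note the optional space after the unit: without it, repeats separated by spaces are not matched as one sequence.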
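
A minimal sketch of that last step, assuming repeats are separated by a single optional space (so the pattern carries a trailing ` ?`). The names `unit` and `longest` are illustrative.

```python
import re

polymer_str = "diol diNCO diamine diNCO diamine diNCO diamine diNCO diol diNCO diamine"
unit = "diNCO diamine"

# Assumption: consecutive units are separated by at most one space.
seqs = re.findall(r"(?:%s ?)+" % re.escape(unit), polymer_str)

# Longest run by character length; default="" avoids an error when nothing matches.
longest = max(seqs, key=len, default="")
print(longest.count(unit))
```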
