Python: use re to find the longest sequence

清歌不尽 2021-01-06 07:28

I have a string that is randomly generated:

polymer_str = "diol diNCO diamine diNCO diamine diNCO diamine diNCO diol diNCO diamine"

I'd

5 Answers
  • 2021-01-06 07:57
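    import re
    pat = re.compile("[^|]+")
    # Mark each "diNCO diamine" unit with "|", then drop spaces so adjacent units form runs of "|".
    p = "diol diNCO diamine diNCO diamine diNCO diamine diNCO diol diNCO diamine".replace("diNCO diamine","|").replace(" ","")
    # Splitting on everything that is not "|" leaves only the runs of "|"; the longest run is the answer.
    print(max(map(len, pat.split(p))))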
    
  • 2021-01-06 07:59

    One way is to use findall:

    import re
    polymer_str = "diol diNCO diamine diNCO diamine diNCO diamine diNCO diol diNCO diamine"
    len(re.findall("diNCO diamine", polymer_str)) # returns 4 (all occurrences, not only the longest contiguous run).
    
  • 2021-01-06 08:00

    Expanding on Ealdwulf's answer:

    Documentation on re.findall can be found here.

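    import re

    def getLongestSequenceSize(search_str, polymer_str):
        # Escape search_str so it is matched literally inside the pattern.
        matches = re.findall(r'(?:\b%s\b\s?)+' % re.escape(search_str), polymer_str)
        # Pick the longest run; a plain max() would compare the strings alphabetically.
        longest_match = max(matches, key=len, default='')
        return longest_match.count(search_str)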
    

    This could be written as one line, but it becomes less readable in that form.

    Alternative:

    If polymer_str is huge, it will be more memory efficient to use re.finditer. Here's how you might go about it:

    def getLongestSequenceSize(search_str, polymer_str):
        longest_match = ''
        # Escape search_str so it is matched literally; finditer yields one Match at a time.
        for match in re.finditer(r'(?:\b%s\b\s?)+' % re.escape(search_str), polymer_str):
            if len(match.group(0)) > len(longest_match):
                longest_match = match.group(0)
        return longest_match.count(search_str)
    

    The biggest difference between findall and finditer is that the first builds and returns a list of all matches at once, while the second lazily yields Match objects one at a time. The finditer approach may also be somewhat slower.

  • 2021-01-06 08:07

    Using re:

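     # re.search returns only the first run of repeats, which is the longest one in this string.
     m = re.search(r"(?:\bdiNCO diamine\b\s?)+", polymer_str)
     m.group(0).count("diNCO diamine")  # number of repeats in that run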
    
  • 2021-01-06 08:10

    I think the OP wants the longest contiguous sequence. You can get all contiguous sequences like: seqs = re.findall(r"(?:diNCO diamine\s?)+", polymer_str)

    and then find the longest, for example with max(seqs, key=len).

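
The ideas in the answers above can be combined into one small helper. This is only a minimal sketch: the function name longest_run is made up for illustration, and it assumes the repeating unit starts and ends with a word character (so that \b applies) and that repeats are separated by whitespace.

```python
import re

def longest_run(unit, text):
    """Return how many times `unit` repeats in its longest contiguous run."""
    # re.escape makes `unit` match literally; \s* allows whitespace between repeats.
    pattern = re.compile(r'(?:\b%s\b\s*)+' % re.escape(unit))
    # finditer yields one Match per run; count the repeats in each and keep the largest.
    # default=0 covers the case where `unit` never occurs.
    return max((m.group(0).count(unit) for m in pattern.finditer(text)), default=0)

polymer_str = "diol diNCO diamine diNCO diamine diNCO diamine diNCO diol diNCO diamine"
print(longest_run("diNCO diamine", polymer_str))  # 3
print(longest_run("diNCO diol", polymer_str))     # 1
```

Comparing runs by repeat count rather than by string length keeps the result independent of how much whitespace separates the units.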