How to extract the substring between two markers?

慢半拍i 2020-11-22 06:02

Let's say I have a string 'gfgfdAAA1234ZZZuijjk' and I want to extract just the '1234' part.

I only know what will be the few characte

18 answers
  • 2020-11-22 06:19

    I'm surprised nobody has mentioned this; it's my quick version for one-off scripts:

    >>> x = 'gfgfdAAA1234ZZZuijjk'
    >>> x.split('AAA')[1].split('ZZZ')[0]
    '1234'
    
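One caveat with the chained split: if 'AAA' is missing, the `[1]` index raises IndexError, and if only 'ZZZ' is missing, the whole tail comes back with no error. A minimal sketch of a stricter variant built on str.partition follows; the helper name `between` is made up for illustration and assumes non-empty markers.

```python
def between(text, start, end):
    """Return the text between the first `start` and the next `end`, or None."""
    # partition() returns (head, separator, tail); separator is '' if not found.
    _, found_start, rest = text.partition(start)
    if not found_start:
        return None
    middle, found_end, _ = rest.partition(end)
    return middle if found_end else None


print(between('gfgfdAAA1234ZZZuijjk', 'AAA', 'ZZZ'))  # 1234
print(between('gfgfd1234ZZZuijjk', 'AAA', 'ZZZ'))     # None
```

For a throwaway script the one-line split is usually enough; the helper only matters when a missing marker should be detected rather than silently ignored.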
  • 2020-11-22 06:20

    With sed it is possible to do something like this with a string:

    echo "$STRING" | sed -e "s|.*AAA\(.*\)ZZZ.*|\1|"

    And this will give me 1234 as a result.

    You could do the same with the re.sub function, using the same regex.

    >>> import re
    >>> re.sub(r'.*AAA(.*)ZZZ.*', r'\1', 'gfgfdAAA1234ZZZuijjk')
    '1234'
    

    In basic sed, a capturing group is written as \(..\), whereas in Python it is written as (..).
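One thing to keep in mind: re.sub returns the input unchanged when the pattern does not match, so a failed extraction looks just like a string that happened to come through untouched. A sketch of an alternative with re.search, a non-greedy group, and re.escape, assuming the markers are literal strings; the function name `extract` is illustrative only.

```python
import re


def extract(text, start='AAA', end='ZZZ'):
    # re.escape makes the markers literal; (.*?) stops at the first end marker.
    pattern = re.escape(start) + r'(.*?)' + re.escape(end)
    match = re.search(pattern, text)
    return match.group(1) if match else None


print(extract('gfgfdAAA1234ZZZuijjk'))  # 1234
print(extract('no markers here'))       # None
# re.sub hands back the original string when nothing matches:
print(re.sub(r'.*AAA(.*)ZZZ.*', r'\1', 'no markers here'))  # no markers here
```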

  • 2020-11-22 06:24
    >>> s = '/tmp/10508.constantstring'
    >>> s.split('/tmp/')[1].split('constantstring')[0].strip('.')
    '10508'
    
  • 2020-11-22 06:25

    Here's a solution without regex that also handles the case where the first marker contains the second marker. The function only finds a substring if the second marker comes after the first marker, and returns None otherwise.

    def find_substring(string, start, end):
        start_index = string.find(start)
        if start_index == -1:
            return None  # start marker not found
        start_index += len(start)
        # Look for the end marker only after the start marker.
        end_index = string.find(end, start_index)
        return string[start_index:end_index] if end_index != -1 else None
    
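The same index-based search extends to strings that contain several marked regions. The following is only a sketch: the generator name `find_all_between` is hypothetical, and it assumes both markers are non-empty.

```python
def find_all_between(text, start, end):
    """Yield every non-overlapping substring enclosed by `start` ... `end`."""
    pos = 0
    while True:
        i = text.find(start, pos)
        if i == -1:
            return  # no further start marker
        i += len(start)
        # Search for the end marker only after the start marker.
        j = text.find(end, i)
        if j == -1:
            return  # start marker without a closing end marker
        yield text[i:j]
        pos = j + len(end)


print(list(find_all_between('AAA1ZZZ--AAA22ZZZ--AAA333', 'AAA', 'ZZZ')))  # ['1', '22']
```

The trailing 'AAA333' is skipped because no 'ZZZ' follows it.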
  • 2020-11-22 06:31

    Just in case somebody has to do the same thing I did: I had to extract everything inside parentheses in a line. For example, if I have a line like 'US president (Barack Obama) met with ...' and I want to get only 'Barack Obama', this is the solution:

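    import re
    line = 'US president (Barack Obama) met with ...'
    regex = r'.*\((.*?)\).*'  # raw string, so \( and \) reach the regex engine unchanged
    matches = re.search(regex, line)
    line = matches.group(1) + '\n'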
    

    I.e., you need to escape the parentheses with a backslash (\). This is really more a question about regular expressions than about Python, though.

    Also, in some cases you may see an 'r' prefix before the regex definition. Without the r prefix, you need to escape backslashes yourself, as in C. Here is more discussion on that.
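To make the point about the r prefix concrete, here is a small sketch comparing a raw string with its manually escaped equivalent; the variable names are arbitrary.

```python
import re

raw = r'\((.*?)\)'       # raw string: backslashes reach the regex engine as-is
escaped = '\\((.*?)\\)'  # normal string: each backslash must be doubled

print(raw == escaped)  # True

line = 'US president (Barack Obama) met with ...'
print(re.search(raw, line).group(1))  # Barack Obama
```

Both spellings produce the same pattern; the raw form is simply easier to read.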

  • 2020-11-22 06:31

    One-liners that return a fallback string if there is no match. Edit: the improved version uses the next function; replace "not-found" with something else if needed:

    import re
    res = next( (m.group(1) for m in [re.search("AAA(.*?)ZZZ", "gfgfdAAA1234ZZZuijjk" ),] if m), "not-found" )
    

    My other method, which is less optimal, runs a second regex search as a fallback; I still haven't found a shorter way:

    import re
    res = ( ( re.search("AAA(.*?)ZZZ", "gfgfdAAA1234ZZZuijjk") or re.search("()","") ).group(1) )
    
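On Python 3.8 or newer, an assignment expression may give a shorter one-liner with the same fallback behaviour. This is a sketch of that alternative rather than part of the answer above; the variable names `text`, `res`, and `missing` are illustrative.

```python
import re

text = "gfgfdAAA1234ZZZuijjk"

# Python 3.8+: bind the match object and test it in a single expression.
res = m.group(1) if (m := re.search("AAA(.*?)ZZZ", text)) else "not-found"
print(res)  # 1234

missing = m.group(1) if (m := re.search("AAA(.*?)ZZZ", "no markers")) else "not-found"
print(missing)  # not-found
```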