How to extract the substring between two markers?

慢半拍i

Let's say I have a string 'gfgfdAAA1234ZZZuijjk' and I want to extract just the '1234' part.

I only know what will be the few characte

18 Answers

抹茶落季

2020-11-22 06:19
Surprised that nobody has mentioned this which is my quick version for one-off scripts:
```
>>> x = 'gfgfdAAA1234ZZZuijjk'
>>> x.split('AAA')[1].split('ZZZ')[0]
'1234'
```
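For anything beyond a one-off script, note that the chained `split` raises `IndexError` when 'AAA' is missing and silently returns everything after 'AAA' when 'ZZZ' is missing. A minimal sketch of a stricter variant, assuming you would rather get `None` in both cases (the helper name `between` is made up for illustration):

```python
def between(text, start, end):
    """Return the text between the first `start` and the next `end`, or None."""
    # partition splits on the first occurrence and reports whether it was found.
    _, found_start, rest = text.partition(start)
    if not found_start:
        return None
    middle, found_end, _ = rest.partition(end)
    # Only accept the result if the end marker really follows the start marker.
    return middle if found_end else None


print(between('gfgfdAAA1234ZZZuijjk', 'AAA', 'ZZZ'))  # 1234
print(between('gfgfdAAA1234', 'AAA', 'ZZZ'))          # None
```

`str.partition` is used here instead of `split` because it always returns three parts, so no index can go out of range.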
时光说笑

2020-11-22 06:20
With sed it is possible to do something like this with a string:

echo "$STRING" | sed -e "s|.*AAA\(.*\)ZZZ.*|\1|"

And this will give me 1234 as a result.

You could do the same with the re.sub function, using the same regex.
```
>>> import re
>>> re.sub(r'.*AAA(.*)ZZZ.*', r'\1', 'gfgfdAAA1234ZZZuijjk')
'1234'
```
In basic sed, capturing groups are written as \(..\), whereas in Python they are written as (..).
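Two behaviours of the `re.sub` approach are worth seeing side by side: the greedy `.*` picks the last pair of markers when several are present, and `re.sub` returns the input unchanged when nothing matches. A small sketch, using a made-up input string with two marker pairs:

```python
import re

# Hypothetical input containing two AAA...ZZZ pairs.
s = 'gfgfdAAA1234ZZZuijjkAAA5678ZZZend'

# Greedy: the leading .* swallows as much as it can, so the last AAA wins.
greedy = re.sub(r'.*AAA(.*)ZZZ.*', r'\1', s)

# Non-greedy: .*? stops at the first AAA and the first ZZZ after it.
lazy = re.sub(r'.*?AAA(.*?)ZZZ.*', r'\1', s)

# No match: re.sub leaves the string as it was.
unchanged = re.sub(r'.*AAA(.*)ZZZ.*', r'\1', 'no markers here')

print(greedy)     # 5678
print(lazy)       # 1234
print(unchanged)  # no markers here
```

Because a failed match is indistinguishable from a successful one that happens to return the whole string, `re.search` may be the safer choice when the markers are not guaranteed to be present.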

野性不改

2020-11-22 06:24

>>> s = '/tmp/10508.constantstring'
>>> s.split('/tmp/')[1].split('constantstring')[0].strip('.')
'10508'


太阳男子

2020-11-22 06:25
Here's a solution without regex that also accounts for scenarios where the first substring contains the second substring. This function will only find a substring if the second marker is after the first marker.
```
def find_substring(string, start, end):
    start_index = string.find(start)  # -1 if the start marker is absent
    content_start = start_index + len(start)
    # Search for the end marker only after the start marker.
    end_index = string.find(end, content_start)
    if start_index == -1 or end_index == -1:
        return None  # one of the markers is missing
    return string[content_start:end_index]
```
悲哀的现实

2020-11-22 06:31
Just in case somebody has to do the same thing I did: I had to extract everything inside the parentheses in a line. For example, given a line like 'US president (Barack Obama) met with ...', I want to get only 'Barack Obama'. This is the solution:
```
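import re

line = 'US president (Barack Obama) met with ...'
regex = r'.*\((.*?)\).*'  # \( and \) match literal parentheses; (.*?) captures
matches = re.search(regex, line)
line = matches.group(1) + '\n'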
```
I.e., you need to escape the literal parentheses with a backslash (\). This is more a question about regular expressions than about Python.

Also, you will often see an r prefix before a regex definition, which marks a raw string. Without the r prefix, backslashes themselves have to be escaped, as in C. Here is more discussion on that.
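To make the point about the r prefix concrete, here is a short sketch comparing the two spellings of the same pattern. The variable names are illustrative only:

```python
import re

# Without the r prefix, each backslash meant for the regex engine is doubled.
plain = '.*\\((.*?)\\).*'
# With the r prefix, the backslashes are written once and passed through as-is.
raw = r'.*\((.*?)\).*'

# Both literals produce exactly the same pattern string.
assert plain == raw

line = 'US president (Barack Obama) met with ...'
match = re.search(raw, line)
# Guard against a line without parentheses, where re.search returns None.
name = match.group(1) if match else None
print(name)  # Barack Obama
```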
旧巷少年郎

2020-11-22 06:31
One-liners that return another string if there is no match. Edit: the improved version uses the next function; replace "not-found" with something else if needed:
```
import re
res = next( (m.group(1) for m in [re.search("AAA(.*?)ZZZ", "gfgfdAAA1234ZZZuijjk" ),] if m), "not-found" )
```
My other method, which is less optimal, runs a regex a second time; I still haven't found a shorter way:
```
import re
res = ( ( re.search("AAA(.*?)ZZZ", "gfgfdAAA1234ZZZuijjk") or re.search("()","") ).group(1) )
```
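If a one-liner is not a hard requirement, the same fallback behaviour can be sketched as a small function. The name `extract_between` and its `default` parameter are made up for illustration, and `re.escape` is added on the assumption that the markers should be treated as literal text rather than as regex syntax:

```python
import re


def extract_between(text, start, end, default="not-found"):
    """Return the text between `start` and `end`, or `default` if absent."""
    # Escape the markers so characters like ( or . are matched literally.
    pattern = re.escape(start) + "(.*?)" + re.escape(end)
    match = re.search(pattern, text)
    return match.group(1) if match else default


print(extract_between("gfgfdAAA1234ZZZuijjk", "AAA", "ZZZ"))  # 1234
print(extract_between("gfgfd1234uijjk", "AAA", "ZZZ"))        # not-found
```

Note that `.` does not match newlines by default, so this sketch only finds markers that sit on the same line.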