Using re to capture text between key words over the course of a doc

悲哀的现实

I am trying to capture the text between keywords in a document, along with the keywords themselves.

For example, let's say I have multiple instances of "egg" in a string.

1 Answer

北恋

2021-01-20 16:35

You need a non-greedy match. *? is the non-greedy form of *, and matches the shortest possible sequence. Also, /egg matches exactly that literal text, slash included, but I assume you just want egg, so the regex becomes (egg) (.*?) (egg).

However, since a regular expression consumes the string as it matches, the closing "egg" of one match is no longer available to open the next one. To match every intermediate piece of text you need look-ahead and look-behind assertions, which check for "egg" without consuming it. In this case, (?<=egg) (.*?) (?=egg) finds text with "egg" before and after, but returns only the in-between parts, i.e. ['hashbrowns', 'bacon', 'fried milk']. Capturing "egg" itself as well is more involved, because adjacent matches would have to share a keyword, so it's only worth going into if that's actually what you want.
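To make the difference concrete, here is a minimal sketch comparing the patterns. The sample string is an assumption (the question's original text isn't shown here), chosen so that it produces the list quoted above. The last pattern is not from the answer: it is one possible way to keep the opening keyword as well, by consuming the opening "egg" and only looking ahead for the closing one.

```python
import re

# Assumed sample text; the string from the original question is not shown.
text = "egg hashbrowns egg bacon egg fried milk egg"

# Greedy: .* runs as far as it can, so everything collapses into one match.
greedy = re.findall(r"(egg) (.*) (egg)", text)
# [('egg', 'hashbrowns egg bacon egg fried milk', 'egg')]

# Non-greedy, but each match consumes its closing "egg", so "bacon" has
# no opening "egg" left and is skipped.
consuming = re.findall(r"(egg) (.*?) (egg)", text)
# [('egg', 'hashbrowns', 'egg'), ('egg', 'fried milk', 'egg')]

# Look-behind and look-ahead assert "egg" without consuming it.
between = re.findall(r"(?<=egg) (.*?) (?=egg)", text)
# ['hashbrowns', 'bacon', 'fried milk']

# Possible variant (an assumption, not the answer's method): consume the
# opening "egg" and only look ahead for the closing one.
with_keyword = re.findall(r"(egg) (.*?) (?=egg)", text)
# [('egg', 'hashbrowns'), ('egg', 'bacon'), ('egg', 'fried milk')]

print(greedy, consuming, between, with_keyword, sep="\n")
```

Note that the literal spaces in these patterns are part of the match, so they only work when the keyword is separated from the surrounding text by exactly one space.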

All of this is covered in the Python docs for the re module, so look there for more info.
