Using re to capture text between key words over the course of a doc

悲哀的现实 2021-01-20 15:48

I am trying to capture the text between keywords in a document, as well as the keywords themselves.

For example, let's say I have multiple instances of "egg" in a string.

1 answer
  • 2021-01-20 16:35

    You need to use a non-greedy match. *? is the non-greedy form of *, and matches the shortest possible sequence. Also, /egg matches exactly that literal text, but I assume you just want egg, so your actual regex becomes (egg) (.*?) (egg). However, since a regular expression consumes the string as it matches, the closing "egg" of one match cannot also serve as the opening "egg" of the next, so you need look-ahead and look-behind assertions to match the intermediate text. In this case, (?<=egg) (.*?) (?=egg) finds text with "egg" before and after, but returns only the in-between text, i.e. ['hashbrowns', 'bacon', 'fried milk']. Matching "egg" as well would be quite a lot more complicated, and would probably involve parsing the string twice, so it's only worth going into if that's actually what you want.
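The difference between the three patterns is easier to see side by side. Below is a minimal sketch, not a definitive solution. The sample string is an assumption, since the question's original text is not shown here; it was chosen so that the look-around pattern gives the list quoted above. The greedy pattern is also an assumption, included only for contrast.

```python
import re

# Assumed sample text; the question's original string is not shown.
text = "egg hashbrowns egg bacon egg fried milk egg"

# Greedy (assumed starting point): .* runs on to the last "egg",
# so the whole string collapses into a single match.
greedy = re.findall(r"(egg) (.*) (egg)", text)

# Non-greedy: each match stops at the nearest "egg", but that closing
# "egg" is consumed and cannot open the next match, so "bacon" is skipped.
consuming = re.findall(r"(egg) (.*?) (egg)", text)

# Look-behind / look-ahead: "egg" is asserted but not consumed,
# so every stretch of text between two keywords is found.
between = re.findall(r"(?<=egg) (.*?) (?=egg)", text)

print(greedy)     # [('egg', 'hashbrowns egg bacon egg fried milk', 'egg')]
print(consuming)  # [('egg', 'hashbrowns', 'egg'), ('egg', 'fried milk', 'egg')]
print(between)    # ['hashbrowns', 'bacon', 'fried milk']
```

Note that the spaces inside the patterns are literal, so this sketch only matches when the keyword is separated from the surrounding text by exactly one space.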

    All of this is covered in the Python documentation for the re module, so look there for more information.
