I am trying to capture the text between keywords in a document, along with the keywords themselves.
For example, let's say I have multiple instances of "egg" in a string.
You need to use a non-greedy match. `*?` is the non-greedy form of `*` and matches the smallest possible sequence. Also, `/egg` matches exactly that, a literal slash followed by "egg", but I assume you just want `egg`, so your actual regex becomes `(egg) (.*?) (egg)`. However, since a regular expression consumes the string as it matches, the closing "egg" of one match can't also open the next one, so you need to use look-ahead and look-behind assertions to match the intermediate text. In this case, `(?<=egg) (.*?) (?=egg)` finds text with "egg" before and after, but only returns the in-between stuff, i.e. `['hashbrowns', 'bacon', 'fried milk']`. Trying to match "egg" too would be quite a lot more complicated, and would probably involve parsing the string twice, so it's only worth going into if that's actually what you want.
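To see the difference between the three patterns, here is a minimal sketch. The sample string is an assumption on my part (the question's actual input isn't shown here); I picked one for which the look-around pattern returns the list quoted above.

```python
import re

# Assumed sample input, chosen to reproduce the result quoted in the answer.
text = "egg hashbrowns egg bacon egg fried milk egg"

# Greedy: .* stretches to the last " egg", so everything collapses into one match.
greedy = re.findall(r"(?<=egg) (.*) (?=egg)", text)

# Non-greedy, no look-arounds: each match consumes its closing "egg",
# so that "egg" cannot open the next match and "bacon" is skipped.
consumed = re.findall(r"(egg) (.*?) (egg)", text)

# Non-greedy with look-behind/look-ahead: the "egg" delimiters are checked
# but not consumed, so every in-between piece is found.
between = re.findall(r"(?<=egg) (.*?) (?=egg)", text)

print(greedy)    # ['hashbrowns egg bacon egg fried milk']
print(consumed)  # [('egg', 'hashbrowns', 'egg'), ('egg', 'fried milk', 'egg')]
print(between)   # ['hashbrowns', 'bacon', 'fried milk']
```

Note that the spaces inside the patterns are literal, which is why the results come back without leading or trailing whitespace.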
All this is documented in the Python docs for the re module, so look there for more info.