I'm searching a file line by line for the occurrence of ##random_string##. It works except for the case of multiple #...
pattern='##(.*?)##'
prog=re.compile(p
Try the "block comment trick": /##((?:[^#]|#[^#])+?)##/
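A quick check of what this pattern accepts, as a sketch in Python. The sample strings are made up for illustration, and the surrounding slashes (delimiters in other regex flavors) are dropped. It allows a single # inside the text, but with three leading hashes the extra # still lands in the capture.

```python
import re

# The "block comment trick" pattern, without the /.../ delimiters.
block = re.compile(r'##((?:[^#]|#[^#])+?)##')

# A lone '#' between the delimiters is allowed.
inner_hash = block.search('##a#b##').group(1)

# With three leading hashes, the extra '#' ends up in the capture.
triple = block.search('###hey##').group(1)

print(inner_hash, triple)
```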
Your problem is with your inner match. You use ".", which matches any character that isn't a line end, and that means it matches "#" as well. So when it gets "###hey##", it matches (.*?) to "#hey".
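The behavior described above can be reproduced in a few lines. This is only a sketch; the sample line is borrowed from another answer in this thread.

```python
import re

# The pattern from the question: '.' can match '#', so the lazy group
# starts right after the first two hashes and swallows the third one.
lazy = re.compile(r'##(.*?)##')
captured = lazy.search('lala ###hey## there').group(1)
print(captured)
```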
The easy solution is to exclude the # character from the matchable set:
prog = re.compile(r'##([^#]*)##')
Protip: Use raw strings (e.g. r'') for regular expressions so you don't have to go crazy with backslash escapes.
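To see why raw strings matter, here is a small sketch using \b, which is not part of the patterns above and is chosen only because it shows the difference clearly: in a normal string literal it is a backspace character, while in a raw string it reaches the regex engine as a word boundary.

```python
import re

# Normal literal: '\b' is a single backspace character.
plain = '\bhey\b'
# Raw literal: backslash and 'b' are kept, so re sees a word boundary.
raw = r'\bhey\b'

sample = 'oh hey there'
plain_hit = re.search(plain, sample)  # looks for literal backspaces
raw_hit = re.search(raw, sample)      # looks for the word 'hey'
print(len(plain), len(raw), plain_hit, raw_hit.group(0))
```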
Trying to allow # inside the hashes will make things much more complicated.
EDIT: If you do not want to allow blank inner text (i.e. "####" shouldn't match with an inner text of ""), then change it to:
prog = re.compile(r'##([^#]+)##')
+ means "one or more."
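A side-by-side sketch of the two variants on made-up inputs: the * version accepts "####" with empty inner text, the + version rejects it, and both skip the extra leading # in "###hey##".

```python
import re

star = re.compile(r'##([^#]*)##')  # inner text may be empty
plus = re.compile(r'##([^#]+)##')  # inner text needs at least one character

star_empty = star.search('####')
plus_empty = plus.search('####')
star_hey = star.search('###hey##').group(1)
plus_hey = plus.search('###hey##').group(1)

print(repr(star_empty.group(1)), plus_empty, star_hey, plus_hey)
```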
>>> import re
>>> text = 'lala ###hey## there'
>>> matcher = re.compile(r"##[^#]+##")
>>> print(matcher.sub("FOUND", text))
lala #FOUND there
To match at least two hashes at either end:
pattern='##+(.*?)##+'
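A short sketch of how this pattern behaves; the inputs are made up for illustration. The greedy ##+ on the left absorbs any extra leading hashes, so the capture comes out clean, and a lone # inside the text is still accepted because the group uses the dot.

```python
import re

two_or_more = re.compile(r'##+(.*?)##+')

samples = ['lala ####hey## there', '###hey##', '##hey#####']
captures = [two_or_more.search(s).group(1) for s in samples]

# A single '#' inside the delimiters is kept in the capture.
with_inner_hash = two_or_more.search('##a#b##').group(1)

print(captures, with_inner_hash)
```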
Have you considered doing it the non-regex way?
>>> string='lala ####hey## there'
>>> string.split("####")[1].split("#")[0]
'hey'
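The one-liner above assumes exactly four leading hashes. A slightly more general sketch of the same idea follows; the helper name extract_between_hashes is hypothetical, and unlike the one-liner it keeps a lone # inside the text and stops only at a closing ##.

```python
def extract_between_hashes(text):
    """Return the text between the first run of 2+ hashes and the next '##'.

    Hypothetical helper for illustration; returns None when no pair is found.
    """
    start = text.find('##')
    if start == -1:
        return None
    # Drop the whole leading run of hashes, however long it is.
    rest = text[start:].lstrip('#')
    end = rest.find('##')
    if end == -1:
        return None
    return rest[:end]


print(extract_between_hashes('lala ####hey## there'))
```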
'^#{2,}([^#]*)#{2,}' -- any number of # >= 2 on either end
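One thing worth checking with this pattern is the leading ^, which ties the match to the start of the string. A sketch on made-up inputs; the unanchored variant is an assumption added for comparison, not part of the answer.

```python
import re

anchored = re.compile(r'^#{2,}([^#]*)#{2,}')

# Works when the hashes open the string...
at_start = anchored.search('###hey#####')
# ...but finds nothing when they appear mid-line.
mid_line = anchored.search('lala ###hey## there')

# Same pattern without '^' (assumed variant) matches anywhere in the line.
unanchored = re.search(r'#{2,}([^#]*)#{2,}', 'lala ###hey## there')

print(at_start.group(1), mid_line, unanchored.group(1))
```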
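Be careful with lazy quantifiers like (.*?): when the rest of the pattern forces the match to extend (for example an end anchor, as in '##(.*?)##$'), it will match '##abc#####' and capture 'abc###'. Lazy quantifiers can also be slower than a negated character class such as [^#]*.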
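The answer does not say which full pattern it has in mind, so the sketch below assumes an end-anchored one, '##(.*?)##$', purely for illustration. With that assumption the lazy group does pick up trailing hashes, while a negated character class does not.

```python
import re

s = '##abc#####'

# Assumed pattern: the '$' forces the lazy group to keep growing
# until only two hashes are left before the end of the string.
lazy_anchored = re.search(r'##(.*?)##$', s).group(1)

# A negated class cannot cross a '#', so the capture stays clean.
negated = re.search(r'#{2,}([^#]*)#{2,}', s).group(1)

print(lazy_anchored, negated)
```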