I'm dealing with single HTML strings like this:
>>> s = 'u><br/>\n Some text <br/><br/><u'
If I understand you right, you're looking to take this input:
u><br/>\n Some text <br/><br/><u
And receive this output:
\n Some text
This is done simply enough by only caring about what comes between the two inward-pointing brackets. We want:
> (so we know where to begin)
\n Some text (the content), which does not contain a left bracket
< (so we know where to end)
You want:
>>> import re
>>> s = 'u><br/>\n Some text <br/><br/><u'
>>> re.search(r'>([^<]+)<', s)
<_sre.SRE_Match object; span=(6, 20), match='>\n Some text <'>
(The captured group can be accessed via .group(1).)
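To make the three parts of the pattern explicit, here is a minimal sketch of the same regex written with re.VERBOSE, plus a guard for the no-match case (re.search returns None then). The variable names and the final .strip() are my own additions, not something the question asked for:

```python
import re

s = 'u><br/>\n Some text <br/><br/><u'

# Same pattern as above, spelled out piece by piece.
pattern = re.compile(r"""
    >          # where to begin
    ([^<]+)    # the content: one or more characters that are not '<'
    <          # where to end
""", re.VERBOSE)

m = pattern.search(s)

# re.search / pattern.search return None when nothing matches,
# so check before calling .group(1).
raw = m.group(1) if m else ''

# Assumption: surrounding whitespace is not wanted, so trim it.
text = raw.strip()

print(repr(raw))   # '\n Some text '
print(repr(text))  # 'Some text'
```

If you do want to keep the leading newline and the surrounding spaces, just skip the .strip().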
Additionally, you may want to use re.findall if you expect there to be multiple matches per line:
>>> re.findall(r'>([^<]+)<', s)
['\n Some text ']
EDIT: To address the comment: If you have multiple matches and you want to connect them into a single string (effectively removing all HTML-like tag things), do:
>>> s = 'nbsp;<br><br>Some text.<br>Some \n more text.<br'
>>> ' '.join(re.findall(r'>([^<]+)<', s))
'Some text. Some \n more text.'
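If you do this in more than one place, it may be worth wrapping it in a small function. This is only a sketch: join_text_fragments is a hypothetical name, and trimming each piece and dropping whitespace-only pieces are assumptions about what you want. Note that the pattern only sees text that sits between a > and the next <, so anything before the first > (like the nbsp; above) is ignored.

```python
import re

def join_text_fragments(s, sep=' '):
    """Join every run of text found between a '>' and the next '<'.

    Hypothetical helper; it is a quick regex approach for tag-like
    strings, not a real HTML parser.
    """
    fragments = re.findall(r'>([^<]+)<', s)
    # Assumption: trim each fragment and skip ones that are only whitespace.
    cleaned = [f.strip() for f in fragments if f.strip()]
    return sep.join(cleaned)

print(repr(join_text_fragments('nbsp;<br><br>Some text.<br>Some \n more text.<br')))
# 'Some text. Some \n more text.'
print(repr(join_text_fragments('u><br/>\n Some text <br/><br/><u')))
# 'Some text'
```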