Instead of parsing the HTML yourself, take a look at Python's built-in html.parser module (or the HTMLParser module in Python 2).
It will probably be easier and more robust than any parsing code you write on your own.
Here is the example from the Python documentation:
    from html.parser import HTMLParser

    class MyHTMLParser(HTMLParser):
        def handle_starttag(self, tag, attrs):
            print("Encountered a start tag:", tag)

        def handle_endtag(self, tag):
            print("Encountered an end tag :", tag)

        def handle_data(self, data):
            print("Encountered some data  :", data)

    parser = MyHTMLParser()
    parser.feed('<html><head><title>Test</title></head>'
                '<body><h1>Parse me!</h1></body></html>')
To adapt this example to your case, add an attribute to the class that keeps track of the content you are interested in, and update it from the handler methods.
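As a minimal sketch of that idea, assuming you want to collect the text inside one particular tag, the class below keeps the collected text in a list attribute and uses a depth counter to know when it is inside the target tag. The class name TextCollector and its text() helper are hypothetical, chosen only for illustration; only the HTMLParser handler methods come from the standard library.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical example: collect the text found inside one chosen tag."""

    def __init__(self, target_tag):
        super().__init__()
        self.target_tag = target_tag
        self.depth = 0    # > 0 while we are inside the target tag
        self.chunks = []  # the attribute that keeps track of the collected content

    def handle_starttag(self, tag, attrs):
        if tag == self.target_tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.target_tag and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # handle_data may be called several times for one text run, so append
        if self.depth > 0:
            self.chunks.append(data)

    def text(self):
        return "".join(self.chunks)


parser = TextCollector("h1")
parser.feed('<html><head><title>Test</title></head>'
            '<body><h1>Parse me!</h1></body></html>')
parser.close()  # flush any buffered input
print(parser.text())  # Parse me!
```

A counter is used instead of a boolean flag so that the tracking stays correct when the target tag is nested inside itself (for example, nested div elements).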