I\'m looking for an HTML Parser module for Python that can help me get the tags in the form of Python lists/dictionaries/objects.
If I have a document of the form:>
I recommend using justext library:
https://github.com/miso-belica/jusText
Usage: Python2:
import requests
import justext
response = requests.get("http://planet.python.org/")
paragraphs = justext.justext(response.content, justext.get_stoplist("English"))
for paragraph in paragraphs:
print paragraph.text
Python3:
import requests
import justext
response = requests.get("http://bbc.com/")
paragraphs = justext.justext(response.content, justext.get_stoplist("English"))
for paragraph in paragraphs:
print (paragraph.text)