I\'d like to extract the text from an HTML file using Python. I want essentially the same output I would get if I copied the text from a browser and pasted it into notepad.
Best worked for me is inscripts .
https://github.com/weblyzard/inscriptis
import urllib.request from inscriptis import get_text url = "http://www.informationscience.ch" html = urllib.request.urlopen(url).read().decode('utf-8') text = get_text(html) print(text)
The results are really good