Extracting text from HTML file using Python

一生所求 2020-11-22 04:05

I'd like to extract the text from an HTML file using Python. I want essentially the same output I would get if I copied the text from a browser and pasted it into Notepad.

30 answers
  •  自闭症患者
    2020-11-22 04:17

    A simple way:

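    import re

    # Read the whole file; the utf-8 encoding is an assumption, adjust as needed.
    with open('html_file.html', encoding='utf-8') as f:
        html_text = f.read()

    # Remove every tag; re.DOTALL lets a single tag span several lines.
    text_filtered = re.sub(r'<.*?>', '', html_text, flags=re.DOTALL)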
    

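    This code finds every part of html_text that starts with '<' and ends with '>' and replaces each match with an empty string. Note that it only removes the tags themselves: the contents of <script> and <style> elements stay in the output, and entities such as &amp; are not decoded.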
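
If the regex turns out to be too blunt, one possible alternative is the standard library's html.parser module. The sketch below is only an illustration, not a full browser-style renderer: the class name TextExtractor, the helper html_to_text, and the choice of which tags count as "block" tags are all assumptions made for this example. It skips <script> and <style> contents, decodes entities, and starts a new line around common block-level tags.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects visible text from an HTML document."""

    # Contents of these elements are never shown to the reader.
    SKIPPED_TAGS = {"script", "style"}
    # Assumed set of tags that should start a new line in the output.
    BLOCK_TAGS = {"p", "div", "br", "li", "tr",
                  "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; into "&".
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1
        elif tag in self.BLOCK_TAGS and self._skip_depth == 0:
            self._chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS:
            if self._skip_depth > 0:
                self._skip_depth -= 1
        elif tag in self.BLOCK_TAGS and self._skip_depth == 0:
            self._chunks.append("\n")

    def handle_data(self, data):
        # Keep text only when we are outside <script>/<style>.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def get_text(self):
        # Normalize whitespace inside each line and drop empty lines.
        lines = (" ".join(line.split())
                 for line in "".join(self._chunks).splitlines())
        return "\n".join(line for line in lines if line)


def html_to_text(html_text):
    """Return the visible text of html_text as a plain string."""
    parser = TextExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.get_text()


sample = (
    "<html><head><style>p { color: red; }</style></head><body>"
    "<h1>Heading</h1><p>Tom &amp; Jerry, 1 &lt; 2</p>"
    "<script>var x = 1;</script>"
    "<p>Second <b>bold</b> line</p></body></html>"
)
print(html_to_text(sample))
```

Tracking a skip depth rather than a boolean is just a defensive choice here; for well-formed pages either would do. Layout details a browser handles (tables, CSS-hidden elements, list bullets) are out of scope for this sketch.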
