Extracting text from HTML file using Python

一生所求 2020-11-22 04:05

I'd like to extract the text from an HTML file using Python. I want essentially the same output I would get if I copied the text from a browser and pasted it into Notepad.

30 Answers
  •  盖世英雄少女心
    2020-11-22 04:30

    Install html2text using:

    pip install html2text

    Then:

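    >>> import html2text
    >>>
    >>> h = html2text.HTML2Text()
    >>> # Ignore converting links from HTML
    >>> h.ignore_links = True
    >>> # Any link works here; example.com is just a placeholder URL
    >>> print(h.handle("<p>Hello, <a href='https://example.com/'>world</a>!</p>"))
    Hello, world!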
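
If you'd rather avoid a third-party dependency, here is a minimal sketch using only the standard library's `html.parser`. It is a different and much cruder technique than html2text. It assumes that a browser never shows the contents of `script`, `style`, `title` and `noscript`, and that it starts a new line at the block-level tags listed in `BLOCK_TAGS`. Those tag sets, the UTF-8 default, and the names `html_to_text` / `html_file_to_text` are my own choices for illustration, not part of any library.

```python
import re
from html.parser import HTMLParser

# Assumption: contents of these tags are never shown as page text.
SKIP_TAGS = {"script", "style", "title", "noscript"}
# Assumption: a rough subset of tags that start a new line when rendered.
BLOCK_TAGS = {
    "address", "article", "aside", "blockquote", "br", "dd", "div", "dl",
    "dt", "footer", "h1", "h2", "h3", "h4", "h5", "h6", "header", "hr",
    "li", "main", "nav", "ol", "p", "pre", "section", "table", "tr", "ul",
}
# HTML's own definition of whitespace (deliberately excludes &nbsp;).
HTML_WHITESPACE = re.compile(r"[ \t\n\r\f]+")


class TextExtractor(HTMLParser):
    """Collects visible text, adding line breaks around block elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decodes &amp;, &lt;, ...
        self.parts = []
        self.skip_depth = 0  # > 0 while inside a SKIP_TAGS element

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if self.skip_depth == 0:
            # Like a browser, render any run of whitespace as one space.
            self.parts.append(HTML_WHITESPACE.sub(" ", data))


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    lines = "".join(parser.parts).split("\n")
    # Trim each line and drop the empty lines left by nested blocks.
    return "\n".join(line.strip() for line in lines if line.strip())


def html_file_to_text(path, encoding="utf-8"):
    # Assumption: the file is UTF-8; pass the page's real charset otherwise.
    with open(path, encoding=encoding) as f:
        return html_to_text(f.read())
```

For example, `html_to_text("<p>Hello, <b>world</b>!</p><script>x()</script>")` returns `'Hello, world!'`. It ignores CSS, so text hidden with `display: none` still shows up, and it collapses the whitespace inside `<pre>`; html2text copes with far more real-world markup.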
