Extracting text from HTML file using Python


一生所求 2020-11-22 04:05

I'd like to extract the text from an HTML file using Python. I want essentially the same output I would get if I copied the text from a browser and pasted it into Notepad.

30 Answers

时光说笑 (original poster)

2020-11-22 04:21
I know there are a lot of answers already, but the most elegant and Pythonic solution I have found is described, in part, here.
```
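from bs4 import BeautifulSoup

# some_html_string holds the raw HTML; find_all(string=True) returns every text node.
# (find_all/string= are the current spellings of the older findAll/text=.)
text = ''.join(BeautifulSoup(some_html_string, "html.parser").find_all(string=True))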
```
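For comparison, the same idea (collect every text node, then join) can be sketched with only the standard library. This is an illustrative sketch rather than part of the answer above: the names `TextCollector` and `extract_text` are hypothetical, and skipping the contents of `<script>` and `<style>` is an added assumption about what browser-like output should leave out.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside <script> or <style>.
        if self._skip_depth == 0:
            self.chunks.append(data)


def extract_text(html):
    """Hypothetical helper: return the concatenated text nodes of `html`."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)
```

Whitespace is kept exactly as it appears in the markup, so the result may still need normalizing before it resembles pasted text.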
Update

Based on Fraser's comment, here is a more elegant solution:
```
from bs4 import BeautifulSoup

# stripped_strings yields each text node with leading/trailing whitespace removed
clean_text = ''.join(BeautifulSoup(some_html_string, "html.parser").stripped_strings)
```
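One caveat with joining on an empty string: `stripped_strings` removes the whitespace around each text node, so adjacent nodes can run together. The following is a minimal sketch that may come closer to copy-and-paste output, assuming it is acceptable to drop `<script>` and `<style>` content and to put each text node on its own line. `html_to_text` is a hypothetical helper name, not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup


def html_to_text(html):
    """Hypothetical helper: approximate the visible text of an HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    # Remove elements whose contents a browser does not render as text.
    for tag in soup(["script", "style"]):
        tag.decompose()
    # separator="\n" puts each text node on its own line;
    # strip=True trims each node and drops whitespace-only ones.
    return soup.get_text(separator="\n", strip=True)
```

Because the separator is applied between every text node, inline elements such as `<b>` also start a new line. Whether that is closer to what a browser would paste depends on the page, so treat the separator as something to tune.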