BeautifulSoup Grab Visible Webpage Text

北恋 2020-11-22 07:35

Basically, I want to use BeautifulSoup to grab strictly the visible text on a webpage. For instance, this webpage is my test case. And I mainly want to just get the

10 answers

孤独总比滥情好 (original poster)

2020-11-22 07:53
The title is inside a `<nyt_headline>` tag, which is nested inside another tag and, outside that, a tag with id "article".
```
soup.findAll('nyt_headline', limit=1)
```
Should work.
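
To make that lookup concrete, here is a minimal sketch run against a made-up snippet rather than the real page. The `sample_html` string, the `<h1>` and `<div>` wrappers, and the headline text are assumptions for illustration only; the one thing taken from the answer is the `nyt_headline` tag name. Note that `findAll` returns a list, so the first element still has to be pulled out before reading its text.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in markup; the real page's wrappers may differ.
sample_html = """
<div id="article">
  <h1><nyt_headline>Example Headline</nyt_headline></h1>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# findAll (the legacy alias of find_all) returns a list of matches.
matches = soup.findAll("nyt_headline", limit=1)

# Guard against an empty result before reading the text.
headline = matches[0].get_text(strip=True) if matches else None
print(headline)
```

If only one match is ever wanted, `soup.find("nyt_headline")` returns the tag directly (or `None`) and avoids the indexing step.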

The article body is inside a `<nyt_text>` tag, which is nested inside a tag with id "articleBody". Inside the `<nyt_text>` element, the text itself is contained within `<p>` tags. Images are not within those `<p>` tags. It's difficult for me to experiment with the syntax, but I expect a working scrape to look something like this:

```
text = soup.findAll('nyt_text', limit=1)[0]
text.findAll('p')
```
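
A slightly fuller sketch of the same idea, again on invented markup: the `sample_html` string, the `<div>` wrapper, and the image are assumptions for illustration, and only the `nyt_text` and `p` tag names and the "articleBody" id come from the description above. It collects the text of each paragraph and, because the image sits outside the `<p>` tags, the image contributes nothing.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in markup; the real page's structure may differ.
sample_html = """
<div id="articleBody">
  <nyt_text>
    <p>First paragraph.</p>
    <img src="photo.jpg" alt="not part of the text">
    <p>Second paragraph.</p>
  </nyt_text>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# find returns the first matching tag, or None if there is none.
body = soup.find("nyt_text")

# Keep only the text of the <p> children; the <img> is skipped.
paragraphs = []
if body is not None:
    paragraphs = [p.get_text().strip() for p in body.find_all("p")]

print("\n\n".join(paragraphs))
```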