Basically, I want to use BeautifulSoup to grab strictly the visible text on a webpage. For instance, this webpage is my test case. And I mainly want to just get the
The title is inside an <nyt_headline> tag, which is itself nested inside two enclosing elements. This should find it:
soup.findAll('nyt_headline', limit=1)
The article body is inside an <nyt_text> tag, which is nested inside another enclosing element. Inside the <nyt_text> element, the text itself is contained within <p> tags. Images are not within those <p> tags. It's difficult for me to experiment with the syntax, but I expect a working scrape to look something like this.
text = soup.findAll('nyt_text', limit=1)[0]
text.findAll('p')
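
To make the idea above concrete, here is a minimal sketch of the whole scrape. It assumes the bs4 package (where find_all is the current spelling of findAll) and runs against a small made-up HTML snippet rather than the real page. The SAMPLE_HTML string, its wrapper elements, and the extract_article helper are all hypothetical names for illustration; only the nyt_headline, nyt_text, and p tags come from the description above.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the article page. The wrapper elements are
# assumptions; only nyt_headline, nyt_text and p come from the post.
SAMPLE_HTML = """
<html><body>
<div>
  <h1><nyt_headline>Example Headline</nyt_headline></h1>
  <div>
    <nyt_text>
      <p>First paragraph.</p>
      <img src="photo.jpg" alt="not part of the text">
      <p>Second <b>paragraph</b>.</p>
    </nyt_text>
  </div>
</div>
</body></html>
"""


def extract_article(html):
    """Return (title, paragraphs) from markup shaped like the sample above."""
    soup = BeautifulSoup(html, "html.parser")

    # find() returns the first match or None, so a missing tag does not raise.
    headline = soup.find("nyt_headline")
    body = soup.find("nyt_text")

    title = headline.get_text(strip=True) if headline else None

    # Only <p> tags are collected, so the <img> inside nyt_text is skipped.
    paragraphs = []
    if body is not None:
        paragraphs = [p.get_text().strip() for p in body.find_all("p")]

    return title, paragraphs


title, paragraphs = extract_article(SAMPLE_HTML)
print(title)
for paragraph in paragraphs:
    print(paragraph)
```

Using find() instead of findAll(..., limit=1)[0] avoids an IndexError when the tag is absent. For a real page you would pass the downloaded HTML to extract_article in place of SAMPLE_HTML.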