Given an arbitrary news article page, I want to write a web crawler that finds the largest body of text on the page and extracts it. The intention is to extract the physical news article on
You might look at the python-readability package, which does exactly this for you.
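If you want to see what the "largest body of text" idea looks like before pulling in a dependency, here is a rough standard-library sketch. It is not how python-readability works internally; it is just a naive heuristic that I am assuming for illustration: treat a handful of container tags as candidate blocks, collect the text directly inside each one, and keep the longest. The names `LargestBlockParser`, `largest_text_block`, `SKIP_TAGS` and `BLOCK_TAGS` are made up for this example.

```python
from html.parser import HTMLParser

# Tags whose contents are never article text.
SKIP_TAGS = {"script", "style", "noscript"}
# Tags treated as candidate text containers (an assumption of this sketch).
BLOCK_TAGS = {"div", "article", "section", "main", "td"}


class LargestBlockParser(HTMLParser):
    """Collects text per open container and remembers the longest block."""

    def __init__(self):
        super().__init__()
        self.stack = []       # one list of text chunks per open container
        self.skip_depth = 0   # > 0 while inside script/style/noscript
        self.best = ""        # longest block text seen so far

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.stack.append([])

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCK_TAGS and self.stack:
            # Container closed: compare its text with the best so far.
            text = " ".join(self.stack.pop())
            if len(text) > len(self.best):
                self.best = text

    def handle_data(self, data):
        # Normalise whitespace and attach the text to the innermost container.
        chunk = " ".join(data.split())
        if chunk and not self.skip_depth and self.stack:
            self.stack[-1].append(chunk)


def largest_text_block(html):
    """Return the text of the container holding the most characters."""
    parser = LargestBlockParser()
    parser.feed(html)
    parser.close()
    return parser.best
```

You would call `largest_text_block(page_source)` on HTML you have already downloaded. Real pages have nested containers, comment sections and sidebars that will fool a length-only heuristic like this, which is the main reason to use a dedicated package. With python-readability, the usual pattern as far as I know is `Document(html).summary()` after `from readability import Document`, but check the package's README for the current API.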