I have a collection of HTML pages from different websites with various templates, and I want to extract the clean main body text of these pages. Since there are too many pages an