I would like to scrape a table from HTML using BeautifulSoup. A snippet of the HTML is shown below. When using table.findAll('tr')
I get the entire tab
As stated in the Beautiful Soup documentation, html5lib parses the document the same way a web browser does (as lxml does in this case). It will try to fix your document tree by adding or closing tags where needed.
In your example I've used lxml as the parser and it gave the following result:
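from bs4 import BeautifulSoup

# data holds the HTML snippet from the question
soup = BeautifulSoup(data, "lxml")
table = soup.find_all("table")[0]  # first table in the document
rows = table.find_all("tr")
for tr in rows:
    # print each row's text with surrounding whitespace stripped
    print(tr.get_text(strip=True))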
Note that lxml added the html and body tags because they weren't present in the source (it tries to create a well-formed document, as previously stated).
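To see the wrapping behaviour in isolation, here is a minimal sketch that feeds the same fragment to two parsers. The fragment and the helper names top_level_tags and row_cells are made up for illustration (they are not from the question), and the sketch assumes both bs4 and lxml are installed.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment standing in for the HTML from the question:
# a bare table with no surrounding <html> or <body>.
fragment = "<table><tr><td>a</td><td>b</td></tr></table>"

def top_level_tags(markup, parser):
    """Names of the tags sitting directly under the document root."""
    soup = BeautifulSoup(markup, parser)
    return [tag.name for tag in soup.find_all(True, recursive=False)]

def row_cells(markup, parser):
    """Cell text for each row, regardless of any wrapper tags added."""
    soup = BeautifulSoup(markup, parser)
    return [[td.get_text(strip=True) for td in tr.find_all("td")]
            for tr in soup.find_all("tr")]

# Compare the built-in parser with lxml on the same input.
for parser in ("html.parser", "lxml"):
    print(parser, top_level_tags(fragment, parser), row_cells(fragment, parser))
```

With html.parser the table stays at the top level, while lxml wraps it in an html element. The rows found by find_all("tr") are the same either way, so the extra tags should not affect the scraping itself.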