Python - beautifulsoup - how to deal with missing closing tags


深忆病人 2021-01-15 13:25

I would like to scrape a table from HTML code using BeautifulSoup. A snippet of the HTML is shown below. When using table.findAll('tr') I get the entire tab

1 answer

伪装坚强ぢ (original poster)

2021-01-15 13:51
As stated in its documentation, html5lib parses the document the way a web browser does (as lxml does in this case). It tries to fix the document tree by adding or closing tags where needed.
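
To make the difference concrete, here is a minimal sketch that parses the same malformed table with several parsers and lists the text of each row. The sample markup, the `SAMPLE` constant, and the `row_texts_by_parser` helper are made up for illustration and are not from the question; parsers that are not installed are skipped.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical sample: a table whose <tr> and <td> tags are never closed.
SAMPLE = "<table><tr><td>a<td>b<tr><td>c<td>d</table>"


def row_texts_by_parser(html, parsers=("html.parser", "lxml", "html5lib")):
    """Return {parser name: [text of each <tr>]} for every installed parser."""
    results = {}
    for name in parsers:
        try:
            soup = BeautifulSoup(html, name)
        except FeatureNotFound:
            # lxml and html5lib are optional third-party packages.
            continue
        results[name] = [tr.get_text(strip=True) for tr in soup.find_all("tr")]
    return results


if __name__ == "__main__":
    for name, rows in row_texts_by_parser(SAMPLE).items():
        print(name, rows)
```

With lxml or html5lib each row should come back on its own. The built-in html.parser does less repair, so the first row may also contain the text of the rows after it.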

For your example I used lxml as the parser, with the following code:
```
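from bs4 import BeautifulSoup

# `data` is the HTML string from the question.
soup = BeautifulSoup(data, "lxml")
table = soup.find_all("table")[0]  # first table in the document
rows = table.find_all("tr")
for tr in rows:
    print(tr.get_text(strip=True))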
```
Note that lxml added html and body tags because they weren't present in the source (it tries to create a well-formed document, as previously stated).
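
The added wrapper tags are easy to check. The sketch below, using a made-up fragment and a hypothetical `wrapper_tags` helper, reports whether the parsed tree contains `html` and `body` elements. It assumes lxml may or may not be installed and skips it if missing.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical fragment with no <html> or <body> around the table.
FRAGMENT = "<table><tr><td>x</td></tr></table>"


def wrapper_tags(html, parser):
    """Return (has_html, has_body) for the tree built by the given parser."""
    soup = BeautifulSoup(html, parser)
    return soup.html is not None, soup.body is not None


if __name__ == "__main__":
    for parser in ("html.parser", "lxml"):
        try:
            print(parser, wrapper_tags(FRAGMENT, parser))
        except FeatureNotFound:
            # lxml is an optional third-party package.
            print(parser, "not installed")
```

If only the table matters, the extra wrappers are harmless: `soup.find_all("table")` finds the table either way.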