Parsing an HTML table using Python - HTMLParser or lxml

我寻月下人不归 2021-02-14 04:09

I have an HTML page that consists of a table, and I want to fetch all the values in the td and tr elements of that table.
I have tried working with BeautifulSoup, but now I want to work on

2 answers
  •  故里飘歌
    2021-02-14 04:42

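    I can't add comments, but this might be helpful for someone else:

    I had some bold and italic text within the table cells, so c.text returned None for those cells (c.text only holds the text that comes before the first child element). I used c.text_content() instead, which also collects the text of nested elements: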

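    >>> from lxml.html import parse
    >>> page = parse("test.html")
    >>> # first table under <body>, then its direct <tr> children
    >>> rows = page.xpath("body/table")[0].findall("tr")
    >>> data = []
    >>> for row in rows:
    ...     # iterate over the cells directly; getchildren() is deprecated
    ...     data.append([c.text_content() for c in row])
    ... 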
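
For comparison, here is a minimal sketch of the other route named in the title, the standard library's HTMLParser, handling the same case of bold and italic text inside cells. The class name `TableTextParser`, the helper `parse_table`, and the `SAMPLE` markup are made up for illustration and are not from the question; the sketch also assumes a single, non-nested table.

```python
from html.parser import HTMLParser


class TableTextParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row.

    Hypothetical helper for illustration; assumes tables are not nested.
    """

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the open cell, or None

    def _close_cell(self):
        self._row.append("".join(self._cell).strip())
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags such as <b> or <i> also arrives here,
        # so it is kept for as long as a cell is open.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._close_cell()
        elif tag == "tr" and self._row is not None:
            if self._cell is not None:  # tolerate a missing </td>
                self._close_cell()
            self.rows.append(self._row)
            self._row = None


def parse_table(html):
    parser = TableTextParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up sample markup with bold and italic text inside the cells.
SAMPLE = """
<table>
  <tr><th>Name</th><th>Note</th></tr>
  <tr><td><b>Alice</b></td><td>plain <i>and</i> italic</td></tr>
</table>
"""

print(parse_table(SAMPLE))
# [['Name', 'Note'], ['Alice', 'plain and italic']]
```

This is more code than the lxml version, but it needs no third-party package; lxml's text_content() remains the shorter option when lxml is available.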
    
