Parsing HTML using Python

爱一瞬间的悲伤 2020-11-22 00:35

I'm looking for an HTML parser module for Python that can help me get the tags in the form of Python lists/dictionaries/objects.

If I have a document of the form:

7 Answers
  •  一整个雨季
    2020-11-22 01:22

    So that I can ask it to get me the content/text in the div tag with class='container' contained within the body tag, or something similar.

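    try:
        # BeautifulSoup 3 (legacy, Python 2 only)
        from BeautifulSoup import BeautifulSoup
    except ImportError:
        # BeautifulSoup 4 (pip install beautifulsoup4)
        from bs4 import BeautifulSoup
    html = "..."  # placeholder: substitute the HTML code you've written above
    parsed_html = BeautifulSoup(html)
    # Find the first <div class="container"> inside <body> and print its text
    print(parsed_html.body.find('div', attrs={'class': 'container'}).text)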
    

    I assume you don't need a performance comparison; just read how BeautifulSoup works in its official documentation.
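
If installing a third-party package is not an option, roughly the same lookup can be done with the standard library's html.parser module, at the cost of more code. This is a swapped-in alternative to BeautifulSoup, not what the answer above uses. The sketch below is a minimal one under stated assumptions: the original HTML from the question is not shown, so SAMPLE_HTML is a made-up stand-in, and the names ContainerTextParser and container_text are hypothetical helpers introduced only for illustration.

```python
from html.parser import HTMLParser

# Hypothetical sample document; the HTML from the question is not shown.
SAMPLE_HTML = """
<html>
  <head><title>Example</title></head>
  <body>
    <div class="container">Hello <b>world</b></div>
    <div class="other">ignored</div>
  </body>
</html>
"""

# Void elements never get a closing tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ContainerTextParser(HTMLParser):
    """Collect the text inside the first <div class="container">."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # nesting depth inside the target div; 0 = outside
        self.done = False   # True once the target div has been closed
        self.chunks = []    # text fragments found inside the target div

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS or self.done:
            return
        if self.depth:
            # Already inside the target div: track nested elements.
            self.depth += 1
        elif tag == "div" and "container" in (dict(attrs).get("class") or "").split():
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def container_text(html):
    """Return the text of the first div whose class list includes 'container'."""
    parser = ContainerTextParser()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


print(container_text(SAMPLE_HTML))  # prints: Hello world
```

The parser is event-driven, so the class keeps a depth counter to know when the matching closing tag of the target div has been reached. It assumes reasonably well-formed markup; for messy real-world pages, BeautifulSoup is likely the more forgiving choice.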
