Parsing HTML using Python

爱一瞬间的悲伤 2020-11-22 00:35

I'm looking for an HTML Parser module for Python that can help me get the tags in the form of Python lists/dictionaries/objects.

If I have a document of the form:

7 answers
  •  遇见更好的自我
    2020-11-22 01:18

    I recommend using the jusText library:

    https://github.com/miso-belica/jusText

    Usage with Python 2:

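    import requests
    import justext
    
    # Download the page; response.content holds the raw HTML bytes.
    response = requests.get("http://planet.python.org/")
    # Split the page into paragraphs, classified with the English stoplist.
    paragraphs = justext.justext(response.content, justext.get_stoplist("English"))
    for paragraph in paragraphs:
        print paragraph.text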
    

    Python 3:

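    import requests
    import justext
    
    # Download the page; response.content holds the raw HTML bytes.
    response = requests.get("http://bbc.com/")
    # Split the page into paragraphs, classified with the English stoplist.
    paragraphs = justext.justext(response.content, justext.get_stoplist("English"))
    for paragraph in paragraphs:
        print(paragraph.text)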
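
Note that jusText returns the text paragraphs of a page rather than its tags. If you need the tags themselves as Python dictionaries and lists, as the question asks, here is a minimal sketch that uses only the standard library's html.parser module. This is not part of jusText; the names TreeBuilder, parse_html and find_all, and the dictionary layout (tag, attrs, children, text), are hypothetical choices made for illustration. It assumes reasonably well-formed HTML and does not attempt the error recovery a full parser provides.

```python
from html.parser import HTMLParser

# Void elements never have a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TreeBuilder(HTMLParser):
    """Hypothetical helper: collects tags into nested dictionaries."""

    def __init__(self):
        super().__init__()
        self.root = {"tag": "document", "attrs": {}, "children": [], "text": ""}
        self.stack = [self.root]  # chain of currently open elements

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "attrs": dict(attrs), "children": [], "text": ""}
        self.stack[-1]["children"].append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest matching open tag; ignore stray closing tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i]["tag"] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # Text is attached to the innermost open element.
        self.stack[-1]["text"] += data


def parse_html(html):
    """Parse an HTML string and return the root dictionary."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def find_all(node, tag):
    """Return every node with the given tag name, in document order."""
    found = [node] if node["tag"] == tag else []
    for child in node["children"]:
        found.extend(find_all(child, tag))
    return found


tree = parse_html(
    '<html><body><h1 class="title">Hello</h1>'
    '<p>First</p><p>Second <br>line</p></body></html>'
)
paragraphs = find_all(tree, "p")
print([p["text"] for p in paragraphs])   # ['First', 'Second line']
print(find_all(tree, "h1")[0]["attrs"])  # {'class': 'title'}
```

For real-world pages with broken markup, a dedicated parsing library is likely a better fit than this sketch.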
    
