Python Beautiful Soup .contents Property

终归单人心 2021-01-13 04:13

What does BeautifulSoup's .contents do? I am working through crummy.com's tutorial and I don't really understand what .contents does. I have looked at the forums and I have

1 answer
  •  北海茫月
    2021-01-13 05:00

    It just gives you what's inside the tag. Let me demonstrate with an example:

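    from bs4 import BeautifulSoup

    html_doc = """
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p><b>The Dormouse's story</b></p>

    <p>Once upon a time there were three little sisters; and their names were Elsie, Lacie and Tillie; and they lived at the bottom of a well.</p>

    <p>...</p>
    </body></html>
    """

    soup = BeautifulSoup(html_doc, "html.parser")  # name the parser explicitly
    head = soup.head
    print(head.contents)  # list of the direct children of <head>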

    The above code gives me a list, [<title>The Dormouse's story</title>], because that is what's inside the head tag. Indexing it with [0] gives you the first item in the list.
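
    A minimal sketch of the same idea, assuming bs4 is installed and using a shortened, hypothetical document rather than the full one above. It shows that .contents is an ordinary list of direct children, so [0] simply picks the first child:

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened document used only for illustration.
html = (
    "<html><head><title>The Dormouse's story</title></head>"
    "<body><p>Once upon a time</p></body></html>"
)
soup = BeautifulSoup(html, "html.parser")

head_children = soup.head.contents   # list of the direct children of <head>
title_tag = head_children[0]         # first (and only) child: the <title> tag
title_text = title_tag.contents[0]   # the text node inside <title>

print(head_children)   # [<title>The Dormouse's story</title>]
print(title_tag.name)  # title
print(title_text)      # The Dormouse's story
```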

    The reason you get an error is that soup.contents[0].contents[0].contents[0].contents[0] returns a plain string with no further tags inside it (and therefore no tag attributes). For your code it returns Page Title: the first contents[0] gives you the html tag, the second gives you the head tag, the third leads to the title tag, and the fourth gives you the actual text. So when you ask for .name on it, there is no tag to give you.
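
    The chain described above can be sketched one step at a time. The doc list here is a hypothetical stand-in for the document in the question, and the example assumes bs4 with the built-in html.parser:

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

# Hypothetical stand-in for the asker's document (a list of string fragments).
doc = [
    "<html><head><title>Page Title</title></head>",
    "<body><p>Some text</p></body></html>",
]
soup = BeautifulSoup("".join(doc), "html.parser")

html_tag = soup.contents[0]        # first child of the document: <html>
head_tag = html_tag.contents[0]    # first child of <html>: <head>
title_tag = head_tag.contents[0]   # first child of <head>: <title>
text = title_tag.contents[0]       # first child of <title>: a text node

print(html_tag.name, head_tag.name, title_tag.name)  # html head title
print(isinstance(text, NavigableString))             # True
print(isinstance(text, Tag))                         # False
print(text)                                          # Page Title
```

    Checking isinstance(node, Tag) before asking for tag attributes is one way to avoid stepping one level too deep.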

    If you want the body printed, you can do the following:

    soup = BeautifulSoup(''.join(doc), "html.parser")
    print(soup.body)
    

    If you want body using contents only, then use the following:

    soup = BeautifulSoup(''.join(doc), "html.parser")
    print(soup.contents[0].contents[1].name)
    

    You will not get body using [0] as the index, because body is the second child of the html tag, after head.
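
    One caveat worth sketching: the index of body depends on what else sits between the tags. The two documents below are hypothetical, and the behavior shown assumes bs4 with html.parser, which keeps whitespace between tags as text nodes:

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Two hypothetical documents: identical except for a newline after </head>.
compact = "<html><head><title>Page Title</title></head><body><p>Hi</p></body></html>"
spaced = "<html><head><title>Page Title</title></head>\n<body><p>Hi</p></body></html>"

compact_html = BeautifulSoup(compact, "html.parser").contents[0]
spaced_html = BeautifulSoup(spaced, "html.parser").contents[0]

# With no whitespace between the tags, body is at index 1.
print(compact_html.contents[1].name)  # body

# With a newline after </head>, index 1 is the whitespace text node instead.
second_child = spaced_html.contents[1]
print(isinstance(second_child, Tag))  # False

# Filtering on Tag keeps only the element children, whatever the spacing.
tag_names = [child.name for child in spaced_html.contents if isinstance(child, Tag)]
print(tag_names)  # ['head', 'body']
```

    For that reason, soup.body is usually the less fragile way to reach the body tag than a fixed index into .contents.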
