How to obtain the content between a tag and its closing tag in HTML using Python's Beautiful Soup?

广开言路 2021-01-24 02:07

I have an HTML line as follows:

<span class="cd__headline-text">Is this model too thin for Yves Saint Laurent? </span>

I would lik

2 Answers
  •  野的像风
    2021-01-24 02:29

    If your element contains only text, use the .string attribute:

    headline = soup.find(class_='cd__headline-text')
    print(headline.string)
    
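Because .string is None as soon as the element has more than one child, a small fallback can cover both cases. A minimal sketch; headline_text is a hypothetical helper name, and joining with a space plus strip=True in the fallback is an assumed choice, not something Beautiful Soup prescribes:

```python
from bs4 import BeautifulSoup


def headline_text(tag):
    """Hypothetical helper: use .string when possible, else join all nested text."""
    if tag.string is not None:
        # .string found a single string (possibly inside one nested tag): return it as-is
        return str(tag.string)
    # Text mixed with tags: collect every descendant string, stripped and space-joined
    return tag.get_text(" ", strip=True)


simple = BeautifulSoup('<span class="cd__headline-text">Plain headline </span>', "html.parser")
nested = BeautifulSoup('<span class="cd__headline-text">A <b>bold</b> headline</span>', "html.parser")

print(repr(headline_text(simple.find(class_="cd__headline-text"))))
print(repr(headline_text(nested.find(class_="cd__headline-text"))))
```

The two branches treat whitespace differently: .string keeps the text exactly as written (trailing space included), while the fallback strips each piece.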

    If the element contains other tags, you can either get all the text in the current element and its descendants, or get only specific strings from the current element itself.

    The element.get_text() method recurses into child elements and gathers every string it finds, joining them with a separator of your choice (the empty string by default) and optionally stripping whitespace from each string (strip=True).
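To make the recursion visible, here is a sketch on a made-up fragment with text at several nesting depths (the markup is purely illustrative):

```python
from bs4 import BeautifulSoup

# Illustrative markup: text sits at three different nesting depths
markup = '<div><p>One <i>two <b>three</b></i></p><p>four</p></div>'
div = BeautifulSoup(markup, "html.parser").div

# Default: all descendant strings concatenated, no separator, no stripping
print(repr(div.get_text()))
# Custom separator, each string stripped (whitespace-only strings are dropped)
print(repr(div.get_text("|", strip=True)))
```

The default call glues "three" and "four" together, which is usually the reason to pass a separator.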

    To get only specific strings, you can either iterate over the .strings or .stripped_strings generators, or use the element's .contents list (or the .children iterator) to access its direct children and pick out the instances of the NavigableString type.
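As an alternative to filtering .contents by hand, find_all() can return only the text nodes that are direct children by combining string=True with recursive=False. A sketch, assuming bs4 4.4 or later (older releases spell the argument text=); the markup reuses the headline class from the question, with an assumed <b> tag inside:

```python
from bs4 import BeautifulSoup

markup = ('<span class="cd__headline-text">'
          'Is this model <b>too thin</b> for Yves Saint Laurent? </span>')
headline = BeautifulSoup(markup, "html.parser").find(class_="cd__headline-text")

# recursive=False limits the search to direct children; string=True matches text nodes
own_text = headline.find_all(string=True, recursive=False)
print(own_text)

# Strip each direct string and join with single spaces to get the element's "own" text
print(" ".join(s.strip() for s in own_text))
```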

    Demo with your sample:

    >>> from bs4 import BeautifulSoup
    >>> markup = '<span class="cd__headline-text">Is this model too thin for Yves Saint Laurent? </span>'
    >>> soup = BeautifulSoup(markup, 'html.parser')
    >>> headline = soup.find(class_='cd__headline-text')
    >>> print(headline.string)
    Is this model too thin for Yves Saint Laurent? 
    >>> print(list(headline.strings))
    ['Is this model too thin for Yves Saint Laurent? ']
    >>> print(list(headline.stripped_strings))
    ['Is this model too thin for Yves Saint Laurent?']
    >>> print(headline.get_text())
    Is this model too thin for Yves Saint Laurent? 
    >>> print(headline.get_text(strip=True))
    Is this model too thin for Yves Saint Laurent?

    And with an additional element nested inside the headline:

    >>> markup = '<span class="cd__headline-text">Is this model <b>too thin</b> for Yves Saint Laurent? </span>'
    >>> soup = BeautifulSoup(markup, 'html.parser')
    >>> headline = soup.find(class_='cd__headline-text')
    >>> headline.string is None
    True
    >>> print(list(headline.strings))
    ['Is this model ', 'too thin', ' for Yves Saint Laurent? ']
    >>> print(list(headline.stripped_strings))
    ['Is this model', 'too thin', 'for Yves Saint Laurent?']
    >>> print(headline.get_text())
    Is this model too thin for Yves Saint Laurent? 
    >>> print(headline.get_text(' - ', strip=True))
    Is this model - too thin - for Yves Saint Laurent?
    >>> headline.contents
    ['Is this model ', <b>too thin</b>, ' for Yves Saint Laurent? ']
    >>> from bs4 import NavigableString
    >>> [el for el in headline.children if isinstance(el, NavigableString)]
    ['Is this model ', ' for Yves Saint Laurent? ']
    

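One caveat with the isinstance(el, NavigableString) filter: Beautiful Soup's Comment class subclasses NavigableString, so HTML comments pass the check too. A sketch with an illustrative comment added to the headline markup (the <b> tag and the comment text are made up for the example):

```python
from bs4 import BeautifulSoup, Comment, NavigableString

# Illustrative markup: an HTML comment sits among the headline's direct children
markup = ('<span class="cd__headline-text">'
          'Is this model <!-- editor note --><b>too thin</b> for Yves Saint Laurent? </span>')
headline = BeautifulSoup(markup, "html.parser").find(class_="cd__headline-text")

# Comment subclasses NavigableString, so a plain isinstance() check keeps it
with_comment = [el for el in headline.children if isinstance(el, NavigableString)]

# Excluding Comment explicitly leaves only the visible text nodes
text_only = [el for el in headline.children
             if isinstance(el, NavigableString) and not isinstance(el, Comment)]

print(with_comment)
print(text_only)
```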