I have an HTML line as follows:
Is this model too thin for Yves Saint Laurent?
I would lik
If your element contains only text, use the .string attribute:
headline = soup.find(class_='cd__headline-text')
print(headline.string)
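As a rough sketch of when .string gives you something back, here are three cases side by side. The markup and the tag names (h3, em) are made up for illustration, and the built-in html.parser is assumed:

```python
from bs4 import BeautifulSoup

def find_headline(markup):
    # Tag names in the markup below are assumptions; only the class matters here.
    return BeautifulSoup(markup, "html.parser").find(class_="cd__headline-text")

plain = find_headline('<h3 class="cd__headline-text">Only text</h3>')
nested = find_headline('<h3 class="cd__headline-text"><em>Only text</em></h3>')
mixed = find_headline('<h3 class="cd__headline-text">Some <em>mixed</em> text</h3>')

plain_text = plain.string    # the single text child
nested_text = nested.string  # a lone child tag passes its own .string up
mixed_text = mixed.string    # None: several children, so there is no single string
```

Because .string can be None, check for that before calling string methods on the result.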
If the element contains other tags, you can either gather all the text in the element and its descendants, or pick out only specific strings from the element.
The element.get_text() method recurses through the element and its child elements, gathering all strings and concatenating them with a separator of your choice (the empty string by default), with or without stripping whitespace from each string.
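A small sketch of how the separator and strip arguments interact. The div and em tags are assumptions for illustration, and html.parser is assumed as the parser:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the tag names are assumptions for illustration.
markup = (
    '<div class="cd__headline-text"> Is this model <em>too thin</em>'
    ' for Yves Saint Laurent? </div>'
)
element = BeautifulSoup(markup, "html.parser").find(class_="cd__headline-text")

raw = element.get_text()                    # pieces joined with "", whitespace kept
stripped = element.get_text(strip=True)     # each piece stripped, then joined with ""
spaced = element.get_text(" ", strip=True)  # each piece stripped, joined with a space
```

Note that strip=True strips each string separately, so with the default empty separator the words on either side of a child tag run together; passing a separator such as " " avoids that.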
To get only specific strings, you can either iterate over the .strings or .stripped_strings generators, or go through element.contents (or element.children) to access all directly contained nodes and pick out the instances of the NavigableString type.
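One way to wrap the NavigableString filtering into a reusable helper is sketched below. The name direct_text is hypothetical, the p and em tags are assumed, and the explicit Comment check is there because Comment is itself a NavigableString subclass:

```python
from bs4 import BeautifulSoup, Comment, NavigableString

def direct_text(element):
    """Join the strings that are direct children of element, skipping child tags."""
    return "".join(
        child
        for child in element.children
        # Comment subclasses NavigableString, so exclude it explicitly.
        if isinstance(child, NavigableString) and not isinstance(child, Comment)
    )

# Hypothetical markup: the tag names are assumptions for illustration.
markup = (
    '<p class="cd__headline-text">Is this model <em>too thin</em>'
    '<!-- note --> for Yves Saint Laurent?</p>'
)
headline = BeautifulSoup(markup, "html.parser").find(class_="cd__headline-text")
own_text = direct_text(headline)
```

The text inside the em tag and the comment are both left out, so the result keeps the whitespace that surrounded them; normalise it afterwards if that matters.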
Demo with your sample:
>>> from bs4 import BeautifulSoup
>>> # The tag name (span) is an assumption; only the class matters for the lookup.
>>> markup = '<span class="cd__headline-text">Is this model too thin for Yves Saint Laurent? </span>'
>>> soup = BeautifulSoup(markup, 'html.parser')
>>> headline = soup.find(class_='cd__headline-text')
>>> print(headline.string)
Is this model too thin for Yves Saint Laurent?
>>> print(list(headline.strings))
['Is this model too thin for Yves Saint Laurent? ']
>>> print(list(headline.stripped_strings))
['Is this model too thin for Yves Saint Laurent?']
>>> print(headline.get_text())
Is this model too thin for Yves Saint Laurent?
>>> print(headline.get_text(strip=True))
Is this model too thin for Yves Saint Laurent?
and with an additional element added:
>>> # The tag names (span, em) are assumptions; only the class matters for the lookup.
>>> markup = '<span class="cd__headline-text">Is this model <em>too thin</em> for Yves Saint Laurent? </span>'
>>> soup = BeautifulSoup(markup, 'html.parser')
>>> headline = soup.find(class_='cd__headline-text')
>>> headline.string is None
True
>>> print(list(headline.strings))
['Is this model ', 'too thin', ' for Yves Saint Laurent? ']
>>> print(list(headline.stripped_strings))
['Is this model', 'too thin', 'for Yves Saint Laurent?']
>>> print(headline.get_text())
Is this model too thin for Yves Saint Laurent?
>>> print(headline.get_text(' - ', strip=True))
Is this model - too thin - for Yves Saint Laurent?
>>> headline.contents
['Is this model ', <em>too thin</em>, ' for Yves Saint Laurent? ']
>>> from bs4 import NavigableString
>>> [el for el in headline.children if isinstance(el, NavigableString)]
['Is this model ', ' for Yves Saint Laurent? ']