I tried to extract some strings from an HTML file with BeautifulSoup, but every time I work with it I get only partial results.
I want to get the strings in every li element.
This example from the documentation gives a very nice one-liner that joins every text node in the document:
''.join(BeautifulSoup(source).findAll(text=True))
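For reference, here is a minimal sketch of the same one-liner written against the bs4 package, assuming the built-in "html.parser" backend and a made-up `source` string. It uses `find_all(string=True)`, the newer spelling of `findAll(text=True)`.

```python
from bs4 import BeautifulSoup

# Hypothetical input, only for illustration.
source = "<p>Hello <b>world</b>!</p>"

# find_all(string=True) returns every text node in document order;
# joining them with '' gives the concatenated text of the page.
all_text = "".join(BeautifulSoup(source, "html.parser").find_all(string=True))
print(all_text)  # Hello world!
```

Note that this collects text from the whole document, not just from the li elements.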
Iterate over the results and read the .text attribute of each element:
for element in soup.select(".sidebar li"):
    print(element.text)
Example:
from bs4 import BeautifulSoup
data = """
<body>
<ul>
<li class="first">Def Leppard - Make Love Like A Man<span>Live</span> </li>
<li>Inxs - Never Tear Us Apart </li>
</ul>
</body>
"""
# Name the parser explicitly so the result does not depend on what is installed.
soup = BeautifulSoup(data, "html.parser")
for element in soup.select('li'):
    print(element.text)
prints:
Def Leppard - Make Love Like A ManLive
Inxs - Never Tear Us Apart
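In the output above, "Man" and "Live" run together because .text concatenates the nested strings with no separator. If that matters, one possible variation (a sketch, not part of the original answer) is `get_text()` with a separator and `strip=True`, shown here on the same markup:

```python
from bs4 import BeautifulSoup

data = """
<body>
<ul>
<li class="first">Def Leppard - Make Love Like A Man<span>Live</span> </li>
<li>Inxs - Never Tear Us Apart </li>
</ul>
</body>
"""

soup = BeautifulSoup(data, "html.parser")

# get_text(" ", strip=True) strips each nested string, drops the empty ones,
# and joins what is left with a single space.
titles = [li.get_text(" ", strip=True) for li in soup.select("li")]
for title in titles:
    print(title)
# Def Leppard - Make Love Like A Man Live
# Inxs - Never Tear Us Apart
```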
Use Beautiful Soup's .strings generator, or .stripped_strings to drop the surrounding whitespace:
for string in soup.stripped_strings:
    print(repr(string))
From the docs:
If there’s more than one thing inside a tag, you can still look at just the strings. Use the .strings generator:
or
These strings tend to have a lot of extra whitespace, which you can remove by using the .stripped_strings generator instead:
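To make the difference between the two generators concrete, here is a small sketch on a made-up snippet (the `html` string is an assumption, chosen only for illustration):

```python
from bs4 import BeautifulSoup

# Hypothetical markup with stray whitespace around the text.
html = "<ul><li> One <b>two</b> </li><li>Three</li></ul>"
soup = BeautifulSoup(html, "html.parser")

# .strings yields every text node exactly as it appears, whitespace included.
raw = list(soup.strings)

# .stripped_strings strips each text node and skips whitespace-only ones.
clean = list(soup.stripped_strings)

print(raw)    # [' One ', 'two', ' ', 'Three']
print(clean)  # ['One', 'two', 'Three']
```

Both generators also work on a single tag, so calling `li.stripped_strings` inside a loop over `soup.select("li")` gives the strings of each li separately.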