Strip HTML tags to get strings in Python


太阳男子

I tried to get some strings from an HTML file with BeautifulSoup, and every time I work with it I get only partial results.

I want to get the strings in every li element/tag.

3 Answers

孤城傲影

2021-01-21 13:02
This example from the documentation gives a very nice one-liner that joins every text node in the document:
```
''.join(BeautifulSoup(source).findAll(text=True))
```
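With the bs4 package, the same idea is usually spelled find_all with string=True (text=True is the older spelling). A minimal sketch, assuming bs4 is installed; the source string here is made up for illustration, and html.parser is just the parser that ships with Python:

```python
from bs4 import BeautifulSoup

# `source` is a hypothetical snippet used only for illustration.
source = "<ul><li>Def Leppard</li><li>Inxs</li></ul>"

# Name the parser explicitly so the result does not depend on what is installed.
soup = BeautifulSoup(source, "html.parser")

# find_all(string=True) returns every text node in document order.
text = "".join(soup.find_all(string=True))
print(text)
```

Note that the text nodes are joined with no separator, so text from adjacent tags runs together.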

甜味超标

2021-01-21 13:14

Iterate over the results and get the value of each element's text attribute:

```
for element in soup.select(".sidebar li"):
    print(element.text)
```

Example:

```
from bs4 import BeautifulSoup

data = """
<body>
    <ul>
        <li class="first">Def Leppard -  Make Love Like A Man<span>Live</span> </li>
        <li>Inxs - Never Tear Us Apart        </li>
    </ul>
</body>
"""

# Pass the parser explicitly to avoid the "no parser specified" warning.
soup = BeautifulSoup(data, "html.parser")
for element in soup.select('li'):
    print(element.text)
```

prints:

```
Def Leppard -  Make Love Like A ManLive 
Inxs - Never Tear Us Apart
```
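In the output above the nested span text is glued onto the preceding word ("ManLive") and trailing whitespace is kept. If that is unwanted, get_text accepts a separator and a strip flag. A small sketch, reusing the same sample markup and assuming html.parser:

```python
from bs4 import BeautifulSoup

data = """
<ul>
    <li class="first">Def Leppard -  Make Love Like A Man<span>Live</span> </li>
    <li>Inxs - Never Tear Us Apart        </li>
</ul>
"""

soup = BeautifulSoup(data, "html.parser")

# strip=True trims each text node and drops empty ones;
# the first argument is the separator placed between the remaining nodes.
lines = [li.get_text(" ", strip=True) for li in soup.select("li")]
for line in lines:
    print(line)
```

Whitespace inside a single text node (such as the double space after the dash) is left untouched; only the ends of each node are stripped.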


失恋的感觉

2021-01-21 13:20
Use Beautiful Soup's .strings generator, or .stripped_strings to drop the surrounding whitespace.
```
for string in soup.stripped_strings:
    print(repr(string))
```
From the docs:

If there’s more than one thing inside a tag, you can still look at just the strings. Use the .strings generator:

or

These strings tend to have a lot of extra whitespace, which you can remove by using the .stripped_strings generator instead:
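The difference between the two generators can be seen side by side. A minimal sketch, assuming bs4 with html.parser; the html string is invented for the example:

```python
from bs4 import BeautifulSoup

# Hypothetical markup, with newlines and indentation between the tags.
html = "<ul>\n  <li>Def Leppard <span>Live</span></li>\n  <li> Inxs </li>\n</ul>"
soup = BeautifulSoup(html, "html.parser")

# .strings yields every text node as-is, including whitespace-only ones.
raw = list(soup.strings)

# .stripped_strings trims each node and skips those that end up empty.
clean = list(soup.stripped_strings)

print(raw)
print(clean)
```

Both are generators, so they can also be looped over directly without building a list first.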