Strip HTML tags to get strings in Python

太阳男子 2021-01-21 12:24

I tried to extract some strings from an HTML file with BeautifulSoup, but every time I work with it I get only partial results.

I want to get the strings in every li element/tag.

3 answers
  • 2021-01-21 13:02

    This example from the documentation gives a very nice one-liner.

    ''.join(BeautifulSoup(source).findAll(text=True))
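
    A minimal runnable sketch of the same idea, assuming the bs4 package is installed. The sample markup in source is made up for illustration, and find_all(string=True) is the spelling used by current bs4 releases for the legacy findAll(text=True) call:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not taken from the question.
source = "<ul><li>One <b>bold</b></li><li>Two</li></ul>"

# Naming the parser explicitly avoids the "no parser specified" warning.
soup = BeautifulSoup(source, "html.parser")

# find_all(string=True) returns every text node in document order;
# joining them yields the document text with all tags removed.
text = "".join(soup.find_all(string=True))
print(text)
```

    Note that the text nodes are joined with no separator, so text from neighbouring tags runs together.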
    
  • 2021-01-21 13:14

    Iterate over the results and read the text attribute of each element:

    for element in soup.select(".sidebar li"):
        print(element.text)
    

    Example:

    from bs4 import BeautifulSoup
    
    
    data = """
    <body>
        <ul>
            <li class="first">Def Leppard -  Make Love Like A Man<span>Live</span> </li>
            <li>Inxs - Never Tear Us Apart        </li>
        </ul>
    </body>
    """
    
    # Name the parser explicitly so the result does not depend on what is installed
    soup = BeautifulSoup(data, "html.parser")
    for element in soup.select('li'):
        print(element.text)
    

    prints:

    Def Leppard -  Make Love Like A ManLive 
    Inxs - Never Tear Us Apart        
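
    The output above keeps the trailing whitespace and runs "Man" and "Live" together. One possible way to tidy that up, sketched here on the same markup, is get_text() with a separator and strip=True:

```python
from bs4 import BeautifulSoup

data = """
<ul>
    <li class="first">Def Leppard -  Make Love Like A Man<span>Live</span> </li>
    <li>Inxs - Never Tear Us Apart        </li>
</ul>
"""

soup = BeautifulSoup(data, "html.parser")

# get_text(" ", strip=True) strips each text node, drops the empty ones,
# and joins the rest with a single space.
items = [li.get_text(" ", strip=True) for li in soup.select("li")]
for item in items:
    print(item)
```

    Whitespace inside a single text node (the double space after the dash) is left as it is; only the ends of each node are stripped.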
    
  • 2021-01-21 13:20

    Use BeautifulSoup's .strings or .stripped_strings generator.

    for string in soup.stripped_strings:
        print(repr(string))
    

    From the docs:

    If there’s more than one thing inside a tag, you can still look at just the strings. Use the .strings generator:

    or

    These strings tend to have a lot of extra whitespace, which you can remove by using the .stripped_strings generator instead:
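
    The difference between the two generators can be sketched as follows, reusing the list markup from the other answer (the variable names raw and stripped are only for illustration):

```python
from bs4 import BeautifulSoup

data = """
<ul>
    <li class="first">Def Leppard -  Make Love Like A Man<span>Live</span> </li>
    <li>Inxs - Never Tear Us Apart        </li>
</ul>
"""

soup = BeautifulSoup(data, "html.parser")

# .strings yields every text node, including whitespace-only ones
# that come from the indentation between tags.
raw = list(soup.strings)

# .stripped_strings strips each node and skips those that end up empty.
stripped = list(soup.stripped_strings)

for string in stripped:
    print(repr(string))
```

    To restrict this to the li elements, the same generators can be used on each tag returned by soup.select('li') rather than on the whole soup.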
