Insert html string into BeautifulSoup object

暖寄归人 2021-01-05 02:29

I am trying to insert an HTML string into a BeautifulSoup object. If I insert it directly, bs4 sanitizes the HTML. If I take the HTML string and create a soup from it, and ins

2 Answers
  • 2021-01-05 02:36

    The best way to do this is to create a new span tag and insert it into your mainSoup. That is what the .new_tag method is for.

    In [34]: from bs4 import BeautifulSoup
    
    In [35]: mainSoup = BeautifulSoup("""
       ....: <html>
       ....:     <div class='first'></div>
       ....:     <div class='second'></div>
       ....: </html>
       ....: """, 'html.parser')
    
    In [36]: tag = mainSoup.new_tag('span')
    
    In [37]: tag.attrs['class'] = 'first-content'
    
    In [38]: mainSoup.insert(1, tag)
    
    In [39]: print(mainSoup.find(class_='second'))
    <div class="second"></div>
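
    The session above inserts the span into the soup's top-level contents. If the goal is to place it inside the first div instead, a minimal sketch might look like the following. The variable names are illustrative, and the choice of `append` on the target element is an assumption about what the question wanted.

```python
from bs4 import BeautifulSoup

main_soup = BeautifulSoup(
    "<html><div class='first'></div><div class='second'></div></html>",
    "html.parser",
)

# Create the element through the soup so it is tied to the same document.
span = main_soup.new_tag("span")
span["class"] = "first-content"

# Append to the target element rather than to the soup itself.
first_div = main_soup.find(class_="first")
first_div.append(span)

print(first_div)
# <div class="first"><span class="first-content"></span></div>
```

    This only covers a single empty element built by hand. For an existing HTML string, parsing it first is usually less work.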
    
  • 2021-01-05 02:55

    The simplest way, if you already have an HTML string, is to parse it into another BeautifulSoup object and insert that.

    from bs4 import BeautifulSoup
    
    doc = '''
    <div>
     test1
    </div>
    '''
    
    soup = BeautifulSoup(doc, 'html.parser')
    
    soup.div.append(BeautifulSoup('<div>insert1</div>', 'html.parser'))
    
    print(soup.prettify())
    

    Output:

    <div>
     test1
    <div>
     insert1
    </div>
    </div>
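
    To see why parsing first matters, the sketch below contrasts appending a plain string with appending a parsed tag. It assumes that the "sanitizing" mentioned in the question refers to this escaping of text nodes. The variable names are illustrative.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<div>test1</div>", "html.parser")

# A plain string becomes a text node, so its markup is escaped on output.
soup.div.append("<b>raw</b>")
escaped = str(soup)
print(escaped)

# Parsing the string first yields a real Tag that is inserted as markup.
fragment = BeautifulSoup("<b>parsed</b>", "html.parser")
soup.div.append(fragment.b)
print(soup)
```

    The first print shows `&lt;b&gt;raw&lt;/b&gt;` inside the div, while the second shows an actual `<b>` element after it.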
    

    Update 1

    How about this? The idea is to let BeautifulSoup parse the string and generate the right node (a span tag), then insert that node. This appears to avoid the "None" problem.

    from bs4 import BeautifulSoup
    
    mainSoup = BeautifulSoup("""
    <html>
        <div class='first'></div>
        <div class='second'></div>
    </html>
    """, 'html.parser')
    
    extraSoup = BeautifulSoup('<span class="first-content"></span>', 'html.parser')
    tag = mainSoup.find(class_='first')
    tag.insert(1, extraSoup.span)
    
    print(mainSoup.find(class_='second'))
    

    Output:

    <div class="second"></div>
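
    If the HTML string holds more than one top-level element, picking out `.span` is not enough. One possible approach is to move every top-level node of the parsed fragment into the target. The helper name `append_html` below is hypothetical and not part of bs4.

```python
from bs4 import BeautifulSoup


def append_html(target, html):
    """Hypothetical helper: parse `html` and append each top-level node to `target`."""
    fragment = BeautifulSoup(html, "html.parser")
    # Copy the list first: appending a node moves it out of `fragment`.
    for node in list(fragment.contents):
        target.append(node)


main_soup = BeautifulSoup(
    "<div class='first'></div><div class='second'></div>",
    "html.parser",
)
append_html(main_soup.find(class_="first"), "<span>a</span><em>b</em>")

print(main_soup)
# <div class="first"><span>a</span><em>b</em></div><div class="second"></div>
```

    Iterating over a copy of `contents` matters here, because each `append` detaches the node from the fragment while the loop is running.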
    