Insert html string into BeautifulSoup object

后端未结


 2  2096

I am trying to insert an HTML string into a BeautifulSoup object. If I insert it directly, bs4 escapes the HTML. If I take the HTML string and create a soup from it, and ins…
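To make the escaping concrete, here is a minimal sketch (the markup is illustrative, assuming bs4 with the built-in html.parser). Appending a plain str to a tag stores it as text, so the markup is escaped on output and cannot be found as a tag.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<div class='first'></div>", "html.parser")

# A plain str is wrapped in a NavigableString: it is text, not markup.
soup.div.append('<span class="first-content"></span>')

# The angle brackets are emitted as &lt; and &gt;, and no real <span> tag exists.
print(soup)
print(soup.find("span"))  # None
```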


2 Answers

终归单人心

2021-01-05 02:36

The best way to do this is to create a new span tag and insert it into your mainSoup. That is what the .new_tag method is for.

In [34]: from bs4 import BeautifulSoup

In [35]: mainSoup = BeautifulSoup("""
   ....: <html>
   ....:     <div class='first'></div>
   ....:     <div class='second'></div>
   ....: </html>
   ....: """, 'html.parser')

In [36]: tag = mainSoup.new_tag('span')

In [37]: tag.attrs['class'] = 'first-content'

In [38]: mainSoup.insert(1, tag)

In [39]: print(mainSoup.find(class_='second'))
<div class="second"></div>
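A variant of the same approach, sketched under the assumption that the goal is to put the span inside div.first (as in the question's markup) rather than at the top level of the document: build the tag with new_tag, set its class, and append it to the located div. The variable name first is illustrative.

```python
from bs4 import BeautifulSoup

mainSoup = BeautifulSoup("""
<html>
    <div class='first'></div>
    <div class='second'></div>
</html>
""", "html.parser")

# new_tag creates a real Tag object owned by mainSoup.
tag = mainSoup.new_tag("span")
tag["class"] = "first-content"  # same effect as tag.attrs['class'] = ...

# Place the span inside div.first instead of at the document's top level.
first = mainSoup.find(class_="first")
first.append(tag)

print(first)
print(mainSoup.find(class_="second"))
```

Because the span is a genuine Tag rather than a string, it stays searchable: mainSoup.find(class_="first-content") returns it.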


情歌与酒

2021-01-05 02:55

The simplest way, if you already have an HTML string, is to parse it into its own BeautifulSoup object and insert that.

from bs4 import BeautifulSoup

doc = '''
<div>
 test1
</div>
'''

soup = BeautifulSoup(doc, 'html.parser')

soup.div.append(BeautifulSoup('<div>insert1</div>', 'html.parser'))

print(soup.prettify())

Output:

<div>
 test1
 <div>
  insert1
 </div>
</div>
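A quick check of this approach, assuming a reasonably recent bs4 4.x release: when a BeautifulSoup object is appended to a tag, those versions appear to move the fragment's children in rather than nesting one soup inside another, so find() can locate the new element afterwards. The name fragment is illustrative.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<div>\n test1\n</div>", "html.parser")
fragment = BeautifulSoup("<div>insert1</div>", "html.parser")

soup.div.append(fragment)

# The inserted <div> is a direct child of the outer div and is searchable.
inner = soup.find("div", string="insert1")
print(inner)

# No BeautifulSoup object ends up nested inside the tree.
print(any(isinstance(c, BeautifulSoup) for c in soup.div.contents))
```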

Update 1

How about this? The idea is to let BeautifulSoup parse the string into the right parse-tree node (the span tag) and insert that tag, extraSoup.span, instead of the whole soup object. This seems to avoid the "None" problem.

from bs4 import BeautifulSoup

mainSoup = BeautifulSoup("""
<html>
    <div class='first'></div>
    <div class='second'></div>
</html>
""", 'html.parser')

extraSoup = BeautifulSoup('<span class="first-content"></span>', 'html.parser')
tag = mainSoup.find(class_='first')
tag.insert(1, extraSoup.span)

print(mainSoup.find(class_='second'))

Output:

<div class="second"></div>
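The same idea extends to fragments with more than one top-level node. The sketch below wraps it in a hypothetical helper, insert_html (not part of bs4), that parses the string with html.parser and inserts each parsed node in order; it assumes target is a Tag that is already in the tree.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag


def insert_html(target: Tag, position: int, html: str) -> list:
    """Hypothetical helper: parse an HTML fragment and insert its
    top-level nodes into target, starting at the given position."""
    fragment = BeautifulSoup(html, "html.parser")
    # Copy the list first: each insert moves a node out of the fragment.
    nodes = list(fragment.contents)
    for offset, node in enumerate(nodes):
        target.insert(position + offset, node)
    return nodes


mainSoup = BeautifulSoup("""
<html>
    <div class='first'></div>
    <div class='second'></div>
</html>
""", "html.parser")

first = mainSoup.find(class_="first")
insert_html(first, 0, '<span class="first-content"></span><b>bold</b>')

print(first)
print(mainSoup.find(class_="second"))
```

Copying fragment.contents into a list before the loop matters because iterating the live list while insert removes nodes from it would skip elements.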
