I am trying to insert an HTML string into a BeautifulSoup object. If I insert it directly, bs4 sanitizes the HTML. If I take the HTML string and create a soup from it, and ins
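To make the "sanitizes" behaviour concrete: a plain Python string passed to append() is wrapped in a NavigableString (a text node) and is never parsed as markup, so its angle brackets are escaped on output and no element is created. A minimal sketch, assuming the built-in 'html.parser':

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<div></div>', 'html.parser')

# A plain str becomes a text node (NavigableString), not parsed markup.
soup.div.append('<span>hi</span>')

print(soup)               # <div>&lt;span&gt;hi&lt;/span&gt;</div>
print(soup.find('span'))  # None: no <span> element was created
```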
The best way to do this is to create a new span tag and insert it into your mainSoup. That is what the .new_tag method is for.
In [34]: from bs4 import BeautifulSoup
In [35]: mainSoup = BeautifulSoup("""
....: <html>
....: <div class='first'></div>
....: <div class='second'></div>
....: </html>
....: """, 'html.parser')
In [36]: tag = mainSoup.new_tag('span')
In [37]: tag.attrs['class'] = 'first-content'
In [38]: mainSoup.find(class_='first').append(tag)
In [39]: print(mainSoup.find(class_='first'))
<div class="first"><span class="first-content"></span></div>
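new_tag builds an element without parsing any markup; where it ends up depends on which method you call. append() puts it inside a tag, while insert_after() makes it a sibling. A small sketch using a single-line document for brevity (the variable names are only illustrative):

```python
from bs4 import BeautifulSoup

main_soup = BeautifulSoup(
    "<html><div class='first'></div><div class='second'></div></html>",
    'html.parser',
)
first = main_soup.find(class_='first')

# Build a new <span> element, then set an attribute and its text content.
span = main_soup.new_tag('span')
span['class'] = 'first-content'
span.string = 'hello'

# Place the span right after the first div, as a sibling rather than a child.
first.insert_after(span)

print(main_soup)
# <html><div class="first"></div><span class="first-content">hello</span><div class="second"></div></html>
```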
The simplest way, if you already have an HTML string, is to insert another BeautifulSoup object:
from bs4 import BeautifulSoup
doc = '''
<div>
test1
</div>
'''
soup = BeautifulSoup(doc, 'html.parser')
# Appending a BeautifulSoup object inserts its parsed children, not escaped text.
soup.div.append(BeautifulSoup('<div>insert1</div>', 'html.parser'))
print(soup.prettify())
Output:
<div>
 test1
 <div>
  insert1
 </div>
</div>
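This also works when the HTML string has several top-level elements: appending a BeautifulSoup object inserts each of its top-level children in order. A sketch, assuming 'html.parser'. A parser such as lxml would wrap the fragment in <html><body>, and those wrapper tags would be inserted too, which is why the parser is named explicitly here:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<div>test1</div>', 'html.parser')

# A fragment with two top-level elements; html.parser adds no <html>/<body> wrapper.
fragment = BeautifulSoup('<p>one</p><p>two</p>', 'html.parser')

# Appending a BeautifulSoup object inserts its top-level children one by one.
soup.div.append(fragment)

print(soup)  # <div>test1<p>one</p><p>two</p></div>
```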
How about this? The idea is to let BeautifulSoup parse the markup and produce the right parse-tree node (the span tag). This appears to avoid the "None" problem.
from bs4 import BeautifulSoup
mainSoup = BeautifulSoup("""
<html>
<div class='first'></div>
<div class='second'></div>
</html>
""", 'html.parser')
# Parse the snippet separately so we get a real <span> Tag, not escaped text.
extraSoup = BeautifulSoup('<span class="first-content"></span>', 'html.parser')
tag = mainSoup.find(class_='first')
tag.append(extraSoup.span)
print(mainSoup.find(class_='first'))
Output:
<div class="first"><span class="first-content"></span></div>
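If you need this in several places, the same idea can be wrapped in a small helper that parses the HTML string and inserts every top-level node at a chosen position (extraSoup.span above only picks the first span). insert_html is a hypothetical name, not part of bs4; the sketch only uses the standard Tag.insert method and assumes 'html.parser':

```python
from bs4 import BeautifulSoup
from bs4.element import Tag


def insert_html(target: Tag, html: str, position: int = 0) -> None:
    """Hypothetical helper: parse `html` as a fragment and insert each
    top-level node into `target`, starting at index `position`."""
    fragment = BeautifulSoup(html, 'html.parser')
    # Copy the list first: inserting a node moves it out of `fragment`.
    for offset, node in enumerate(list(fragment.contents)):
        target.insert(position + offset, node)


main_soup = BeautifulSoup(
    "<div class='first'><b>existing</b></div><div class='second'></div>",
    'html.parser',
)
insert_html(main_soup.find(class_='first'), '<span class="first-content">new</span>text')
print(main_soup.find(class_='first'))
# <div class="first"><span class="first-content">new</span>text<b>existing</b></div>
```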