I am trying to insert an HTML string into a BeautifulSoup object. If I insert it directly, bs4 sanitizes the HTML. If I take the HTML string and create a soup from it, and ins
The best way to do this is to create a new span tag and insert it into your mainSoup. That is what the .new_tag method is for.
In [34]: from bs4 import BeautifulSoup
In [35]: mainSoup = BeautifulSoup("""
....: <html>
....: <div class='first'></div>
....: <div class='second'></div>
....: </html>
....: """, 'html.parser')
In [36]: tag = mainSoup.new_tag('span')
In [37]: tag.attrs['class'] = 'first-content'
In [38]: mainSoup.insert(1, tag)
In [39]: print(mainSoup.find(class_='second'))
<div class="second"></div>
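Note that mainSoup.insert(1, tag) makes the span a direct child of the soup object itself. If the goal is to place it inside the first div, a minimal sketch could look like the following. It assumes the html.parser backend; the variable names and the 'hello' text are only illustrative.

```python
from bs4 import BeautifulSoup

main_soup = BeautifulSoup(
    "<div class='first'></div><div class='second'></div>", 'html.parser'
)

# Build the new element through the soup so it belongs to the same tree.
span = main_soup.new_tag('span')
span['class'] = 'first-content'
span.string = 'hello'  # illustrative text, not from the original question

# Append it inside the target <div> rather than at the top level.
main_soup.find(class_='first').append(span)

print(main_soup.find(class_='first'))
# <div class="first"><span class="first-content">hello</span></div>
print(main_soup.find(class_='second'))
# <div class="second"></div>
```

Because the tag is created with new_tag and never passes through a string, nothing gets escaped on insertion.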
The simplest way, if you already have an HTML string, is to insert another BeautifulSoup object.
from bs4 import BeautifulSoup
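doc = '''<div></div>'''  # assumed placeholder; any document containing a <div> works
soup = BeautifulSoup(doc, 'html.parser')
# Parse the HTML string into its own soup and append it to the first <div>.
soup.div.append(BeautifulSoup('<div>insert1</div>', 'html.parser'))
print(soup.prettify())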
How about this? The idea is to use BeautifulSoup to generate the right AST node (the span tag). It looks like this avoids the "None" problem.
from bs4 import BeautifulSoup
mainSoup = BeautifulSoup("""
<div class='first'></div>
<div class='second'></div>
""", 'html.parser')
extraSoup = BeautifulSoup('<span class="first-content"></span>', 'html.parser')
tag = mainSoup.find(class_='first')
tag.insert(1, extraSoup.span)
print(mainSoup.find(class_='second'))
<div class="second"></div>
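The same idea extends to an HTML string with more than one top-level element. Here is a rough sketch, where append_html is a hypothetical helper name (not part of bs4) and html.parser is assumed: parse the string into its own soup, then move each parsed node into the target tag instead of inserting the BeautifulSoup object itself.

```python
from bs4 import BeautifulSoup

def append_html(parent, html):
    """Parse html and move its top-level nodes to the end of parent."""
    fragment = BeautifulSoup(html, 'html.parser')
    # Copy the list first: append() detaches each node from fragment,
    # which would otherwise change the list while we iterate over it.
    for node in list(fragment.contents):
        parent.append(node)

main_soup = BeautifulSoup(
    "<div class='first'></div><div class='second'></div>", 'html.parser'
)
append_html(main_soup.find(class_='first'),
            '<span class="first-content">a</span><b>b</b>')

print(main_soup.find(class_='first'))
# <div class="first"><span class="first-content">a</span><b>b</b></div>
print(main_soup.find(class_='second'))
# <div class="second"></div>
```

Only real Tag and string nodes end up in the main tree, so later find calls walk an ordinary document.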