Question
BeautifulSoup is not extracting the div I want correctly, and I am not sure what I am doing wrong. Here is the HTML:
<div id='display'>
<div class='result'>
<div>text0 </p></div>
<div>text1</div>
<div>text2</div>
</div>
</div>
And here is my code:
div = soup.find("div", {"class": "result"})
print(div)
I am seeing this:
<div class="result">
<div>text0 </div></div>
What I am expecting is this:
<div class="result">
<div>text0</div>
<div>text1</div>
<div>text2</div>
</div>
This works as expected if I remove the </p> tag. In other words, the stray </p> tag seems to be throwing the parser off.
Edit:
This works as expected on Python 2.7.12 with beautifulsoup4 4.5.1, but not on Python 3.6.4 with beautifulsoup4 4.7.1. I am not sure whether the culprit is the Python version or, more likely, the bs4 version.
Can someone please help?
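Since the stray </p> appears to be the trigger, it can help to confirm which end tags in the markup have no matching start tag. The following is only a rough sketch using the standard library's html.parser; the class UnmatchedEndTagFinder and the function find_unmatched_end_tags are hypothetical names made up for this illustration, not part of BeautifulSoup.

```python
from html.parser import HTMLParser


class UnmatchedEndTagFinder(HTMLParser):
    """Collect end tags that have no matching open start tag."""

    def __init__(self):
        super().__init__()
        self.open_tags = []   # stack of currently open tag names
        self.unmatched = []   # (tag, line) for each stray end tag

    def handle_starttag(self, tag, attrs):
        # Void elements such as <br> stay on the stack until an
        # enclosing end tag pops past them; fine for a rough check.
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            # Pop back to (and including) the most recent matching start tag.
            while self.open_tags.pop() != tag:
                pass
        else:
            line, _column = self.getpos()
            self.unmatched.append((tag, line))


def find_unmatched_end_tags(markup):
    """Return a list of (tag, line) pairs for end tags with no open match."""
    finder = UnmatchedEndTagFinder()
    finder.feed(markup)
    finder.close()
    return finder.unmatched


html = """<div id='display'>
<div class='result'>
<div>text0 </p></div>
<div>text1</div>
<div>text2</div>
</div>
</div>"""

print(find_unmatched_end_tags(html))
```

For the HTML in the question this reports the p end tag together with the line it occurs on, which at least confirms the markup itself is malformed before blaming the parser.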
Answer 1:
I see no problem when using select:
from bs4 import BeautifulSoup as bs
html = '''
<div id='display'>
<div class='result'>
<div>text0 </p></div>
<div>text1</div>
<div>text2</div>
</div>
</div>
'''
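# No parser is named here, so bs4 falls back to the best one installed
# (lxml if available), and the result can differ between machines.
soup = bs(html)
print(soup.select('.result'))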
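Because the snippet above does not name a parser, and the question reports different results on different installations, one way to narrow things down is to pass the parser explicitly and compare. This is a minimal sketch, not a confirmed diagnosis: count_result_children is a hypothetical helper written for this comparison, and lxml and html5lib are optional packages that may not be installed.

```python
import bs4
from bs4 import BeautifulSoup

html = """<div id='display'>
<div class='result'>
<div>text0 </p></div>
<div>text1</div>
<div>text2</div>
</div>
</div>"""


def count_result_children(markup, parser):
    """Return how many direct child <div> elements the 'result' div has."""
    soup = BeautifulSoup(markup, parser)
    result = soup.find("div", {"class": "result"})
    return len(result.find_all("div", recursive=False))


print("beautifulsoup4", bs4.__version__)
for parser in ("html.parser", "lxml", "html5lib"):
    try:
        print(parser, count_result_children(html, parser))
    except bs4.FeatureNotFound:
        # Raised by bs4 when the requested parser library is missing.
        print(parser, "not installed")
```

If the counts differ between parsers, or between bs4 versions with the same parser, that points at how the stray </p> is being recovered from rather than at find or select themselves.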
Source: https://stackoverflow.com/questions/55471247/beautifulsoup-not-extracting-div-properly