BeautifulSoup not extracting div properly

Question

BeautifulSoup is not extracting the div I want, and I am not sure what I am doing wrong. Here is the HTML:

    <div id='display'>
        <div class='result'>
            <div>text0 </p></div>
            <div>text1</div>
            <div>text2</div>
        </div>
    </div>

And here is my code:

div = soup.find("div", {"class": "result"})
print(div)

I am seeing this:

<div class="result">
<div>text0 </div></div>

What I am expecting is this:

<div class="result">
<div>text0</div>
<div>text1</div>
<div>text2</div>
</div>

This works as expected if I remove the </p> tag. In other words, the </p> tag seems to be throwing the parser off.
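One way to check that the `</p>` is the only closing tag without a matching opener is a quick pass with the standard library's `html.parser`. This is only a minimal sketch: the names `UnmatchedEndTagFinder` and `find_unmatched_end_tags` are made up for illustration, and it tracks plain open/close nesting rather than the full HTML parsing rules (for example, it does not model implicitly closed `<p>` elements).

```python
from html.parser import HTMLParser

# Elements that never take a closing tag, so they are not tracked.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class UnmatchedEndTagFinder(HTMLParser):
    """Hypothetical helper: records end tags that have no open counterpart."""

    def __init__(self):
        super().__init__()
        self.open_tags = []   # stack of currently open tag names
        self.unmatched = []   # (tag, line number) pairs

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if tag in self.open_tags:
            # Pop up to and including the most recent matching open tag.
            while self.open_tags.pop() != tag:
                pass
        else:
            self.unmatched.append((tag, self.getpos()[0]))


def find_unmatched_end_tags(html):
    """Return a list of (tag, line) for end tags with no matching start tag."""
    finder = UnmatchedEndTagFinder()
    finder.feed(html)
    finder.close()
    return finder.unmatched


SAMPLE = """<div id='display'>
  <div class='result'>
    <div>text0 </p></div>
    <div>text1</div>
    <div>text2</div>
  </div>
</div>"""

for tag, line in find_unmatched_end_tags(SAMPLE):
    print("unmatched </%s> on line %d" % (tag, line))
```

If the only tag reported is `p`, that supports the idea that the markup is otherwise balanced and the stray `</p>` is what the parser trips over.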

Edit:

This works as expected on Python 2.7.12 with beautifulsoup4 4.5.1, but not on Python 3.6.4 with beautifulsoup4 4.7.1. I am not sure whether the culprit is the Python version or, more likely, the bs4 version.

Can someone please help?

Answer 1:

I see no problem when using select:

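from bs4 import BeautifulSoup as bs

html = '''
<div id='display'>
    <div class='result'>
        <div>text0 </p></div>
        <div>text1</div>
        <div>text2</div>
    </div>
</div>
'''

# No parser is named here, so bs4 uses the best one that is installed
soup = bs(html)
# select() returns a list of all elements matching the CSS selector
print(soup.select('.result'))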
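The snippet above does not name a parser, and Beautiful Soup's supported parsers are documented to handle invalid markup differently, so the outcome may depend on which one is installed. The sketch below is one way to compare them on this input; `count_result_children` is a hypothetical helper, and it assumes `lxml` and `html5lib` are optional extras that may not be present (only `html.parser` ships with Python).

```python
from bs4 import BeautifulSoup, FeatureNotFound

HTML = """
<div id='display'>
    <div class='result'>
        <div>text0 </p></div>
        <div>text1</div>
        <div>text2</div>
    </div>
</div>
"""


def count_result_children(html, parser):
    """Return how many direct child <div>s the first .result element has."""
    soup = BeautifulSoup(html, parser)
    result = soup.select_one(".result")
    # recursive=False restricts the search to direct children
    return len(result.find_all("div", recursive=False))


for parser in ("html.parser", "lxml", "html5lib"):
    try:
        print(parser, count_result_children(HTML, parser))
    except FeatureNotFound:
        # Raised by bs4 when the requested parser is not installed
        print(parser, "not installed")
```

A count of 3 means the parser kept all three inner divs inside `.result`; a count of 1 matches the truncated output shown in the question. Naming the parser explicitly also keeps the result the same from one machine to the next.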

Source: https://stackoverflow.com/questions/55471247/beautifulsoup-not-extracting-div-properly

Tags

beautifulsoup

html-parsing
