CSS Selectors
To select elements, pass a CSS selector string directly to select().
The examples below use the following sample HTML:
html = '''
<div class="panel">
    <div class="panel-heading">
        <h4>Hello</h4>
    </div>
    <div class="panel-body">
        <ul class="list" id="list-1">
            <li class="element">Foo</li>
            <li class="element">Bar</li>
            <li class="element">Jay</li>
        </ul>
        <ul class="list list-small" id="list-2">
            <li class="element">Foo</li>
            <li class="element">Bar</li>
        </ul>
    </div>
</div>
'''
1. Basic syntax
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')
print(soup.select('.panel .panel-heading'))  # select by class
print(soup.select('ul li'))                  # select by tag name
print(soup.select('#list-2 .element'))       # select by id
print(type(soup.select('ul')[0]))            # each result is a Tag object
Output:
[<div class="panel-heading"> <h4>Hello</h4> </div>]
[<li class="element">Foo</li>, <li class="element">Bar</li>, <li class="element">Jay</li>, <li class="element">Foo</li>, <li class="element">Bar</li>]
[<li class="element">Foo</li>, <li class="element">Bar</li>]
<class 'bs4.element.Tag'>
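Besides the descendant selectors shown above, select() also accepts other standard CSS syntax. The following is a minimal sketch rather than part of the original example: it assumes a trimmed-down copy of the sample markup (the variable name snippet is hypothetical) and uses the built-in 'html.parser' instead of lxml so that it runs without extra packages. It shows the child combinator, a compound class selector, and select_one(), which returns the first match or None.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down copy of the sample markup used in this article.
snippet = '''
<div class="panel-body">
<ul class="list" id="list-1">
<li class="element">Foo</li>
<li class="element">Bar</li>
</ul>
<ul class="list list-small" id="list-2">
<li class="element">Jay</li>
</ul>
</div>
'''

# Assumption: the standard-library parser is used here instead of lxml.
soup = BeautifulSoup(snippet, 'html.parser')

# Child combinator: only li elements that are direct children of #list-1.
direct_children = soup.select('#list-1 > li')

# Compound class selector: ul elements carrying both classes.
small_lists = soup.select('ul.list.list-small')

# select_one() returns the first match, or None when nothing matches.
first_item = soup.select_one('li.element')
missing = soup.select_one('#no-such-id')

print(len(direct_children))    # 2
print(small_lists[0]['id'])    # list-2
print(first_item.get_text())   # Foo
print(missing)                 # None
```

Because select() always returns a list, select_one() is often the more convenient choice when only a single element is expected.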
2. Nested iteration
# Print the li elements of each ul
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')
for ul in soup.select('ul'):
    print(ul.select('li'))
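Every element returned by select() is itself a Tag, so select() can be called on it again, and the second query is scoped to that element's descendants. As a rough sketch of how this might be used in practice, the code below groups the text of each li under the id of its parent ul. The names snippet and items_by_list are hypothetical, and 'html.parser' is assumed in place of lxml to keep the example dependency-free.

```python
from bs4 import BeautifulSoup

# Hypothetical copy of the two lists from the sample markup.
snippet = '''
<ul class="list" id="list-1">
<li class="element">Foo</li>
<li class="element">Bar</li>
<li class="element">Jay</li>
</ul>
<ul class="list list-small" id="list-2">
<li class="element">Foo</li>
<li class="element">Bar</li>
</ul>
'''

# Assumption: the standard-library parser is used here instead of lxml.
soup = BeautifulSoup(snippet, 'html.parser')

# Outer select() finds each ul; inner select() only searches inside that ul.
items_by_list = {}
for ul in soup.select('ul'):
    items_by_list[ul['id']] = [li.get_text() for li in ul.select('li')]

print(items_by_list)
# {'list-1': ['Foo', 'Bar', 'Jay'], 'list-2': ['Foo', 'Bar']}
```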
3. Getting attributes
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')
for ul in soup.select('ul'):
    print(ul['id'])        # both forms return the tag's attribute (id or any other)
    print(ul.attrs['id'])
Output:
list-1
list-1
list-2
list-2
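Two details are worth knowing when reading attributes this way: indexing with a missing key raises KeyError, and multi-valued attributes such as class come back as a list. The sketch below illustrates both, under the assumption of a one-element hypothetical snippet and the built-in 'html.parser'; the title attribute is deliberately absent from the markup to show the missing-key case.

```python
from bs4 import BeautifulSoup

# Hypothetical one-element snippet taken from the sample markup.
snippet = '<ul class="list list-small" id="list-2"><li class="element">Foo</li></ul>'

# Assumption: the standard-library parser is used here instead of lxml.
soup = BeautifulSoup(snippet, 'html.parser')
ul = soup.select_one('ul')

ul_id = ul['id']            # 'list-2'; ul['title'] would raise KeyError
ul_classes = ul['class']    # multi-valued attribute, returned as a list

# get() returns None (or a supplied default) when the attribute is absent.
ul_title = ul.get('title')
ul_title_or_default = ul.get('title', 'untitled')

print(ul_id)                 # list-2
print(ul_classes)            # ['list', 'list-small']
print(ul_title)              # None
print(ul_title_or_default)   # untitled
```

Using get() is generally the safer option when scraping pages where an attribute may not be present on every element.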
4. Getting text content
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')
for li in soup.select('li'):
    print(li.get_text())  # print the text inside each li
Output:
Foo
Bar
Jay
Foo
Bar
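When an element contains nested tags or surrounding whitespace, get_text() returns all of it joined together. Its separator and strip parameters can tidy the result. The following is a small sketch under assumed input: the snippet variable and its extra whitespace are hypothetical, and 'html.parser' stands in for lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with indentation and padded text, to show stripping.
snippet = '''
<div class="panel-heading">
    <h4>Hello</h4>
</div>
<ul id="list-1">
    <li class="element"> Foo </li>
    <li class="element">Bar</li>
</ul>
'''

# Assumption: the standard-library parser is used here instead of lxml.
soup = BeautifulSoup(snippet, 'html.parser')

# strip=True trims whitespace around each piece of text.
texts = [li.get_text(strip=True) for li in soup.select('li')]

# Called on a parent, get_text() gathers the text of all descendants.
heading_text = soup.select_one('.panel-heading').get_text(strip=True)

# separator controls how the individual pieces are joined.
ul_text = soup.select_one('#list-1').get_text(separator=' ', strip=True)

print(texts)          # ['Foo', 'Bar']
print(heading_text)   # Hello
print(ul_text)        # Foo Bar
```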
Source: https://www.cnblogs.com/yangshuai2020/p/12335309.html