[Python Web Scraping] class vs. class_

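When you locate elements with BeautifulSoup's find_all() method and pass class as the keyword argument to match on, Python rejects the code with a SyntaxError before it ever runs.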

Use class_ instead and the error goes away:

soup.find_all('div', class_ = 'iimg-box-meta')

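Reason:

class is a reserved keyword in Python, and reserved keywords cannot be used as variable names, function names, or keyword-argument names. That is why BeautifulSoup accepts class_ as a stand-in.

As of Python 3.7, there are 35 reserved keywords: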
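The keyword restriction can be checked without BeautifulSoup at all. The following is a minimal sketch using only the standard library; the helper name compiles is hypothetical and introduced just for this illustration. It only parses the two call expressions, so soup does not need to exist.

```python
import keyword


def compiles(source: str) -> bool:
    """Return True if `source` parses as a Python expression."""
    try:
        # compile() only parses the text; nothing is executed,
        # so `soup` does not have to be defined.
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False


bad = "soup.find_all('div', class='iimg-box-meta')"
good = "soup.find_all('div', class_='iimg-box-meta')"

print(keyword.iskeyword("class"))   # class is reserved
print(keyword.iskeyword("class_"))  # class_ is an ordinary identifier
print(compiles(bad))                # the parser rejects class=...
print(compiles(good))               # class_=... parses normally
```

The failure happens at parse time, which is why the whole script refuses to start rather than failing on the find_all() line.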

1	2	3	4	5
False	True	None	and	break
as	assert	async	await	class
continue	def	yield	del	elif
else	except	finally	for	from
global	if	import	in	is
lambda	nonlocal	not	or	pass
raise	return	try	while	with
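Rather than relying on a hand-copied table, the list can be read from the interpreter itself. This is a small sketch using the standard keyword module; the count it prints depends on the Python version you run it with.

```python
import keyword
import sys

# keyword.kwlist holds the reserved keywords of the running interpreter.
keywords = keyword.kwlist
print(f"Python {sys.version_info.major}.{sys.version_info.minor} "
      f"has {len(keywords)} reserved keywords")

# Print them five per row, tab-separated.
for start in range(0, len(keywords), 5):
    print("\t".join(keywords[start:start + 5]))
```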

import requests
from bs4 import BeautifulSoup

# BeautifulSoup is a very popular Python module. It parses the page source
# fetched by Requests into a "soup" document that is easy to filter and
# extract data from. The BeautifulSoup documentation recommends lxml as the
# parser because it is fast; the built-in html.parser is used here.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36'}
res = requests.get('http://www.cnplugins.com/', headers=headers)  # send the request headers with get()
soup = BeautifulSoup(res.text, 'html.parser')  # parse the returned HTML
# print(soup.prettify())

# Three equivalent ways to find <div class="iimg-box-meta">
print(soup.find_all('div', "iimg-box-meta"))
print(soup.find_all('div', class_='iimg-box-meta'))
print(soup.find_all('div', attrs={"class": "iimg-box-meta"}))

# href is not a keyword, so it is passed as-is
print(soup.find_all('a', href="/tool/save-as-mht.html"))   # works
print(soup.find_all('a', href_="/tool/save-as-mht.html"))  # does not work: matches nothing
print(soup.find_all('a', attrs={"href": "/tool/save-as-mht.html", "target": "_blank"}))
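The same lookups can be tried offline against a small inline document, which avoids depending on the live site. This is a sketch that assumes the beautifulsoup4 package is installed; the HTML snippet is made up for illustration and only mimics the class and href values used above.

```python
from bs4 import BeautifulSoup

# Made-up HTML for illustration; not taken from the real site.
html = """
<div class="iimg-box-meta">first</div>
<div class="other">second</div>
<a href="/tool/save-as-mht.html" target="_blank">link</a>
"""
soup = BeautifulSoup(html, "html.parser")

# Three equivalent ways to match on the CSS class.
by_positional = soup.find_all("div", "iimg-box-meta")
by_class_ = soup.find_all("div", class_="iimg-box-meta")
by_attrs = soup.find_all("div", attrs={"class": "iimg-box-meta"})

# href is not a keyword, so no trailing underscore is needed.
by_href = soup.find_all("a", href="/tool/save-as-mht.html")
# href_ is treated as a literal attribute named "href_", so nothing matches.
by_href_ = soup.find_all("a", href_="/tool/save-as-mht.html")

print(len(by_positional), len(by_class_), len(by_attrs))
print(len(by_href), len(by_href_))
```

Only class gets the trailing-underscore treatment. For any other attribute whose name cannot be written as a keyword argument, the attrs dictionary form is the general fallback.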

Source: CSDN

Author: masterbu

Link: https://blog.csdn.net/lA6Nf/article/details/79337400
