[Scraping Practice]

Exercise 1: scrape the content of the Environmental Stewardship (环保管家) page on the IoT portal website:

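import requests
from bs4 import BeautifulSoup

url = 'http://www.ioteis.com/Stewardship.html'

# Fetch the page; the timeout keeps the script from hanging on a dead connection
response_data = requests.get(url, timeout=10)
response_data.raise_for_status()  # stop early on an HTTP error status
response_data.encoding = 'utf-8'

# Parse the HTML (the 'lxml' parser requires the lxml package to be installed)
soup = BeautifulSoup(response_data.text, 'lxml')
# Inspecting the page shows that the content sits in divs under the .content1 element
for hbgj in soup.select(".content1"):
    title = hbgj.select("div.title1 a")[0].text
    content = hbgj.select('div.ptext')[0].text
    print("Title: {}, content: {}".format(title, content))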
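The selector logic above can be tried without touching the network. The following is only a minimal sketch: the helper name `extract_sections` and the `SAMPLE_HTML` markup are hypothetical, and the markup merely imitates the structure the exercise assumes (a `.content1` block holding a `div.title1 a` and a `div.ptext`). It also swaps in the built-in `html.parser` so that lxml is not required, and uses `select_one` so that an incomplete block is skipped rather than raising an `IndexError`.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup that imitates the assumed page structure.
SAMPLE_HTML = """
<div class="content1">
  <div class="title1"><a href="#">Title A</a></div>
  <div class="ptext">Body A</div>
</div>
<div class="content1">
  <div class="title1"><a href="#">Title B</a></div>
</div>
"""


def extract_sections(html):
    """Return (title, content) pairs for every .content1 block that has both parts."""
    # 'html.parser' ships with Python; the original exercise uses 'lxml' instead.
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for block in soup.select(".content1"):
        title_tag = block.select_one("div.title1 a")
        text_tag = block.select_one("div.ptext")
        if title_tag is None or text_tag is None:
            continue  # skip incomplete blocks instead of raising IndexError
        results.append(
            (title_tag.get_text(strip=True), text_tag.get_text(strip=True))
        )
    return results


if __name__ == "__main__":
    for title, content in extract_sections(SAMPLE_HTML):
        print("Title: {}, content: {}".format(title, content))
```

Keeping the parsing in a function that takes an HTML string makes it possible to check the selectors against a saved copy of the page before running them against the live site.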

Source: https://www.cnblogs.com/benpao1314/p/11401322.html
