Scraping Beijing used-car listings from Autohome (che168)

Testing shows that the site https://www.che168.com/beijing/list/ has fairly light anti-scraping measures: spoofing the request headers and throttling the crawl rate is enough. However, anything past page 100 requires a login, and crawling while logged in calls for caution, since one careless run can get the account permanently banned.

The scraped data can be saved in several formats; below it is stored in a MySQL database.

Code walkthrough (full source is available on GitHub: https://github.com/H-Ang/carsSpider):

Main crawler program

```python
# Autohome crawler: Beijing used cars
import requests
from lxml import etree
from data_save import *
import time


class Car_second():
    name = ''            # model name
    gonglishu = ''       # mileage ("公里数")
    brought_year = ''    # year the car was bought
    location = ''
    img_url = ''
    price = ''


def getInfors(url, i):
    print("Page %d is saving." % i)
    # Build spoofed request headers
    headers = {
        "Cache-Control": "no-cache",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko)
```
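The two tricks the article names for staying under the site's radar are spoofed headers and a low crawl rate. A minimal sketch of that fetch step is below; the `fetch_page` helper, the `delay` value, and the timeout are assumptions for illustration, not code from the repo.

```python
import time

import requests

# Spoofed headers, mirroring the headers dict built in getInfors above
HEADERS = {
    "Cache-Control": "no-cache",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko)",
}


def fetch_page(url, delay=2):
    """Fetch one listing page with spoofed headers, then pause.

    The fixed sleep keeps the crawl rate low; the delay value here is
    an assumption, tune it to the site's tolerance.
    """
    resp = requests.get(url, headers=HEADERS, timeout=10)
    time.sleep(delay)  # throttle between page requests
    return resp.text
```

Called in a loop over page URLs, this gives each request the same browser-like identity while spacing requests out.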
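The `Car_second` fields above (name, mileage, year, location, image URL, price) are filled by XPath queries against the listing page. The exact che168 markup is not shown in this excerpt, so the sketch below uses a made-up HTML fragment and invented class names purely to illustrate the lxml extraction pattern.

```python
from lxml import etree

# Hypothetical listing markup; the real che168 page structure differs
SAMPLE_HTML = """
<ul class="viewlist_ul">
  <li>
    <h4 class="card-name">Audi A4L 2019 40 TFSI</h4>
    <p class="cards-unit">3.2万公里/2019年/北京</p>
    <span class="price"><em>22.80</em>万</span>
    <img src="//img.example.com/a4l.jpg"/>
  </li>
</ul>
"""


def parse_listing(html):
    """Extract one dict per car card from the listing HTML."""
    tree = etree.HTML(html)
    cars = []
    for li in tree.xpath('//ul[@class="viewlist_ul"]/li'):
        name = li.xpath('.//h4[@class="card-name"]/text()')[0].strip()
        # mileage / year / district share one text node, split on "/"
        unit = li.xpath('.//p[@class="cards-unit"]/text()')[0]
        gonglishu, brought_year, location = unit.split('/')
        price = li.xpath('.//span[@class="price"]/em/text()')[0] + '万'
        # src is protocol-relative, so prepend the scheme
        img_url = 'https:' + li.xpath('.//img/@src')[0]
        cars.append({'name': name, 'gonglishu': gonglishu,
                     'brought_year': brought_year, 'location': location,
                     'price': price, 'img_url': img_url})
    return cars
```

Against the real site, the XPath expressions would be taken from the page's actual class names (inspectable in browser devtools); the splitting and URL-normalization steps carry over unchanged.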
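The `data_save` module imported by the main program is not included in this excerpt. The sketch below shows the shape such a save helper might take; sqlite3 stands in for MySQL so the example runs without a database server, and the table and column names are assumptions. With PyMySQL, the only changes would be swapping `sqlite3.connect(":memory:")` for `pymysql.connect(host=..., user=..., password=..., db=...)` and the `?` placeholders for `%s`.

```python
import sqlite3


def init_db(conn):
    # One column per Car_second field; names are assumptions
    conn.execute("""CREATE TABLE IF NOT EXISTS cars (
        name TEXT, gonglishu TEXT, brought_year TEXT,
        location TEXT, img_url TEXT, price TEXT)""")


def save_car(conn, car):
    """Insert one scraped car record via parameterized SQL."""
    conn.execute(
        "INSERT INTO cars VALUES (?, ?, ?, ?, ?, ?)",
        (car['name'], car['gonglishu'], car['brought_year'],
         car['location'], car['img_url'], car['price']))
    conn.commit()


conn = sqlite3.connect(":memory:")
init_db(conn)
save_car(conn, {'name': 'Audi A4L', 'gonglishu': '3.2万公里',
                'brought_year': '2019年', 'location': '北京',
                'img_url': 'https://img.example.com/a.jpg',
                'price': '22.80万'})
rows = conn.execute("SELECT name, price FROM cars").fetchall()
```

Parameterized placeholders (rather than string formatting) matter here because scraped text like the model name can contain quotes that would otherwise break, or inject into, the SQL statement.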