Scrapy Google Search

二次信任 submitted on 2019-12-11 05:26:01

Question


I am trying to scrape Google search results along with the "People also search for" links.

For example, when you search Google for Christopher Nolan, Google also shows a "People also search for" panel with images of people related to the query. In this case the panel lists Christian Bale, Emma Thomas, Zack Snyder, and so on. This is the data I am interested in scraping.

I am using the Scrapy framework and wrote a simple scraper, but it returns an empty CSV file. The code I have so far is below; any help is appreciated. I hope it is clear what I want to achieve. I used XPath Helper (a Google Chrome extension) to find the XPath.

My code:

# PyGSSpider (spiders folder)
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
from PyGoogleSearch.items import PyGSItem


class PyGSSpider(CrawlSpider):
    name = "google"
    allowed_domains = ["www.google.com"]
    # The query belongs in the query string; a "#q=..." fragment is never sent to the server.
    start_urls = ["https://www.google.com/search?q=christopher+nolan"]

    rules = [
        # Follow links to the Christopher Nolan results page ("?" and "+" are escaped for the regex).
        Rule(LinkExtractor(allow=(r"/search\?q=christopher\+nolan",)), follow=True),
        # Send every other extracted link to parse_item.
        Rule(LinkExtractor(), callback="parse_item"),
    ]

    # Parse callback for extracting the "People also search for" links.
    def parse_item(self, response):
        self.logger.info("Hi, this is an item page! %s", response.url)
        item = PyGSItem()
        # A div has no href attribute, so select the links inside the container instead.
        item["peoplealsosearchfor"] = response.xpath('//div[@id="cnt"]//a/@href').getall()
        return item

items.py:

from scrapy.item import Item, Field

class PyGSItem(Item):
    peoplealsosearchfor = Field()

Answer 1:


The reason this won't work is that Google enforces an algorithm that prevents bots from using its search.

However, using Selenium might do the trick.
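
A minimal sketch of the Selenium approach, assuming Firefox and geckodriver are installed. The function names build_search_url and fetch_links are hypothetical, and the XPath is carried over from the question rather than checked against Google's current markup. Google may still show a CAPTCHA to an automated browser, so treat this as a starting point, not a guaranteed fix:

```python
from urllib.parse import urlencode


def build_search_url(query):
    """Build a Google search URL with the query in the query string, not in a #fragment."""
    return "https://www.google.com/search?" + urlencode({"q": query})


def fetch_links(query, container_xpath='//div[@id="cnt"]//a'):
    """Open the results page in a real browser and collect the href of each matched link.

    container_xpath is an assumption taken from the question; Google's markup
    changes often, so the selector will probably need adjusting.
    """
    # Imported here so build_search_url can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Firefox()  # assumes Firefox and geckodriver are available
    try:
        driver.get(build_search_url(query))
        anchors = driver.find_elements(By.XPATH, container_xpath)
        hrefs = [a.get_attribute("href") for a in anchors]
        # Drop anchors that have no href attribute.
        return [h for h in hrefs if h]
    finally:
        # Always close the browser, even if the page load or lookup fails.
        driver.quit()
```

Calling fetch_links("christopher nolan") would open a browser window, load the results page, and return whatever links the XPath matches. To target only the "People also search for" panel, the selector would need to be narrowed after inspecting the rendered page.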



Source: https://stackoverflow.com/questions/23840059/scrapy-google-search
