Scrapy: Google Crawl doesn't work

终归单人心 2021-01-17 00:40

When I try to crawl Google for search results, Scrapy just yields the Google home page: http://pastebin.com/FUbvbhN4

Here is my spider:

import scrapy         


        
2 Answers
  • 2021-01-17 00:51

    In most cases, Google redirects the spider to a CAPTCHA page. Bing search results are easier to crawl.

    There is a project for crawling search results from Google/Bing/Baidu: https://github.com/titantse/seCrawler

  • 2021-01-17 00:58

    Yes, it looks like that address resolves to the home page.

    For example, running scrapy shell on http://www.google.com/#q=finance.google.com:+3m+co gives:

    ...
    [s]   request    <GET http://www.google.com/#q=finance.google.com:+3m+co>
    [s]   response   <200 http://www.google.com/>
    ...
    

    Looking at your URL, this makes sense: it contains no query parameters, only #q, which is a URL fragment rather than a parameter. The fragment is never sent to the server; the browser is what recognizes it and turns it into a Google search. Scrapy therefore requests just http://www.google.com/ and gets the home page.

    The correct Google search URL is: http://www.google.com/search?q=YOURQUERY
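    To see the difference between a fragment and a query string, here is a small sketch using only the standard library. The helper name `describe` is made up for illustration; it just splits a URL and reports which parts would be visible to the server.

```python
from urllib.parse import urlsplit


def describe(url):
    """Split a URL and return its path, query string, and fragment."""
    parts = urlsplit(url)
    return {
        "path": parts.path,
        "query": parts.query,        # sent to the server
        "fragment": parts.fragment,  # kept client-side, not sent in the request
    }


# The URL from the question: everything after '#' is a fragment.
hash_style = describe("http://www.google.com/#q=finance.google.com:+3m+co")

# A URL with a real query string.
search_style = describe("http://www.google.com/search?q=YOURQUERY")

print(hash_style)
print(search_style)
```

    In the first case the query string is empty and the whole `q=...` part sits in the fragment, which matches the `<200 http://www.google.com/>` response shown in the scrapy shell output.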
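    A minimal sketch of building that URL in Python, assuming you want the query to be percent-encoded for you. The function name `build_search_url` and the sample query are illustrative only, not part of Scrapy.

```python
from urllib.parse import urlencode


def build_search_url(query, base="http://www.google.com/search"):
    """Return a search URL with the query passed as the real 'q' parameter."""
    # urlencode escapes reserved characters and turns spaces into '+'.
    return f"{base}?{urlencode({'q': query})}"


url = build_search_url("finance.google.com: 3m co")
print(url)
```

    The resulting string can be used wherever the spider currently lists its start URL. Note that this only fixes the URL format; Google may still answer with a CAPTCHA page, as mentioned in the other answer.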
