Python - Easy way to scrape Google, download top N hits (entire .html documents) for given search?

感动是毒 2021-02-14 12:17

Is there an easy way to scrape Google and write the text (just the text) of the top N (say, 1000) .html (or whatever) documents for a given search?

As an example, imagin

3 answers
  •  轻奢々 (original poster)
    2021-02-14 13:03

    As mentioned, scraping Google violates their TOS. That said, that's probably not the answer you're looking for.

    There's a PHP script available that does a perfect job of scraping Google: http://google-scraper.squabbel.com/ Give it a keyword and the number of results you want, and it will return all the results for you. Parse that output for the URLs, fetch the HTML source of each one with urllib or curl, and you're done.
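
    For the second half (fetching each URL and keeping just the text), here is a rough sketch using only the Python 3 standard library. It assumes you already have the list of result URLs from whatever scraper you use; the names html_to_text, fetch_html and save_pages_as_text, the User-Agent string and the output file naming are all made up for illustration, and the tag stripping is deliberately naive.

```python
import urllib.request
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the text of an HTML document with the tags removed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def fetch_html(url, timeout=10):
    """Download one page and decode it (hypothetical helper)."""
    # The User-Agent value is an arbitrary placeholder, not a requirement.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def save_pages_as_text(urls, prefix="page"):
    """Write the text of each URL to <prefix>_<index>.txt, skipping failures."""
    for index, url in enumerate(urls):
        try:
            text = html_to_text(fetch_html(url))
        except OSError as exc:  # urllib's URLError is a subclass of OSError
            print(f"skipping {url}: {exc}")
            continue
        with open(f"{prefix}_{index}.txt", "w", encoding="utf-8") as handle:
            handle.write(text)
```

    Calling save_pages_as_text(urls) with your own list of URLs would write one text file per page that could be fetched. A real HTML-to-text library will do better on messy pages; this only shows the shape of the loop.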

    You also really shouldn't attempt to scrape Google unless you have more than 100 proxy servers, though. They'll easily ban your IP temporarily after a few attempts.
