Python - Easy way to scrape Google, download top N hits (entire .html documents) for given search?

感动是毒 2021-02-14 12:17

Is there an easy way to scrape Google and write the text (just the text) of the top N (say, 1000) .html (or whatever) documents for a given search?

As an example, imagin

3 Answers
  •  时光取名叫无心
    2021-02-14 13:03

    Check out BeautifulSoup for scraping the content out of web pages. It is supposed to be very tolerant of broken web pages, which will help because not all results are well-formed. So you should be able to:

    • Request http://www.google.ca/search?q=QUERY_HERE
    • Extract and follow result links using BeautifulSoup (result links appear to be marked with class="r")
    • Extract text from result pages using BeautifulSoup
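
The three steps above could be sketched roughly as follows. To keep the example runnable without extra installs, it swaps in the standard library's html.parser for BeautifulSoup; with BeautifulSoup the two parser classes would shrink to a find_all call and get_text(). The class name "r" is taken from the observation above and may not match the markup Google serves today. The helper names (fetch, extract_result_links, extract_text, top_hits_text) and the User-Agent value are assumptions made for illustration only.

```python
from html.parser import HTMLParser
from urllib.parse import quote_plus, urljoin
from urllib.request import Request, urlopen

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ResultLinkParser(HTMLParser):
    """Collect hrefs of <a> tags found inside elements carrying a given class."""

    def __init__(self, result_class="r"):  # "r" is an assumption from the answer
        super().__init__()
        self.result_class = result_class
        self.depth = 0   # > 0 while we are inside a result container
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag not in VOID_TAGS:
            if self.depth:
                self.depth += 1
            elif self.result_class in (attrs.get("class") or "").split():
                self.depth = 1
        if self.depth and tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.skipping = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skipping += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skipping:
            self.skipping -= 1

    def handle_data(self, data):
        if not self.skipping and data.strip():
            self.chunks.append(data.strip())


def extract_result_links(html, result_class="r"):
    parser = ResultLinkParser(result_class)
    parser.feed(html)
    parser.close()
    return parser.links


def extract_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def fetch(url, timeout=10):
    # Hypothetical User-Agent; some servers reject the urllib default.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def top_hits_text(query, n=10):
    """Step 1: request the search page. Step 2: follow links. Step 3: keep text."""
    search_url = "http://www.google.ca/search?q=" + quote_plus(query)
    links = extract_result_links(fetch(search_url))[:n]
    return [extract_text(fetch(urljoin(search_url, link))) for link in links]
```

Calling top_hits_text("QUERY_HERE", n=5) would return a list of plain-text strings, one per followed link. A single results page holds far fewer than 1000 hits, so a large N would need additional paging logic that this sketch leaves out.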
