Scraping DuckDuckGo with Python 3.6

暖寄归人 2021-01-22 05:36

A simple question: I can scrape results from the first page of a DuckDuckGo search, but I am struggling to get to the second and subsequent pages. I have used Python with the

2 answers
  •  情话喂你
    2021-01-22 06:11

    If I search for "Load More" in the source of the results page, I can't find it. Did you try the non-JavaScript version?

    You can use it by simply adding html to the URL: https://duckduckgo.com/html?q=paralegal&t=h_&ia=web There you can find the Next button at the end of the page.
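
    For other search terms, the same URL can be built from the query string. This is only a sketch: html_results_url is a hypothetical helper, and the t and ia parameters are simply copied from the URL above, on the assumption that they stay the same for any query.

```python
from urllib.parse import urlencode


def html_results_url(query):
    """Build the non-JavaScript results URL for a search term."""
    # t and ia are copied unchanged from the example URL (assumption).
    params = {"q": query, "t": "h_", "ia": "web"}
    return "https://duckduckgo.com/html?" + urlencode(params)


print(html_results_url("paralegal"))
```

    urlencode also escapes spaces and special characters, so multi-word queries do not need manual quoting.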

    This one works for me (Chrome version):

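    from selenium import webdriver
    from selenium.webdriver.common.by import By

    browser = webdriver.Chrome()
    results_url = "https://duckduckgo.com/html?q=paralegal&t=h_&ia=web"
    browser.get(results_url)
    # The element with id "links" wraps all results on the page.
    results = browser.find_elements(By.ID, 'links')
    print(len(results))
    for result in results:
        print(result.text)
    # find_elements returns an empty list (instead of raising) when there is no Next button.
    nxt_page = browser.find_elements(By.XPATH, '//input[@value="Next"]')
    if nxt_page:
        browser.execute_script('arguments[0].scrollIntoView();', nxt_page[0])
        nxt_page[0].click()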
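
    To go beyond the second page, the same steps can be repeated in a loop. A rough sketch, not a tested scraper: collect_pages and max_pages are hypothetical names, browser is assumed to be a Selenium WebDriver, and the locator strings "id" and "xpath" are the values behind Selenium's By.ID and By.XPATH constants. It also assumes that click() returns only after the next page has loaded; if it does not, an explicit wait would be needed.

```python
def collect_pages(browser, start_url, max_pages=3):
    """Return the text of the results container from up to max_pages pages."""
    pages = []
    browser.get(start_url)
    for page_number in range(max_pages):
        # The element with id "links" wraps all results on the page.
        for container in browser.find_elements("id", "links"):
            pages.append(container.text)
        if page_number == max_pages - 1:
            break  # page limit reached, no need to click Next again
        # find_elements returns an empty list when there is no Next button.
        next_buttons = browser.find_elements("xpath", '//input[@value="Next"]')
        if not next_buttons:
            break  # last result page
        browser.execute_script("arguments[0].scrollIntoView();", next_buttons[0])
        next_buttons[0].click()
    return pages
```

    With a real driver this would be called as collect_pages(webdriver.Chrome(), results_url, max_pages=3).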
    

    By the way, DuckDuckGo also provides a nice API, which is probably much easier to use ;)
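
    The answer does not say which API is meant. Assuming it is the Instant Answer API at api.duckduckgo.com, a request could look roughly like the sketch below. Note that this API returns instant answers and related topics rather than the full list of web results. The helper names api_url, related_topics and fetch are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the API in question is the Instant Answer API.
API_ENDPOINT = "https://api.duckduckgo.com/"


def api_url(query):
    """Build a JSON request URL for a search term."""
    return API_ENDPOINT + "?" + urlencode({"q": query, "format": "json"})


def related_topics(payload):
    """Extract (text, url) pairs from a decoded JSON response."""
    topics = []
    for item in payload.get("RelatedTopics", []):
        # Grouped entries hold a nested "Topics" list instead; skip them here.
        if "Text" in item and "FirstURL" in item:
            topics.append((item["Text"], item["FirstURL"]))
    return topics


def fetch(query):
    """Download and decode the response for a search term."""
    with urlopen(api_url(query)) as response:
        return json.load(response)
```

    Calling related_topics(fetch("paralegal")) would then give a list of (text, url) pairs, with no browser or pagination involved.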

    Edit: fixed the selector for the next-page link, which selected the Prev button on the second result page (thanks to @kingbode).
