Web scraping to fill out search forms (and retrieve the results)?

孤独总比滥情好 2021-01-02 17:49

I was wondering if it is possible to "automate" the task of typing in entries to search forms and extracting matches from the results. For instance, I have a list of journ

4 answers
  •  醉梦人生
    2021-01-02 18:17

    Beautiful Soup is great for parsing web pages, and that's half of what you want to do. Python, Perl, and Ruby all have a version of Mechanize, and that's the other half:

    http://wwwsearch.sourceforge.net/mechanize/

    Mechanize lets you control a browser:

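    import mechanize

    # Create the browser object; open a page with browser.open(url) before the steps below
    browser = mechanize.Browser()

    # Follow a link (link_node is a link found on the current page)
    browser.follow_link(link_node)

    # Submit a form: select it by name, fill in its fields, then submit
    browser.select_form(name="search")
    browser["authors"] = ["author #1", "author #2"]
    browser["volume"] = "any"
    search_response = browser.submit()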
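
For a sense of what a form submission amounts to, here is a minimal sketch using only the Python standard library rather than Mechanize. It loops over a list of entries and encodes each one the way a GET form would. The URL `http://example.com/search`, the helper `build_search_url`, and the reuse of the `authors` and `volume` field names are assumptions for illustration; a real site will have its own endpoint and fields, and may use POST instead.

```python
from urllib.parse import urlencode

# Hypothetical search endpoint, for illustration only.
SEARCH_URL = "http://example.com/search"


def build_search_url(authors, volume="any"):
    """Encode form fields the way a GET form submission would."""
    # A multi-select field is sent as one key/value pair per selected value.
    fields = [("authors", author) for author in authors] + [("volume", volume)]
    return SEARCH_URL + "?" + urlencode(fields)


# One query per entry in the list you want to look up.
queries = [["author #1"], ["author #1", "author #2"]]
urls = [build_search_url(q) for q in queries]
for url in urls:
    print(url)
```

Each URL could then be fetched and the response handed to a parser. Mechanize does the same job for you, and also reads the field names and the form's target from the page itself.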
    

    With Mechanize and Beautiful Soup you have a great start. One extra tool I'd consider is Firebug, as used in this quick Ruby scraping guide:

    http://www.igvita.com/2007/02/04/ruby-screen-scraper-in-60-seconds/

    Firebug can speed up writing the XPath expressions you use to parse documents, saving you some serious time.
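
As a rough illustration of what such an XPath expression does, the sketch below pulls matches out of a results page. It uses the standard library's `xml.etree.ElementTree`, which supports only a small XPath subset and requires well-formed markup, so treat it as a stand-in for a fuller HTML parser. The `RESULTS` markup and the `match` class name are made up for the example.

```python
import xml.etree.ElementTree as ET

# Made-up results markup; real pages are rarely this tidy.
RESULTS = """
<html><body>
  <ul id="results">
    <li class="match"><a href="/a/1">First article</a></li>
    <li class="match"><a href="/a/2">Second article</a></li>
    <li class="ad"><a href="/promo">Sponsored</a></li>
  </ul>
</body></html>
"""

root = ET.fromstring(RESULTS)

# Select the links inside list items whose class attribute is exactly "match".
links = root.findall(".//li[@class='match']/a")

# Keep the link text and target of each match.
matches = [(a.text, a.get("href")) for a in links]
print(matches)
```

The expression skips the `ad` item because its `class` attribute does not equal `match`. Working out a path like this by inspecting the page in the browser is the step a tool such as Firebug helps with.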

    Good luck!
