How to fill in an online form and get results back in R

臣服心动 2021-02-03 11:33

Has anyone ever filled in a web form remotely from R?

I'd like to do some archery statistics in R using my scores. There is a very handy webpage that gives you the cla

4 answers
  • 2021-02-03 12:04

    This might not fully answer your question, as I am looking for a solution to a similar problem myself. Looking at the URL you would like to scrape, though, the fields to fill in are actually HTML forms, and you can get their description with:

    library(RHTMLForms)  # provides getHTMLFormDescription()

    url <- "http://www.archersmate.co.uk/"
    forms <- getHTMLFormDescription(url)
    

    Also have a look at the "RHTMLForms" package on omegahat.org.
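
    To get a feel for what such a form description contains without touching the live site, the field names of a form can also be listed with the XML package. This is only a minimal sketch, assuming the XML package is installed; the HTML snippet and its field names are made up for illustration and are not the real markup of the page above.

    ```r
    library(XML)

    # Hypothetical form markup, used only for illustration
    html <- '<html><body><form id="calc" method="post">
      <input type="radio" name="gender" value="Male"/>
      <select name="round"><option value="Bristol V">Bristol V</option></select>
      <input type="text" name="score"/>
    </form></body></html>'

    doc <- htmlParse(html, asText = TRUE)

    # Collect the "name" attribute of every input and select element inside a form
    fields <- xpathSApply(doc, "//form//input | //form//select", xmlGetAttr, "name")
    print(fields)
    ```

    The same XPath idea carries over to a page fetched from a URL, although fields that are filled in by JavaScript will not show up this way.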

  • 2021-02-03 12:14

    This cannot be done with RCurl alone: the form triggers an AJAX event, so the postForm function will not be enough.

  • 2021-02-03 12:17

    You can use the RSelenium package to fill out and submit web forms and to retrieve the results.

    The following code uses RSelenium to retrieve the results for an example input (Male, Under 18, Longbow, Bristol V, 500):

    library(RSelenium)
    
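    # Start Selenium Server --------------------------------------------------------
    # Note: checkForServer() and startServer() are defunct in newer RSelenium
    # releases, which point to rsDriver() instead.

    checkForServer()
    startServer()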
    remDrv <- remoteDriver()
    remDrv$open()
    
    
    # Simulate browser session and fill out form -----------------------------------
    
    remDrv$navigate('http://www.archersmate.co.uk/')
    remDrv$findElement(using = "xpath", "//input[@value = 'Male']")$clickElement()
    Sys.sleep(2) 
    remDrv$findElement(using = "xpath", "//select[@id = 'drpAge']/option[@value = 'Under 18']")$clickElement()
    remDrv$findElement(using = "xpath", "//input[@value ='Longbow']")$clickElement() 
    remDrv$findElement(using = "xpath", "//select[@id = 'rnd']/option[@value = 'Bristol V']")$clickElement()
    remDrv$findElement(using = "xpath", "//input[@id ='scr']")$sendKeysToElement(list('5', '0', '0'))
    remDrv$findElement(using = "xpath", "//input[@id = 'cmdCalc']")$clickElement()
    
    # Retrieve and download results injecting javascript ---------------------------
    
    Sys.sleep(2)
    clsf <- remDrv$executeScript(script = 'return $("#txtClass").val();', args = list())[[1]]
    hndcp <- remDrv$executeScript(script = 'return $("#txtHandicap").val();', args = list())[[1]]
    
    remDrv$quit()
    remDrv$closeServer()
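
    The fixed Sys.sleep(2) pauses above assume the page answers within two seconds. One alternative is to poll until the result is there. The helper below is a minimal sketch; wait_until is a hypothetical name, not part of RSelenium, and the demonstration uses a simple counter instead of a browser so that it runs on its own.

    ```r
    # Poll `condition` until it returns TRUE or `timeout` seconds have passed.
    # Returns TRUE if the condition was met, FALSE if the timeout was reached.
    wait_until <- function(condition, timeout = 10, interval = 0.5) {
      deadline <- Sys.time() + timeout
      repeat {
        if (isTRUE(condition())) return(TRUE)
        if (Sys.time() >= deadline) return(FALSE)
        Sys.sleep(interval)
      }
    }

    # Self-contained demonstration: the condition becomes TRUE on the third call
    calls <- 0
    ready <- wait_until(function() {
      calls <<- calls + 1
      calls >= 3
    }, timeout = 5, interval = 0.1)
    ```

    In the session above, the condition could be something like function() nchar(remDrv$executeScript('return $("#txtClass").val();', args = list())[[1]]) > 0, assuming the field stays empty until the calculation has finished.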
    

    The default browser for RSelenium is Firefox. However, RSelenium also supports headless browsing using PhantomJS. To use PhantomJS, you just need to:

    • download PhantomJS and place it on the user's PATH
    • replace the code snippets at the beginning and at the end as described below

    Default browsing (like shown above):

    checkForServer()
    startServer()
    remDrv <- remoteDriver()
    
    ...
    
    remDrv$quit()
    remDrv$closeServer()
    

    Headless browsing:

    pJS <- phantom()
    remDrv <- remoteDriver(browserName = 'phantomjs')
    
    ...
    
    remDrv$close()
    pJS$stop()
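
    Whichever variant is used, the closing calls are skipped if something fails halfway through the form, which can leave a browser or a PhantomJS process running. A small wrapper based on on.exit() is one way to guard against that. This is a sketch only; run_with_cleanup is a hypothetical helper name, and the demonstration deliberately fails without using a browser.

    ```r
    # Run `task`, then always run `cleanup`, even if `task` raises an error.
    run_with_cleanup <- function(task, cleanup) {
      on.exit(cleanup(), add = TRUE)
      task()
    }

    # Self-contained demonstration: the cleanup still runs although the task fails
    cleaned <- FALSE
    result <- tryCatch(
      run_with_cleanup(
        task = function() stop("form submission failed"),
        cleanup = function() { cleaned <<- TRUE }
      ),
      error = function(e) conditionMessage(e)
    )
    ```

    For the headless session, task would hold the navigate and findElement calls, and cleanup would be function() { remDrv$close(); pJS$stop() }.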
    
  • 2021-02-03 12:17

    You might want to take a look at RCurl's postForm here, and there's also a nice tutorial here.
