Fill in web form, submit and download results

Posted by 回眸只為那壹抹淺笑 on 2019-12-21 05:12:21

Question


I want to fill in a web form, submit my query, and download the resulting data. Some fields offer a drop-down menu or accept a typed search query, and any field can be left blank (if all fields are left blank, the entire database is downloaded). Clicking the "Search and Download" button should start the download of a file.

Here is what I have tried (selecting all records for the species "Salmo salar"), based on this question. I used the "Developer Tools" in my browser (Opera) to inspect the page elements and identify the names of all the possible fields:

library(httr)

url <- "https://nzffdms.niwa.co.nz/search"

fd <- list(
  search_catchment_no_name = "",
  search_river_lake = "",
  search_sampling_locality = "",
  search_fishing_method = "",
  search_start_year = "",
  search_end_year = "",
  search_species  = "Salmo salar", # species of interest
  search_download_format = 1,      # select csv file format
  submit = "Search and Download"
)

POST(url, body = fd, encode = "form")

I had hoped this would download a csv file (all records for species "Salmo salar"), but no file downloads. Instead, the call returns the following (a list of 10; only the first part is shown):

Response [https://nzffdms.niwa.co.nz/search]
Date: 2019-10-02 23:35
Status: 200
Content-Type: text/html; charset=utf-8
Size: 19.1 kB
<!DOCTYPE html>  
  <html>  
  <head>  
  <meta http-equiv="Content-Type" content="text/html; c...
    <meta name="title" content="NZ Freshwater Fish Database...
<meta name="description" content="NIWA NZ Freshwater Fish...
<meta name="keywords" content="NIWA, NZ, Freshwater Fish" />
<meta name="language" content="en" />
<meta name="robots" content="index, follow />

...

Edit

I think the issue is with how I am calling the "Search and Download" button. When inspecting the web page, most fields look like this:

# end year field
<input maxlength="4" class="form-control" type="text" name="search[end_year]" id="search_end_year">

But the "Search and Download" button element has no name or id attribute:

<input type="submit" value="Search and Download" class="btn btn-primary btn-md">

I have also just noticed that there is a hidden field. Maybe I need to define this?

<input type="hidden" name="search[_csrf_token]" value="d1530f09c1ce8110b5163bd100cb0d67" id="search__csrf_token">

Any advice on how I can get the file downloading would be much appreciated.


Answer 1:


First, check robots.txt on the website. It is commented out as of Oct 3, 2019.

Then read the terms and conditions on https://nzffdms.niwa.co.nz/terms and https://www.niwa.co.nz/freshwater-and-estuaries/nzffd/user-guide/tips and make sure you comply with them.

It is also important to throttle the requests made by the code below.
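One way to throttle could look like the sketch below. It is only an illustration: `throttle` and `delay` are hypothetical names (not part of httr), and the delay values are arbitrary assumptions, not figures taken from the site's terms. The wrapper makes sure that consecutive calls to a function start at least `delay` seconds apart.

```r
# Hypothetical helper (not part of httr): wrap a function so that
# consecutive calls start at least `delay` seconds apart.
throttle <- function(f, delay = 5) {
  last_call <- NULL
  function(...) {
    if (!is.null(last_call)) {
      elapsed <- as.numeric(difftime(Sys.time(), last_call, units = "secs"))
      if (elapsed < delay) Sys.sleep(delay - elapsed)
    }
    last_call <<- Sys.time()   # remember when this call started
    f(...)
  }
}

# Demo with a stand-in function so the sketch runs without network access
slow_square <- throttle(function(x) x^2, delay = 0.3)
t0 <- Sys.time()
first <- slow_square(2)
second <- slow_square(3)       # waits roughly 0.3 s before running
waited <- as.numeric(difftime(Sys.time(), t0, units = "secs"))
```

With httr loaded, the same wrapper could presumably be applied to the real requests, for example `polite_post <- throttle(POST, delay = 5)`, and `polite_post` used in place of `POST`.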

After checking all the terms and conditions, you can use the code below to query for your data:

library(httr)
library(xml2)

# Fetch the search page to get the form's option lists and a fresh CSRF token
gr <- GET("https://nzffdms.niwa.co.nz/search")
doc <- read_html(content(gr, "text"))     # doc <- read_html(gr) works as well

# Build a lookup table (label and value) for the <select> named "search<x>"
getTbl <- function(x) {
    do.call(rbind, lapply(xml_find_all(doc, paste0(".//select[@name='search",x,"']/option")),
        function(n) data.frame(NAME=xml_text(n), VALUE=xml_attr(n, "value"))))
}
fishing_method <- getTbl("[fishing_method]")
species <- getTbl("[species][]")
# The hidden CSRF token has to be sent back with the form
csrf_token <- xml_attr(xml_find_all(doc, ".//input[@name='search[_csrf_token]']"), "value")

# Keys must match the form's name attributes (e.g. search[end_year]), not the ids
fd <- list(
    "search[catchment_no_name]"="",
    "search[river_lake]"="",
    "search[sampling_locality]"="",
    "search[fishing_method]"="",
    "search[species][]"="",
    "search[species][]"=68,               # species value, looked up in the species table
    "search[start_year]"="",
    "search[end_year]"="",
    "search[download_format]"="1",        # csv file format
    "search[_csrf_token]"=csrf_token
)
# Submit to /doSearch; the response body is the csv text
r <- POST("https://nzffdms.niwa.co.nz/doSearch", body=fd, encode="form")
read.csv(text=content(r, "text", encoding="UTF-8"))

output:

   card m    y catchname  catch        locality time  org map    east   north altitude penet fishmeth effort pass spcode abund number minl maxl  nzreach
1  3964 1 1981   Waiau R 797.49       Lake Gunn   NA niwa d41 2122400 5581200      477   225      ang     NA   NA salsal    NA     NA   NA   NA 15006671
2  3965 1 1981   Waiau R 797.49     Lake Fergus   NA niwa d41 2123700 5584400      483   229      ang     NA   NA salsal    NA     NA   NA   NA 15006092
3 15975 1 2003   Waiau R 797.40 Excelsior Creek 1330 niwa d44 2095800 5495800      190    94      efp     80    1 salsal    NA      2  102  105 15030686
4 50772 1 1940   Waiau R 797.49 Upukerora River   NA  unk d43 2098500 5519900      210   146      unk     NA   NA salsal    NA     NA   NA   NA 15020897
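
To illustrate how the field names, the CSRF token and a species value can be read from the form, here is a small offline sketch using xml2. The three input elements are the ones quoted in the question; the select element and its option labels are hypothetical placeholders (the real page may label its options differently), with only the value 68 taken from the code above.

```r
library(xml2)

# The <input> elements are quoted from the question; the <select> and its
# option labels are hypothetical placeholders for illustration only.
html <- '
<form>
  <input maxlength="4" class="form-control" type="text" name="search[end_year]" id="search_end_year">
  <select name="search[species][]">
    <option value=""></option>
    <option value="68">Salmo salar</option>
  </select>
  <input type="hidden" name="search[_csrf_token]" value="d1530f09c1ce8110b5163bd100cb0d67" id="search__csrf_token">
  <input type="submit" value="Search and Download" class="btn btn-primary btn-md">
</form>'
doc <- read_html(html)

# Collect the name attribute of every element that has one. The names use
# brackets (search[end_year]), unlike the underscore-separated ids, and the
# submit button has no name at all.
field_names <- xml_attr(xml_find_all(doc, ".//*[@name]"), "name")

# Read the hidden CSRF token value
token <- xml_attr(xml_find_first(doc, ".//input[@name='search[_csrf_token]']"), "value")

# Build a label/value table for the species options and look up one value
opts <- xml_find_all(doc, ".//select[@name='search[species][]']/option")
species <- data.frame(NAME = xml_text(opts), VALUE = xml_attr(opts, "value"),
                      stringsAsFactors = FALSE)
salmo_id <- species$VALUE[species$NAME == "Salmo salar"]
```

Against the live page, the token changes between sessions, so it would need to be read from a fresh GET of the search page rather than copied from the browser.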


Source: https://stackoverflow.com/questions/58159645/fill-in-web-form-submit-and-download-results
