How can I POST a simple HTML form in R?

Posted by 微笑、不失礼 on 2019-12-30 00:39:11

Question


I'm relatively new to R programming and I'm trying to put some of the stuff I'm learning in the Johns Hopkins Data Science track to practical use. Specifically, I would like to automate the process of downloading historical bond prices from the US Treasury website.

Using both Firefox and R, I was able to determine that the US Treasury website uses a very simple HTML POST form to specify a single date for the quotes of interest. It then returns a table of secondary market information for all outstanding bonds.

I have unsuccessfully tried to use two different R packages to submit a request to the US Treasury web server. Here are the two approaches I tried:

Attempt #1 (using RCurl):

library(RCurl)

url <- "https://www.treasurydirect.gov/GA-FI/FedInvest/selectSecurityPriceDate.htm"
td.html <- postForm(url,
                    submit = "Show Prices",
                    priceDate.year  = 2014,
                    priceDate.month = 12,
                    priceDate.day   = 15,
                    .opts = curlOptions(ssl.verifypeer = FALSE))

This results in a web page being returned and stored in td.html but all it contains is an error message from the treasurydirect server. I know the server is working because when I submit the same request via my browser, I get the expected results.

Attempt #2 (using rvest):

library(rvest)

# `url` is the same address defined in Attempt #1
s <- html_session(url)
f0 <- html_form(s)
f1 <- set_values(f0[[2]], priceDate.year=2014, priceDate.month=12, priceDate.day=15)
test <- submit_form(s, f1)

Unfortunately, this approach doesn't even leave R and results in the following error message from R:

Submitting with 'submit'
Error in function (type, msg, asError = TRUE)  : <url> malformed

I can't seem to figure out how to see what "malformed" text is being sent to rvest so that I can try to diagnose the problem.

Any suggestions or tips for solving this seemingly simple task would be greatly appreciated!


Answer 1:


Well, it appears to work with the httr library.

library(httr)

url <- "https://www.treasurydirect.gov/GA-FI/FedInvest/selectSecurityPriceDate.htm"

fd <- list(
    submit = "Show Prices",
    priceDate.year  = 2014,
    priceDate.month = 12,
    priceDate.day   = 15
)

resp <- POST(url, body = fd, encode = "form")
content(resp)
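
Once the POST succeeds, the body still has to be turned into a table. The following is a minimal sketch and not part of the original answer: it assumes the response is ordinary HTML containing a <table>, and it uses rvest's read_html, html_nodes and html_table. The helper name extract_tables and the inline sample markup are hypothetical, used only so the snippet runs without contacting the server.

```r
library(rvest)

# Hypothetical helper: parse an HTML string and return every <table> it contains.
extract_tables <- function(html_text) {
  page <- read_html(html_text)
  html_table(html_nodes(page, "table"))
}

# Made-up sample markup standing in for the server's reply (not real Treasury data).
sample_html <- "<html><body><table>
  <tr><th>id</th><th>value</th></tr>
  <tr><td>a</td><td>1</td></tr>
</table></body></html>"

tables <- extract_tables(sample_html)
```

With a live response, the same helper could presumably be called as extract_tables(content(resp, as = "text")), ideally after checking the request with stop_for_status(resp).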

The rvest library is really just a wrapper around httr. It looks like it doesn't do a good job of interpreting root-relative URLs, that is, an absolute path without the server name. So if you look at

f1$url
# [1] /GA-FI/FedInvest/selectSecurityPriceDate.htm

you see that it just has the path and not the server name. This appears to be confusing httr. If you do

f1 <- set_values(f0[[2]], priceDate.year=2014, priceDate.month=12, priceDate.day=15)
f1$url <- url
test <- submit_form(s, f1)

that seems to work. Perhaps it's a bug that should be reported to rvest. (Tested on rvest_0.1.0)
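
Rather than overwriting the form's URL with a hard-coded string, the path could also be resolved against the address of the page. This is a sketch of that idea, not something from the original answer: it swaps in xml2::url_absolute, and the variable names form_action and page_url are hypothetical stand-ins for f1$url and url.

```r
library(xml2)

# The form's action as rvest reported it: a path with no server name.
form_action <- "/GA-FI/FedInvest/selectSecurityPriceDate.htm"

# The address of the page the form was read from.
page_url <- "https://www.treasurydirect.gov/GA-FI/FedInvest/selectSecurityPriceDate.htm"

# Resolve the path against the page address to get a complete URL.
full_url <- url_absolute(form_action, page_url)
print(full_url)
```

In the workaround above, this would amount to f1$url <- url_absolute(f1$url, url), which should keep working if the form ever posts to a different path on the same server.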




Answer 2:


I know this is an old question, but adding the

style='POST'

parameter to postForm does the trick as well.
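
For completeness, here is how that might look when applied to the call from the question. It is a sketch rather than a tested fix: the wrapper name fetch_prices is hypothetical, and the .opts setting is simply carried over from Attempt #1. By default postForm uses style = 'HTTPPOST', which sends the fields as multipart/form-data, whereas style = 'POST' sends them URL-encoded (application/x-www-form-urlencoded), the way an HTML form does by default.

```r
library(RCurl)

url <- "https://www.treasurydirect.gov/GA-FI/FedInvest/selectSecurityPriceDate.htm"

# Hypothetical wrapper around the call from Attempt #1, with style = "POST" added.
fetch_prices <- function(year, month, day) {
  postForm(url,
           submit          = "Show Prices",
           priceDate.year  = year,
           priceDate.month = month,
           priceDate.day   = day,
           style = "POST",  # URL-encoded body instead of multipart/form-data
           .opts = curlOptions(ssl.verifypeer = FALSE))  # carried over from the question
}

# Example call (contacts the server, so it is left commented out):
# td.html <- fetch_prices(2014, 12, 15)
```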



Source: https://stackoverflow.com/questions/27631460/how-can-i-post-a-simple-html-form-in-r
