Question
I'm relatively new to R programming and I'm trying to put some of the stuff I'm learning in the Johns Hopkins Data Science track to practical use. Specifically, I would like to automate the process of downloading historical bond prices from the US Treasury website.
Using both Firefox and R, I was able to determine that the US Treasury website uses a very simple HTML POST form to specify a single date for the quotes of interest. It then returns a table of secondary market information for all outstanding bonds.
I have unsuccessfully tried to use two different R packages to submit a request to the US Treasury web server. Here are the two approaches I tried:
Attempt #1 (using RCurl):
library(RCurl)
url <- "https://www.treasurydirect.gov/GA-FI/FedInvest/selectSecurityPriceDate.htm"
# Submit the form fields for a single quote date
td.html <- postForm(url,
                    submit = "Show Prices",
                    priceDate.year = 2014,
                    priceDate.month = 12,
                    priceDate.day = 15,
                    .opts = curlOptions(ssl.verifypeer = FALSE))
This results in a web page being returned and stored in td.html, but all it contains is an error message from the treasurydirect server. I know the server is working because when I submit the same request via my browser, I get the expected results.
Attempt #2 (using rvest):
library(rvest)
# url is the same form address as in Attempt #1
s <- html_session(url)
f0 <- html_form(s)
f1 <- set_values(f0[[2]], priceDate.year = 2014, priceDate.month = 12, priceDate.day = 15)
test <- submit_form(s, f1)
Unfortunately, this approach doesn't even leave R and results in the following error message from R:
Submitting with 'submit'
Error in function (type, msg, asError = TRUE) : <url> malformed
I can't seem to figure out how to see what "malformed" text is being sent to rvest so that I can try to diagnose the problem.
Any suggestions or tips for solving this seemingly simple task would be greatly appreciated!
Answer 1:
Well, it appears to work with the httr library.
library(httr)
url <- "https://www.treasurydirect.gov/GA-FI/FedInvest/selectSecurityPriceDate.htm"
fd <- list(
  submit = "Show Prices",
  priceDate.year = 2014,
  priceDate.month = 12,
  priceDate.day = 15
)
resp <- POST(url, body = fd, encode = "form")
content(resp)
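Since the goal is to automate this for many dates, the call above can be wrapped in a small helper. This is only a sketch: price_date_fields and fetch_prices are hypothetical names, not part of httr, and it assumes the server accepts month and day without leading zeros (the example above only shows 12 and 15, so that is untested).

```r
library(httr)

# Hypothetical helper: build the form fields shown above from a Date
# (or a "YYYY-MM-DD" string).
price_date_fields <- function(date) {
  date <- as.Date(date)
  list(
    submit = "Show Prices",
    priceDate.year = as.integer(format(date, "%Y")),
    priceDate.month = as.integer(format(date, "%m")),  # assumption: no zero padding needed
    priceDate.day = as.integer(format(date, "%d"))
  )
}

# Hypothetical wrapper around the POST call shown above.
fetch_prices <- function(date,
                         url = "https://www.treasurydirect.gov/GA-FI/FedInvest/selectSecurityPriceDate.htm") {
  resp <- POST(url, body = price_date_fields(date), encode = "form")
  stop_for_status(resp)        # turn HTTP error codes into R errors
  content(resp, as = "text")   # return the raw HTML as a string
}
```

Calling fetch_prices("2014-12-15") would then send the same fields as the fd list above.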
The rvest library is really just a wrapper around httr. It looks like it doesn't do a good job of interpreting root-relative URLs, that is, URLs that give an absolute path but no server name. So if you look at
f1$url
# [1] /GA-FI/FedInvest/selectSecurityPriceDate.htm
you see that it just has the path and not the server name. This appears to be confusing httr. If you do
f1 <- set_values(f0[[2]], priceDate.year=2014, priceDate.month=12, priceDate.day=15)
f1$url <- url
test <- submit_form(s, f1)
that seems to work. Perhaps it's a bug that should be reported to rvest. (Tested on rvest_0.1.0.)
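Rather than hard-coding the full address, the path-only form action can be resolved against the page URL. The sketch below uses xml2::url_absolute for that, which assumes the xml2 package is installed; it is not what rvest 0.1.0 does internally, and the variable names are illustrative only.

```r
library(xml2)

# The page the form was read from
page_url <- "https://www.treasurydirect.gov/GA-FI/FedInvest/selectSecurityPriceDate.htm"

# The form action as reported in f1$url: a path with no scheme or host
form_action <- "/GA-FI/FedInvest/selectSecurityPriceDate.htm"

# Resolve the path against the page URL to get a complete address
full_url <- url_absolute(form_action, page_url)
full_url
```

The resolved value could then be assigned to f1$url in place of the hard-coded url.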
Answer 2:
I know this is an old question, but adding the style='POST' parameter to postForm does the trick as well.
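For reference, here is roughly what that looks like with the call from the question. By default postForm uses style = "HTTPPOST", which sends the fields as multipart/form-data; style = "POST" sends them as application/x-www-form-urlencoded, which is the same encoding httr uses with encode = "form". The wrapper name fetch_prices_rcurl is hypothetical, and the .opts argument from the question is left out here; add it back if your setup needs it.

```r
library(RCurl)

url <- "https://www.treasurydirect.gov/GA-FI/FedInvest/selectSecurityPriceDate.htm"

# Hypothetical wrapper: same form fields as in the question,
# but sent URL-encoded instead of as multipart/form-data.
fetch_prices_rcurl <- function(year, month, day) {
  postForm(url,
           submit = "Show Prices",
           priceDate.year = year,
           priceDate.month = month,
           priceDate.day = day,
           style = "POST")
}

# Example call (needs network access):
# td.html <- fetch_prices_rcurl(2014, 12, 15)
```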
Source: https://stackoverflow.com/questions/27631460/how-can-i-post-a-simple-html-form-in-r