Question
In Windows cURL I can post a web request similar to this:
curl --dump-header cook.txt ^
--data "RURL=http://www.example.com/r&user=bob&password=hello" ^
--user-agent "Mozilla/5.0" ^
http://www.example.com/login
With type cook.txt I get a response similar to this:
HTTP/1.1 302 Found
Date: Thu, ******
Server: Microsoft-IIS/6.0
SERVER: ******
X-Powered-By: ASP.NET
X-AspNet-Version: 1.1.4322
Location: ******
Set-Cookie: Cookie1=; domain=******; expires=****** ******
******
******
Cache-Control: private
Content-Type: text/html; charset=iso-8859-1
Content-Length: 189
I can manually read cookie lines like: Set-Cookie: AuthCode=ABC... (I could script this, of course). So I can use AuthCode for subsequent requests.
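As a rough illustration of scripting that step, here is a minimal base R sketch that pulls cookie names and values out of a header dump. The helper name parse_set_cookies and the sample header lines are made up for this example; a real run would read the lines from cook.txt instead.

```r
# Hypothetical helper: extract cookie name/value pairs from a header dump
# such as the cook.txt written by curl --dump-header.
parse_set_cookies <- function(header_lines) {
  # Keep only Set-Cookie lines (header names are case-insensitive).
  hits <- grep("^set-cookie:", header_lines, ignore.case = TRUE, value = TRUE)
  # Drop the header name, then keep only the "name=value" part before the first ';'.
  pairs <- sub("^[^:]*:[[:space:]]*", "", hits)
  pairs <- sub(";.*$", "", pairs)
  # Split on the first '=' only, since a cookie value may itself contain '='.
  nms  <- sub("=.*$", "", pairs)
  vals <- sub("^[^=]*=", "", pairs)
  setNames(trimws(vals), trimws(nms))
}

# Made-up sample headers; with a real dump use readLines("cook.txt") instead.
sample_headers <- c(
  "HTTP/1.1 302 Found",
  "Set-Cookie: Cookie1=; domain=example.com; path=/",
  "Set-Cookie: AuthCode=ABC123; path=/; HttpOnly",
  "Content-Length: 189"
)
cookies <- parse_set_cookies(sample_headers)
print(cookies[["AuthCode"]])
```

The extracted value could then be passed back on a later request, for example with curl's --cookie option.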
I am trying to do the same in R with RCurl and/or httr (I still don't know which one is better for my task).
When I try:
library(httr)
POST("http://www.example.com/login",
     body = list(RURL = "http://www.example.com/r",
                 user = "bob", password = "hello"),
     user_agent("Mozilla/5.0"))
I get a response similar to this:
Response [http://www.example.com/error]
Status: 411
Content-type: text/html
<h1>Length Required</h1>
I broadly understand the 411 error and could try to fix the request, but I do not get it with cURL, so I must be doing something wrong in the POST call.
Can you help me in translating my cURL command to RCurl and/or httr?
Answer 1:
httr automatically preserves cookies across calls to the same site, as illustrated by these two calls to http://httpbin.org:
GET("http://httpbin.org/cookies/set?a=1")
# Response [http://httpbin.org/cookies]
# Status: 200
# Content-type: application/json
# {
# "cookies": {
# "a": "1"
# }
# }
GET("http://httpbin.org/cookies")
# Response [http://httpbin.org/cookies]
# Status: 200
# Content-type: application/json
# {
# "cookies": {
# "a": "1"
# }
# }
Perhaps the problem is that cURL sends your data as application/x-www-form-urlencoded, whereas the default in httr is multipart/form-data, so use multipart = FALSE in your POST call.
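A minimal sketch of that suggestion for a more recent httr, where (to my knowledge) the multipart argument has been superseded by encode. The URL and field values are the placeholders from the question, login_fields and login are illustrative names, and the RURL value is written as an ordinary URL, which is an assumption about what was intended.

```r
# Sketch only: placeholder URL and credentials taken from the question.
login_fields <- list(
  RURL     = "http://www.example.com/r",  # assumed to be a plain redirect URL
  user     = "bob",
  password = "hello"
)

# encode = "form" asks httr to send application/x-www-form-urlencoded,
# which is what curl --data sends, instead of multipart/form-data.
login <- function(url = "http://www.example.com/login", fields = login_fields) {
  httr::POST(url,
             body = fields,
             encode = "form",
             httr::user_agent("Mozilla/5.0"))
}

# Not run here, because it needs network access and a real login endpoint:
# resp <- login()
# httr::status_code(resp)   # HTTP status of the final response
# httr::cookies(resp)       # cookies set by the server
```

Because httr reuses one handle per host, cookies set by the login response should be sent automatically on later GET calls to the same site within the session.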
Answer 2:
Based on Juba's suggestion, here is a working RCurl template.
The code emulates a browser behaviour, as it:
- retrieves cookies on a login screen and
- reuses them on the following page requests containing the actual data.
### RCurl login and browse private pages ###
library("RCurl")
loginurl <- "http://www.*****"
mainurl  <- "http://www.*****"
agent    <- "Mozilla/5.0"
# User account data and other login parameters
pars <- list(
  RURL     = "http://www.*****",
  Username = "*****",
  Password = "*****"
)
# RCurl options: one handle is shared by all requests so that cookies persist
curl <- getCurlHandle()
curlSetOpt(cookiejar = "cookiesk.txt", useragent = agent, followlocation = TRUE, curl = curl)
# or simply
# curlSetOpt(cookiejar = "", useragent = agent, followlocation = TRUE, curl = curl)
# Post the login form
web <- postForm(loginurl, .params = pars, curl = curl)
# Go to the main URL with the real data
web <- getURL(mainurl, curl = curl)
# Parse/print the content of web
# ..... etc. etc.
# Removing the handle and running gc() has the side effect of saving cookie data to the cookiejar file
rm(curl)
gc()
Answer 3:
Here is a way to create a POST request and keep and reuse the resulting cookies with RCurl, for example to get web pages when authentication is required:
library(RCurl)
curl <- getCurlHandle()
curlSetOpt(cookiejar="/tmp/cookies.txt", curl=curl)
postForm("http://example.com/login", login="mylogin", passwd="mypasswd", curl=curl)
getURL("http://example.com/anotherpage", curl=curl)
Source: https://stackoverflow.com/questions/15000815/post-request-using-cookies-with-curl-rcurl-and-httr