Question
I would like to get the response headers from a GET or POST request.
My example is:
library(httr)
library(RCurl)
url<-'http://www.omegahat.org/RCurl/philosophy.html'
doc<-GET(url)
names(doc)
[1] "url" "handle" "status_code" "headers" "cookies" "content" "times" "config"
but there are no response headers, only request headers.
The result should be something like this:
Connection:Keep-Alive
Date:Mon, 11 Feb 2013 20:21:56 GMT
ETag:"126a001-e33d-4c12cf2702440"
Keep-Alive:timeout=15, max=100
Server:Apache/2.2.14 (Ubuntu)
Vary:Accept-Encoding
Can I do this with R and the httr/RCurl packages, or is R not enough for this kind of problem?
Edit: I would like to get all response headers. I am mainly interested in the Location response header, which is not in this example.
Edit 2: I forgot to mention the system I work on: it is Windows 7.
My sessionInfo() output:
> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=Polish_Poland.1250 LC_CTYPE=Polish_Poland.1250 LC_MONETARY=Polish_Poland.1250
[4] LC_NUMERIC=C LC_TIME=Polish_Poland.1250
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] rjson_0.2.12 RCurl_1.95-3 bitops_1.0-5 httr_0.2 XML_3.95-0.1
loaded via a namespace (and not attached):
[1] digest_0.6.2 stringr_0.6.2 tools_2.15.2
Answer 1:
You can do it this way:
h <- basicHeaderGatherer()
doc <- getURI("http://www.omegahat.org/RCurl/index.html", headerfunction = h$update)
h$value()
This gives you a named character vector:
                            Date                           Server
 "Mon, 11 Feb 2013 20:41:58 GMT"         "Apache/2.2.14 (Ubuntu)"
                   Last-Modified                             ETag
 "Wed, 24 Oct 2012 15:49:35 GMT" "\"3262089-10bf-4ccd0088461c0\""
                   Accept-Ranges                   Content-Length
                         "bytes"                           "4287"
                            Vary                     Content-Type
               "Accept-Encoding"                      "text/html"
                          status                    statusMessage
                           "200"                             "OK"
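Since the question is mainly about the Location header, the same header gatherer can be pointed at a URL that redirects. A minimal sketch, assuming RCurl is installed and that the followlocation curl option is switched off so the 3xx response itself is captured; get_header and redirect_target are hypothetical helper names, not part of RCurl:

```r
# Case-insensitive lookup in a named character vector of headers,
# since servers may vary the capitalisation of header names.
get_header <- function(headers, name) {
  hit <- tolower(names(headers)) == tolower(name)
  if (any(hit)) unname(headers[hit][1]) else NA_character_
}

# Fetch a URL without following redirects and return its Location header
# (NA if the response carries none).
redirect_target <- function(url) {
  h <- RCurl::basicHeaderGatherer()
  RCurl::getURL(url, headerfunction = h$update, followlocation = FALSE)
  get_header(h$value(), "Location")
}
```

Calling redirect_target("http://google.com/") would be expected to return the redirect target when the server answers with a 3xx status, and NA otherwise.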
Answer 2:
curl -I http://www.google.com
HTTP/1.1 200 OK
Date: Mon, 11 Feb 2013 20:36:06 GMT
Expires: -1
Cache-Control: private, max-age=0
Content-Type: text/html; charset=ISO-8859-1
Set-Cookie: PREF=ID=ec3eb1b4b4f31100:FF=0:TM=1360614966:LM=1360614966:S=EjQCjjdv07A6PRtw; expires=Wed, 11-Feb-2015 20:36:06 GMT; path=/; domain=.google.com
Set-Cookie: NID=67=neiRZQ9fctd6NqzdKNdRMzfBqk-yAaxxxruYrnsvTcJeG7q8TJm5Ybv1UZ2ZV_ZheYhy-RwgAppHUh1VhIz4KOcFbcl8-0DvtPYXxaiSQmYvXGEKqeh4glhqvhOdxJKB; expires=Tue, 13-Aug-2013 20:36:06 GMT; path=/; domain=.google.com; HttpOnly
P3P: CP="This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info."
Server: gws
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
Transfer-Encoding: chunked
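The curl -I call above can be approximated from R. A rough sketch, assuming RCurl is installed: the header and nobody curl options ask for a header-only (HEAD) response, and parse_raw_headers and head_headers are hypothetical helpers written for illustration:

```r
# Split a raw header block (as printed by `curl -I`) into a named
# character vector; the first line is kept as `status_line`.
parse_raw_headers <- function(raw) {
  lines <- strsplit(raw, "\r?\n")[[1]]
  lines <- lines[nzchar(lines)]
  status_line <- lines[1]
  fields <- lines[-1]
  pos <- regexpr(":", fields, fixed = TRUE)   # first colon on each line
  keep <- pos > 0
  fields <- fields[keep]
  pos <- pos[keep]
  values <- trimws(substring(fields, pos + 1))
  names(values) <- substring(fields, 1, pos - 1)
  c(values, status_line = status_line)
}

# Send a header-only request, like `curl -I`, and parse the reply.
head_headers <- function(url) {
  raw <- RCurl::getURL(url, header = TRUE, nobody = TRUE)
  parse_raw_headers(raw)
}
```

Repeated headers such as Set-Cookie simply show up as repeated names in the result of head_headers("http://www.google.com").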
$ curl -v http://google.com/
* About to connect() to google.com port 80 (#0)
* Trying 66.102.7.104... connected
* Connected to google.com (66.102.7.104) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.16.4 (i386-apple-darwin9.0) libcurl/7.16.4 OpenSSL/0.9.7l zlib/1.2.3
> Host: google.com
> Accept: */*
>
< HTTP/1.1 301 Moved Permanently
< Location: http://www.google.com/
< Content-Type: text/html; charset=UTF-8
< Date: Thu, 15 Jul 2010 06:06:52 GMT
< Expires: Sat, 14 Aug 2010 06:06:52 GMT
< Cache-Control: public, max-age=2592000
< Server: gws
< Content-Length: 219
< X-XSS-Protection: 1; mode=block
<
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
</BODY></HTML>
* Connection #0 to host google.com left intact
* Closing connection #0
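The 301 and its Location header shown by curl -v can also be inspected from httr, which the question started with. A sketch under assumptions: recent httr versions follow redirects by default and keep the headers of every hop in the all_headers element of the response (httr 0.2 from the question may behave differently); redirect_locations and redirects_for are hypothetical helper names:

```r
# Pull the Location header (if any) out of each hop of a redirect chain.
# `all_headers` is expected to look like the `all_headers` element of an
# httr response: a list with one entry per hop, each holding `status`
# and a named list of `headers`.
redirect_locations <- function(all_headers) {
  vapply(all_headers, function(hop) {
    hit <- tolower(names(hop$headers)) == "location"
    if (any(hit)) as.character(hop$headers[hit][[1]]) else NA_character_
  }, character(1))
}

# Hypothetical wrapper: request a URL with httr and list the redirects taken.
redirects_for <- function(url) {
  resp <- httr::GET(url)
  redirect_locations(resp$all_headers)
}
```

For a single response, httr::headers(resp) returns the headers of the final hop as a named list, so the whole set requested in the question is available there as well.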
Source: https://stackoverflow.com/questions/14820286/get-response-header