RCurl

RCurl getURL with loop - link to a PDF kills looping

Posted by 风格不统一 on 2019-12-23 02:21:32
Question: I've been puzzling over this long enough now and can't seem to figure out how to get around it. It's easiest to give working dummy code:

    require(RCurl)
    require(XML)
    # set a bunch of options for curl
    options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem",
                                                     package = "RCurl")))
    agent = "Firefox/23.0"
    curl = getCurlHandle()
    curlSetOpt(cookiejar = 'cookies.txt',
               useragent = agent,
               followlocation = TRUE,
               autoreferer = TRUE,
               httpauth = 1L,  # "basic" http authorization version -- this seems …
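
The excerpt cuts off mid-setup, but the title describes a loop over getURL() that dies when one URL points to a PDF. A minimal sketch of one way around that, assuming a vector of URLs (the URLs below are hypothetical placeholders, not from the question): fetch PDFs as binary with getBinaryURL() and wrap each request in tryCatch() so one failure cannot stop the loop.

    library(RCurl)

    urls <- c("https://example.com/page.html", "https://example.com/report.pdf")
    pages <- vector("list", length(urls))
    for (i in seq_along(urls)) {
      pages[[i]] <- tryCatch({
        if (grepl("\\.pdf$", urls[i])) {
          getBinaryURL(urls[i])  # raw vector; save with writeBin() if needed
        } else {
          getURL(urls[i])        # character content
        }
      }, error = function(e) {
        message("Failed on ", urls[i], ": ", conditionMessage(e))
        NULL
      })
    }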

Extracting HTML table into R

Posted by 陌路散爱 on 2019-12-23 01:14:06
Question: I've been trying to extract a table from a webpage. The data is flight-track data from a live flight-tracking website (https://flightaware.com/live/flight/WJA1508/history/20150814/1720Z/CYYC/KSFO/tracklog). I've tried the XML, RCurl and curl packages, but it didn't work. Most likely this is because I couldn't figure out how to get around the SSL requirement, or how to handle the columns that contain notes on the flight status (i.e., the first two from the top and the third from the bottom of the table). Does anyone know how …
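
A minimal sketch of one approach, not necessarily the thread's answer: let httr handle the HTTPS connection, then hand the page to XML::readHTMLTable. The note rows the question mentions would still need to be dropped by position after inspecting the result.

    library(httr)
    library(XML)

    url <- "https://flightaware.com/live/flight/WJA1508/history/20150814/1720Z/CYYC/KSFO/tracklog"
    resp <- GET(url, user_agent("Mozilla/5.0"))
    doc <- htmlParse(content(resp, as = "text"), asText = TRUE)
    tables <- readHTMLTable(doc, stringsAsFactors = FALSE)
    track <- tables[[1]]  # inspect, then drop the note rows by position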

Retrieve modified DateTime of a file from an FTP Server

Posted by [亡魂溺海] on 2019-12-22 13:30:39
Question: Is there a way to find the modified date/time for files on an FTP server in R? I have found a great way to list all of the files that are available, but I only want to download the ones that have been updated since my last check. I tried using:

    info <- file.info(url)

However, it returns a pretty ugly list of nothing. My URL is made up of: "ftp://username:password@FTPServer//filepath.xml"

Answer 1: Until we see the output from this particular FTP server (they are all different) for directory listings, …
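
As the answer notes, listing formats differ by server, so any parser is server-specific. A minimal sketch assuming a Unix-style "ls -l" listing (host, credentials and field positions are placeholders/assumptions): request the directory's full listing and split out the date and name columns.

    library(RCurl)

    # a directory URL (trailing slash) returns the long listing by default
    listing <- getURL("ftp://username:password@FTPServer/somedir/")
    lines <- strsplit(listing, "\r?\n")[[1]]
    lines <- lines[nzchar(lines)]
    # assumed Unix-style fields: perms links owner group size month day time/year name
    fields <- strsplit(lines, " +")
    fname <- vapply(fields, function(f) f[9], character(1))
    mdate <- vapply(fields, function(f) paste(f[6:8], collapse = " "), character(1))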

RCurl: Display progress meter in Rgui

Posted by 吃可爱长大的小学妹 on 2019-12-22 09:52:00
Question: Using R.exe or Rterm.exe, this gives an excellent progress meter:

    page = getURL(url = "ftp.wcc.nrcs.usda.gov", noprogress = FALSE)

In Rgui I am limited to:

    page = getURL(url = "ftp.wcc.nrcs.usda.gov", noprogress = FALSE,
                  progressfunction = function(down, up) print(down))

which gives a very limited set of download information. Is there a way to improve this?

Answer 1: I am starting to doubt that it is possible with standard R commands to reprint over (overwrite) the current line, which is what RCurl does in non-GUI mode. I …
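
A minimal sketch of the direction the answer is probing: emit a carriage return and flush.console() so a single status line gets redrawn in place. Whether Rgui's console actually honors "\r" is exactly what is in doubt, so treat this as an assumption; reading down as c(total, received) bytes is likewise an assumption about RCurl's callback.

    library(RCurl)

    page <- getURL(url = "ftp.wcc.nrcs.usda.gov", noprogress = FALSE,
                   progressfunction = function(down, up) {
                     cat(sprintf("\rReceived %.0f of %.0f bytes", down[2], down[1]))
                     flush.console()  # force Rgui to repaint the console
                   })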

Get response header

Posted by 萝らか妹 on 2019-12-22 06:31:30
Question: I would like to get the response headers from a GET or POST. My example is:

    library(httr)
    library(RCurl)
    url <- 'http://www.omegahat.org/RCurl/philosophy.html'
    doc <- GET(url)
    names(doc)
    [1] "url" "handle" "status_code" "headers" "cookies" "content" "times" "config"

but there are no response headers, only request headers. The result should be something like this:

    Connection:Keep-Alive
    Date:Mon, 11 Feb 2013 20:21:56 GMT
    ETag:"126a001-e33d-4c12cf2702440"
    Keep-Alive:timeout=15, max=100
    Server:Apache/2.2.14 …
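
A minimal sketch of where those headers live, possibly what the answer points out: in httr the headers element of a GET result is in fact the response headers, and plain RCurl can collect them with basicHeaderGatherer().

    library(httr)
    library(RCurl)

    url <- "http://www.omegahat.org/RCurl/philosophy.html"

    # httr: response headers of the completed request
    doc <- GET(url)
    headers(doc)

    # RCurl: gather the response headers explicitly
    h <- basicHeaderGatherer()
    invisible(getURI(url, headerfunction = h$update))
    h$value()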

Login to .NET site using R

Posted by 吃可爱长大的小学妹 on 2019-12-22 05:00:16
Question: I am trying to log in to a .NET site with my credentials but am unable to get it working. My code is inspired by the thread "How to login and then download a file from aspx web pages with R":

    library(RCurl)
    curl = getCurlHandle()
    curlSetOpt(cookiejar = 'cookies.txt', followlocation = TRUE,
               autoreferer = TRUE, curl = curl)
    html <- getURL('http://www.aceanalyser.com/Login.aspx', curl = curl)
    viewstate <- as.character(sub('.*id="__VIEWSTATE" value="([0-9a-zA-Z+/=]*).*', '\\1', html)) …
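
A minimal sketch of how this pattern usually continues, not a confirmed fix for this site: ASP.NET forms typically also require __EVENTVALIDATION, and both hidden tokens are posted back together with the credentials. The form-field names below (txtUserName, txtPassword, btnLogin) are hypothetical and must be read off the actual login form.

    library(RCurl)

    curl <- getCurlHandle()
    curlSetOpt(cookiejar = "cookies.txt", followlocation = TRUE,
               autoreferer = TRUE, curl = curl)
    html <- getURL("http://www.aceanalyser.com/Login.aspx", curl = curl)

    viewstate <- sub('.*id="__VIEWSTATE" value="([0-9a-zA-Z+/=]*).*', "\\1", html)
    eventval  <- sub('.*id="__EVENTVALIDATION" value="([0-9a-zA-Z+/=]*).*', "\\1", html)

    resp <- postForm("http://www.aceanalyser.com/Login.aspx",
                     `__VIEWSTATE` = viewstate,
                     `__EVENTVALIDATION` = eventval,
                     txtUserName = "user",   # hypothetical field names
                     txtPassword = "pass",
                     btnLogin = "Login",
                     curl = curl, style = "POST")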

Upload a file over 2.15 GB in R

Posted by 喜欢而已 on 2019-12-22 01:12:59
Question: I've got a manual process where I'm uploading a 5-6 GB file to a web server via curl:

    curl -X POST --data-binary @myfile.csv http://myserver::port/path/to/api

This process works fine, but I'd love to automate it using R. The problem is, either I don't know what I'm doing, or the R libraries for curl don't know how to handle files bigger than ~2 GB:

    library(RCurl)
    postForm(
      "http://myserver::port/path/to/api",
      file = fileUpload(
        filename = path.expand("myfile.csv"),
        contentType = "text/csv"
      ), …
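
A minimal sketch of one alternative, not the thread's confirmed answer: httr's upload_file() streams the file from disk instead of loading it into an R vector, which avoids the ~2 GB vector ceiling the question runs into. The URL keeps the question's placeholder host and port.

    library(httr)

    resp <- POST("http://myserver:port/path/to/api",
                 body = upload_file(path.expand("myfile.csv"), type = "text/csv"))
    status_code(resp)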

Scraping experimentally measured physicochemical properties and synonyms from Chemspider in R

Posted by 淺唱寂寞╮ on 2019-12-21 23:01:18
Question: Although the ChemSpider SSOAP Web API allows one to retrieve the chemical structure of given compounds, it does not allow one to retrieve experimentally measured physicochemical properties like boiling points, nor the listed synonyms. E.g. if you look at http://www.chemspider.com/Chemical-Structure.733.html it gives a list of Synonyms and Experimental data under Properties (you may have to register first to see this info), which I would like to retrieve in R. I got some way by doing:

    library(httr) …
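
A minimal sketch of the scraping direction the question starts on: fetch the compound page with httr and query it with XPath. The XPath expression and the assumption that the page is reachable without a login cookie are both unverified; ChemSpider's real markup may differ.

    library(httr)
    library(XML)

    url <- "http://www.chemspider.com/Chemical-Structure.733.html"
    resp <- GET(url, user_agent("Mozilla/5.0"))
    doc <- htmlParse(content(resp, as = "text"), asText = TRUE)

    # hypothetical selector -- adjust after inspecting the real page
    synonyms <- xpathSApply(doc, "//div[contains(@class, 'syn')]//text()", xmlValue)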

RCurl getForm pass http headers

Posted by 点点圈 on 2019-12-21 20:46:08
Question: Using RCurl's getForm function, which is the only nice way of passing GET parameters, I need to alter some HTTP headers. With getURI you just pass httpheader = c(Whatever = 'whatever', ...) and it works. Unfortunately, that argument seems to be ignored by getForm. How do I set the HTTP headers in a getForm request?

Answer 1: Welcome to the confusing world of RCurl! You've discovered that its syntax makes no sense, which is not your fault. In getForm you pass headers as the second argument (the …
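
Following the answer's direction, a minimal sketch: getForm treats bare named arguments as form fields, so curl options such as httpheader have to travel inside .opts. The URL and parameters are placeholders.

    library(RCurl)

    res <- getForm("https://example.com/search",
                   q = "rcurl",  # ordinary GET parameter
                   .opts = list(httpheader = c(Accept = "application/json",
                                               `X-Custom` = "value")))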

Get website directory listing in an R vector using RCurl

Posted by 自古美人都是妖i on 2019-12-21 17:22:10
Question: I'm trying to get the list of files in a directory on a website. Is there a way to do this similar to the dir() or list.files() commands for local directory listings? I can connect to the website using RCurl (I need it because I need an SSL connection over HTTPS):

    library(RCurl)
    text = getURL(*some https website*,
                  ssl.verifypeer = FALSE,
                  dirlistonly = TRUE)

But this returns an HTML page with images, hyperlinks, etc. wrapped around the list of files, and I just need an R vector of file names as you would obtain …
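
A minimal sketch for a standard auto-generated index page (an assumption; the real page's markup may differ): fetch the HTML with RCurl, extract the anchors with XML::getHTMLLinks, and filter out non-file entries. The URL is a placeholder.

    library(RCurl)
    library(XML)

    html <- getURL("https://example.com/some/dir/", ssl.verifypeer = FALSE)
    links <- getHTMLLinks(htmlParse(html, asText = TRUE))
    files <- links[!grepl("/$|^\\?", links)]  # drop subdirectories and sort links
    files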