rcurl

Scrape password-protected website in R

白昼怎懂夜的黑 submitted on 2019-12-27 11:41:31
Question: I'm trying to scrape data from a password-protected website in R. Reading around, it seems that the httr and RCurl packages are the best options for scraping with password authentication (I've also looked into the XML package). The website I'm trying to scrape is below (you need a free account in order to access the full page): http://subscribers.footballguys.com/myfbg/myviewprojections.php?projector=2 Here are my two attempts (replacing "username" with my username and "password" with my
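
A minimal sketch of the usual approach with httr, not the asker's exact attempts (which are truncated above): POST the credentials to the site's login form, then request the protected page; httr reuses the same handle per host, so the session cookie carries over. The login URL and the form field names are assumptions -- inspect the site's login form to confirm them.

# Sketch only: login URL and field names ("amember_login"/"amember_pass") are assumptions.
library(httr)
library(XML)

login_url <- "http://subscribers.footballguys.com/amember/login.php"   # assumed
data_url  <- "http://subscribers.footballguys.com/myfbg/myviewprojections.php?projector=2"

# POST the credentials; httr keeps the resulting cookies for later requests to the same host
resp <- POST(login_url,
             body = list(amember_login = "username",   # assumed field name
                         amember_pass  = "password"),  # assumed field name
             encode = "form")
stop_for_status(resp)

# Fetch the protected page and parse any HTML tables it contains
page   <- GET(data_url)
tables <- readHTMLTable(content(page, as = "text"), stringsAsFactors = FALSE)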

Using R to upload many files

旧巷老猫 submitted on 2019-12-25 08:59:22
Question: I have 30 files: f1.csv, f2.csv, ... f30.csv. I would like to upload all of the files from R, roughly like this: ftpUpload(c(f1.csv, f2.csv, ... f30.csv), http://..., ...). How can I upload many files with the ftpUpload command? Answer 1: As @Soheil mentions, why not just save the files first, then upload? Is there any reason you can't just do a for loop? Something like: files = c("f1.csv", "f2.csv", "f30.csv"); for (file in files) { ftpUpload(file, paste("ftp://...", file, sep = "")) } Source: https://stackoverflow.com
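
An expanded sketch of the loop from the answer above. The FTP host, the userpwd credentials and the remote directory are placeholders, and the sketch assumes the 30 CSV files sit in the working directory.

# Sketch: substitute your own FTP host, path and credentials.
library(RCurl)

files <- sprintf("f%d.csv", 1:30)

for (f in files) {
  ftpUpload(what = f,
            to   = paste0("ftp://ftp.example.com/upload/", f),  # placeholder host/path
            userpwd = "user:password")                          # placeholder credentials
}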

Handling paginated SQL query results

让人想犯罪 __ submitted on 2019-12-25 06:23:08
Question: For my dissertation data collection, one of the sources is an externally managed system, which is based on a web form for submitting SQL queries. Using R and RCurl, I have implemented an automated data collection framework in which I simulate that form. Everything worked well while I limited the size of the resulting dataset, but when I tried to go beyond 100000 records (RQ_SIZE in the code below), the tandem "my code - their system" started becoming unresponsive ("hanging").
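
The asker's own code is not shown in this excerpt, so the following is only a sketch of the general idea behind paginating such a query: request the result set in chunks instead of one huge request, and stop when a chunk comes back short. The form URL, the form field name, and the assumption that the backend returns CSV text and accepts LIMIT/OFFSET are all hypothetical.

# Hypothetical pagination sketch; the external system's form URL, field name
# ("query"), output format (CSV) and LIMIT/OFFSET support are assumptions.
library(RCurl)

form_url <- "http://example.org/sql-form"   # placeholder for the external system
RQ_SIZE  <- 10000                           # chunk size instead of one huge request

offset <- 0
chunks <- list()
repeat {
  sql  <- sprintf("SELECT * FROM some_table LIMIT %d OFFSET %d", RQ_SIZE, offset)
  res  <- postForm(form_url, query = sql)   # simulate the web form submission
  rows <- read.csv(text = res)              # assumes the system returns CSV text
  chunks[[length(chunks) + 1]] <- rows
  if (nrow(rows) < RQ_SIZE) break           # last (short) page reached
  offset <- offset + RQ_SIZE
}
results <- do.call(rbind, chunks)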

R error - subscript out of bounds

三世轮回 submitted on 2019-12-24 11:45:07
Question: I am trying to run code that takes a list of addresses and runs each one through Google's Geocoding API (using the function Addr2latlng below) to get the latitude/longitude, and puts each result into a data frame using ProcessAddrList below. The problem is that Addr2latlng works fine for one address and ProcessAddrList works fine for up to 10 addresses, but from 11 addresses or more I get the error below. For 10 addresses this works fine. Running the code below requires the RCurl and RJSONIO packages
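
The bodies of Addr2latlng and ProcessAddrList are not shown in this excerpt, so the function below is only a hypothetical re-implementation that illustrates the usual defensive pattern with RCurl and RJSONIO: check the API status and the length of results before indexing, and pause between calls. A rate-limited or empty reply has an empty results list, and indexing it blindly is a common cause of "subscript out of bounds".

# Hypothetical version of Addr2latlng(); the original code is not shown above.
library(RCurl)
library(RJSONIO)

Addr2latlng <- function(address, api_key = NULL) {
  u <- paste0("https://maps.googleapis.com/maps/api/geocode/json?address=",
              URLencode(address, reserved = TRUE),
              if (!is.null(api_key)) paste0("&key=", api_key) else "")
  json <- fromJSON(getURL(u))
  # OVER_QUERY_LIMIT / ZERO_RESULTS replies have an empty results list;
  # guard before indexing instead of assuming results[[1]] exists.
  if (json$status != "OK" || length(json$results) == 0)
    return(c(lat = NA_real_, lng = NA_real_))
  loc <- json$results[[1]]$geometry$location
  loc[c("lat", "lng")]
}

addrs  <- c("1600 Pennsylvania Ave, Washington DC", "10 Downing St, London")
coords <- t(sapply(addrs, function(a) { Sys.sleep(0.2); Addr2latlng(a) }))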

Multiple Google Places API calls within Sapply function

时间秒杀一切 提交于 2019-12-24 08:12:48
问题 I have a list of locations that I'm feeding into the Google Places API. Some locations have more than 20 results. I'm providing an example of one such location below. To get results beyond the first 20, you have to make an additional API call to Google Places, with an extra "token" parameter that is obtained from the first Google Places API call. Using the below flawed function, I'm attempting to execute the additional API call, based on whether there are additional results that need to be
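
A sketch of fetching every result page for a single location. The "pagetoken"/"next_page_token" names follow the Places Nearby Search API as documented; the key, coordinates and radius are placeholders. Google activates a next_page_token only after a short delay, hence the pause before the follow-up request.

# Sketch: keep requesting pages while the response carries a next_page_token.
library(RCurl)
library(RJSONIO)

get_all_places <- function(lat, lng, radius, key) {
  base <- paste0("https://maps.googleapis.com/maps/api/place/nearbysearch/json",
                 "?location=", lat, ",", lng, "&radius=", radius, "&key=", key)
  out   <- list()
  token <- NULL
  repeat {
    u     <- if (is.null(token)) base else paste0(base, "&pagetoken=", token)
    json  <- fromJSON(getURL(u))
    out   <- c(out, json$results)
    token <- json$next_page_token          # NULL when there are no more pages
    if (is.null(token)) break
    Sys.sleep(2)                           # the token becomes valid only after ~2 s
  }
  out
}

# places <- sapply(my_locations, function(loc) get_all_places(loc$lat, loc$lng, 500, key))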

Google search links obtain by webscraping in R are not in required format

醉酒当歌 submitted on 2019-12-24 07:26:44
Question: I am new to web scraping in R and am trying to run a Google search from R using a search term and extract the result links automatically. I am partially successful in obtaining the links of the Google search results using the RCurl and XML packages. However, the href links I extract include unwanted information and are not in the format of a URL. The code I use is: html <- getURL(u) links <- xpathApply(doc, "//h3//a[@href]", xmlGetAttr, 'href') links <- grep("http://", links, fixed = TRUE, value=TRUE) The
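
A sketch of cleaning the hrefs that a Google results page returns. Google usually wraps each target as "/url?q=<target>&sa=...", so the wrapper prefix and tracking suffix have to be stripped. Google's markup changes over time, so treat the XPath and the patterns below as assumptions rather than something guaranteed to keep working.

# Sketch: parse the results page, then unwrap the "/url?q=...&sa=..." hrefs.
library(RCurl)
library(XML)

u     <- "http://www.google.com/search?q=web+scraping+in+R"
html  <- getURL(u)
doc   <- htmlParse(html)                        # the question's excerpt skips this step
links <- unlist(xpathApply(doc, "//h3//a[@href]", xmlGetAttr, "href"))

links <- links[grepl("^/url\\?q=", links)]      # keep only wrapped result links
links <- sub("^/url\\?q=", "", links)           # drop the wrapper prefix
links <- sub("&sa=.*$", "", links)              # drop Google's tracking suffix
links <- sapply(links, URLdecode, USE.NAMES = FALSE)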

Multiple web table mining with R, RCurl

江枫思渺然 submitted on 2019-12-24 02:47:10
Question: First of all, thanks in advance for any responses. I need to obtain a table by joining some smaller tables from their respective web pages. So far I've been able to extract the information, but I have failed to do it automatically using a loop. My commands so far are: library(RCurl) library(XML) # index <- toupper(letters) # EDIT: index <- LETTERS index[1] <- "0-A" url <- paste("www.citefactor.org/journal-impact-factor-list-2014_", index, ".html", sep="", collapse=";") urls <- strsplit(url, ";") [
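
A sketch of the loop the question is after. paste0() already vectorises over index, so the collapse/strsplit detour is unnecessary: build a URL per letter, read the table from each page with readHTMLTable, and bind the pieces. Which table on each citefactor.org page holds the data is an assumption (the first one here), as is the requirement that every page yields the same columns.

# Sketch: loop over the per-letter pages and stack the extracted tables.
library(RCurl)
library(XML)

index    <- LETTERS
index[1] <- "0-A"
urls <- paste0("http://www.citefactor.org/journal-impact-factor-list-2014_",
               index, ".html")

tables <- lapply(urls, function(u) {
  page <- getURL(u)
  readHTMLTable(page, which = 1, stringsAsFactors = FALSE)   # assumed table position
})
all_journals <- do.call(rbind, tables)   # assumes identical columns on every page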

Line by line reading from HTTPS connection in R

社会主义新天地 submitted on 2019-12-23 12:50:57
Question: When a connection is created with open="r", it allows line-by-line reading, which is useful for batch processing of large data streams. For example, this script parses a sizable gzipped JSON HTTP stream by reading 100 lines at a time. Unfortunately, however, R does not support SSL: > readLines(url("https://api.github.com/repos/jeroenooms/opencpu")) Error in readLines(url("https://api.github.com/repos/jeroenooms/opencpu")) : cannot open the connection: unsupported URL scheme The RCurl and httr
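
One way to get a readable SSL connection is the curl package's curl() connections, which support open = "r", so the batched readLines() pattern from the question works over https. This is a sketch of that approach rather than whatever the RCurl/httr answer in the thread recommended; note that current R versions can also open https URLs with plain url(), which was not the case when the question was asked.

# Sketch: read an https stream 100 lines at a time via a curl connection.
library(curl)

con <- curl("https://api.github.com/repos/jeroenooms/opencpu")
open(con, "r")
while (length(batch <- readLines(con, n = 100)) > 0) {
  cat("read", length(batch), "lines\n")   # replace with the real per-batch processing
}
close(con)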