rcurl

Scrape password-protected website in R

白昼怎懂夜的黑 submitted on 2019-12-27 11:41:31
Question: I'm trying to scrape data from a password-protected website in R. Reading around, it seems that the httr and RCurl packages are the best options for scraping with password authentication (I've also looked into the XML package). The website I'm trying to scrape is below (you need a free account in order to access the full page): http://subscribers.footballguys.com/myfbg/myviewprojections.php?projector=2 Here are my two attempts (replacing "username" with my username and "password" with my
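
A minimal sketch of the usual approach with httr, not the asker's exact attempts (which are truncated above): POST the credentials to the site's login form, then request the protected page; httr reuses the same handle per host, so the session cookie carries over. The login URL and the form field names are assumptions -- inspect the site's login form to confirm them.

# Sketch only: login URL and field names ("amember_login"/"amember_pass") are assumptions.
library(httr)
library(XML)

login_url <- "http://subscribers.footballguys.com/amember/login.php"   # assumed
data_url  <- "http://subscribers.footballguys.com/myfbg/myviewprojections.php?projector=2"

# POST the credentials; httr keeps the resulting cookies for later requests to the same host
resp <- POST(login_url,
             body = list(amember_login = "username",   # assumed field name
                         amember_pass  = "password"),  # assumed field name
             encode = "form")
stop_for_status(resp)

# Fetch the protected page and parse any HTML tables it contains
page   <- GET(data_url)
tables <- readHTMLTable(content(page, as = "text"), stringsAsFactors = FALSE)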

Using R to upload many files

旧巷老猫 submitted on 2019-12-25 08:59:22
Question: I have 30 files: f1.csv, f2.csv, ... f30.csv. I would like to upload all of the files from R, roughly like this: ftpUpload(c(f1.csv, f2.csv, ... f30.csv), http://..., ...). How can I upload many files with the ftpUpload command? Answer 1: As @Soheil mentions, why not just save the files first, then upload? Is there any reason you can't just do a for loop? Something like: files = c("f1.csv", "f2.csv", "f30.csv"); for (file in files) { ftpUpload(file, paste("ftp://...", file, sep = "")) } Source: https://stackoverflow.com
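
An expanded sketch of the loop from the answer above. The FTP host, the userpwd credentials and the remote directory are placeholders, and the sketch assumes the 30 CSV files sit in the working directory.

# Sketch: substitute your own FTP host, path and credentials.
library(RCurl)

files <- sprintf("f%d.csv", 1:30)

for (f in files) {
  ftpUpload(what = f,
            to   = paste0("ftp://ftp.example.com/upload/", f),  # placeholder host/path
            userpwd = "user:password")                          # placeholder credentials
}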

Handling paginated SQL query results

让人想犯罪 __ submitted on 2019-12-25 06:23:08
Question: For my dissertation data collection, one of the sources is an externally managed system, which is based on a web form for submitting SQL queries. Using R and RCurl, I have implemented an automated data collection framework in which I simulate that form. Everything worked well while I limited the size of the resulting dataset, but when I tried to go beyond 100000 records (RQ_SIZE in the code below), the tandem "my code - their system" started becoming unresponsive ("hanging").
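
The asker's own code is not shown in this excerpt, so the following is only a sketch of the general idea behind paginating such a query: request the result set in chunks instead of one huge request, and stop when a chunk comes back short. The form URL, the form field name, and the assumption that the backend returns CSV text and accepts LIMIT/OFFSET are all hypothetical.

# Hypothetical pagination sketch; the external system's form URL, field name
# ("query"), output format (CSV) and LIMIT/OFFSET support are assumptions.
library(RCurl)

form_url <- "http://example.org/sql-form"   # placeholder for the external system
RQ_SIZE  <- 10000                           # chunk size instead of one huge request

offset <- 0
chunks <- list()
repeat {
  sql  <- sprintf("SELECT * FROM some_table LIMIT %d OFFSET %d", RQ_SIZE, offset)
  res  <- postForm(form_url, query = sql)   # simulate the web form submission
  rows <- read.csv(text = res)              # assumes the system returns CSV text
  chunks[[length(chunks) + 1]] <- rows
  if (nrow(rows) < RQ_SIZE) break           # last (short) page reached
  offset <- offset + RQ_SIZE
}
results <- do.call(rbind, chunks)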

R error - subscript out of bounds

三世轮回 submitted on 2019-12-24 11:45:07
Question: I am trying to run code that takes a list of addresses and runs each one through Google's Geocoding API (using the function Addr2latlng below) to get the latitude/longitude, and puts each result into a data frame using ProcessAddrList below. The problem is that Addr2latlng works fine for one address and ProcessAddrList works fine for up to 10 addresses, but from 11 addresses or more I get the error below. For 10 addresses this works fine. Running the code below requires the RCurl and RJSONIO packages
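
The bodies of Addr2latlng and ProcessAddrList are not shown in this excerpt, so the function below is only a hypothetical re-implementation that illustrates the usual defensive pattern with RCurl and RJSONIO: check the API status and the length of results before indexing, and pause between calls. A rate-limited or empty reply has an empty results list, and indexing it blindly is a common cause of "subscript out of bounds".

# Hypothetical version of Addr2latlng(); the original code is not shown above.
library(RCurl)
library(RJSONIO)

Addr2latlng <- function(address, api_key = NULL) {
  u <- paste0("https://maps.googleapis.com/maps/api/geocode/json?address=",
              URLencode(address, reserved = TRUE),
              if (!is.null(api_key)) paste0("&key=", api_key) else "")
  json <- fromJSON(getURL(u))
  # OVER_QUERY_LIMIT / ZERO_RESULTS replies have an empty results list;
  # guard before indexing instead of assuming results[[1]] exists.
  if (json$status != "OK" || length(json$results) == 0)
    return(c(lat = NA_real_, lng = NA_real_))
  loc <- json$results[[1]]$geometry$location
  loc[c("lat", "lng")]
}

addrs  <- c("1600 Pennsylvania Ave, Washington DC", "10 Downing St, London")
coords <- t(sapply(addrs, function(a) { Sys.sleep(0.2); Addr2latlng(a) }))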

Multiple Google Places API calls within Sapply function

时间秒杀一切 提交于 2019-12-24 08:12:48
问题 I have a list of locations that I'm feeding into the Google Places API. Some locations have more than 20 results. I'm providing an example of one such location below. To get results beyond the first 20, you have to make an additional API call to Google Places, with an extra "token" parameter that is obtained from the first Google Places API call. Using the below flawed function, I'm attempting to execute the additional API call, based on whether there are additional results that need to be
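
A sketch of fetching every result page for a single location. The "pagetoken"/"next_page_token" names follow the Places Nearby Search API as documented; the key, coordinates and radius are placeholders. Google activates a next_page_token only after a short delay, hence the pause before the follow-up request.

# Sketch: keep requesting pages while the response carries a next_page_token.
library(RCurl)
library(RJSONIO)

get_all_places <- function(lat, lng, radius, key) {
  base <- paste0("https://maps.googleapis.com/maps/api/place/nearbysearch/json",
                 "?location=", lat, ",", lng, "&radius=", radius, "&key=", key)
  out   <- list()
  token <- NULL
  repeat {
    u     <- if (is.null(token)) base else paste0(base, "&pagetoken=", token)
    json  <- fromJSON(getURL(u))
    out   <- c(out, json$results)
    token <- json$next_page_token          # NULL when there are no more pages
    if (is.null(token)) break
    Sys.sleep(2)                           # the token becomes valid only after ~2 s
  }
  out
}

# places <- sapply(my_locations, function(loc) get_all_places(loc$lat, loc$lng, 500, key))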

Google search links obtain by webscraping in R are not in required format

醉酒当歌 submitted on 2019-12-24 07:26:44
Question: I am new to web scraping in R and am trying to run a Google search from R using a search term and extract the result links automatically. I am partially successful in obtaining the links of the Google search results using the RCurl and XML packages. However, the href links I extract include unwanted information and are not in the format of a URL. The code I use is: html <- getURL(u) links <- xpathApply(doc, "//h3//a[@href]", xmlGetAttr, 'href') links <- grep("http://", links, fixed = TRUE, value=TRUE) The
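
A sketch of cleaning the hrefs that a Google results page returns. Google usually wraps each target as "/url?q=<target>&sa=...", so the wrapper prefix and tracking suffix have to be stripped. Google's markup changes over time, so treat the XPath and the patterns below as assumptions rather than something guaranteed to keep working.

# Sketch: parse the results page, then unwrap the "/url?q=...&sa=..." hrefs.
library(RCurl)
library(XML)

u     <- "http://www.google.com/search?q=web+scraping+in+R"
html  <- getURL(u)
doc   <- htmlParse(html)                        # the question's excerpt skips this step
links <- unlist(xpathApply(doc, "//h3//a[@href]", xmlGetAttr, "href"))

links <- links[grepl("^/url\\?q=", links)]      # keep only wrapped result links
links <- sub("^/url\\?q=", "", links)           # drop the wrapper prefix
links <- sub("&sa=.*$", "", links)              # drop Google's tracking suffix
links <- sapply(links, URLdecode, USE.NAMES = FALSE)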

Multiple web table mining with R, RCurl

江枫思渺然 submitted on 2019-12-24 02:47:10
Question: First of all, thanks in advance for any responses. I need to obtain a table by joining some smaller tables from their respective web pages. So far I've been able to extract the information, but I have failed to do it automatically using a loop. My commands so far are: library(RCurl) library(XML) # index <- toupper(letters) # EDIT: index <- LETTERS index[1] <- "0-A" url <- paste("www.citefactor.org/journal-impact-factor-list-2014_", index, ".html", sep="", collapse=";") urls <- strsplit(url, ";") [
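
A sketch of the loop the question is after. paste0() already vectorises over index, so the collapse/strsplit detour is unnecessary: build a URL per letter, read the table from each page with readHTMLTable, and bind the pieces. Which table on each citefactor.org page holds the data is an assumption (the first one here), as is the requirement that every page yields the same columns.

# Sketch: loop over the per-letter pages and stack the extracted tables.
library(RCurl)
library(XML)

index    <- LETTERS
index[1] <- "0-A"
urls <- paste0("http://www.citefactor.org/journal-impact-factor-list-2014_",
               index, ".html")

tables <- lapply(urls, function(u) {
  page <- getURL(u)
  readHTMLTable(page, which = 1, stringsAsFactors = FALSE)   # assumed table position
})
all_journals <- do.call(rbind, tables)   # assumes identical columns on every page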

Line by line reading from HTTPS connection in R

社会主义新天地 submitted on 2019-12-23 12:50:57
Question: When a connection is created with open="r", it allows line-by-line reading, which is useful for batch processing of large data streams. For example, this script parses a sizable gzipped JSON HTTP stream by reading 100 lines at a time. Unfortunately, however, R does not support SSL: > readLines(url("https://api.github.com/repos/jeroenooms/opencpu")) Error in readLines(url("https://api.github.com/repos/jeroenooms/opencpu")) : cannot open the connection: unsupported URL scheme The RCurl and httr
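
One way to get a readable SSL connection is the curl package's curl() connections, which support open = "r", so the batched readLines() pattern from the question works over https. This is a sketch of that approach rather than whatever the RCurl/httr answer in the thread recommended; note that current R versions can also open https URLs with plain url(), which was not the case when the question was asked.

# Sketch: read an https stream 100 lines at a time via a curl connection.
library(curl)

con <- curl("https://api.github.com/repos/jeroenooms/opencpu")
open(con, "r")
while (length(batch <- readLines(con, n = 100)) > 0) {
  cat("read", length(batch), "lines\n")   # replace with the real per-batch processing
}
close(con)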