rcurl

Debugging RCurl-based authentication & form submission

那年仲夏 submitted on 2019-12-04 21:58:52
SourceForge Research Data Archive (SRDA) is one of the data sources for my dissertation research. I'm having difficulty debugging the following issue related to SRDA data collection. Collecting data from SRDA requires authenticating and then submitting a Web form with an SQL query. Once the query has been processed successfully, the system generates a text file with the query results. While testing my R code for SRDA data collection, I changed the SQL request to make sure that the results file was being regenerated. However, I discovered that the file contents stay the same (corresponds to…
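In a login-then-submit flow, one thing worth ruling out is that the two requests do not share a session: if the query form is posted without the login cookie, the server may not treat it as coming from the logged-in user. Below is a minimal sketch, assuming the site uses cookie-based sessions and a plain (URL-encoded) POST form. The function name, URLs and form-field names (`username`, `password`, `query`) are hypothetical placeholders, not SRDA's real ones.

```r
# Hypothetical sketch: one RCurl handle with the cookie engine enabled
# (cookiefile = "") is reused for both requests, so the session cookie set
# at login is sent back with the query. Field names are placeholders.
srda_query <- function(login_url, query_url, user, password, sql) {
  curl <- RCurl::getCurlHandle(cookiefile = "", followlocation = TRUE)
  # Step 1: log in; any Set-Cookie from the server is stored in the handle.
  RCurl::postForm(login_url, username = user, password = password,
                  style = "POST", curl = curl)
  # Step 2: submit the SQL query with the SAME handle and return the body.
  RCurl::postForm(query_url, query = sql, style = "POST", curl = curl)
}
```

If two calls with different SQL still return identical contents, comparing the raw responses from step 2 (rather than the downloaded file) helps show whether the query is actually reaching the server.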

R: posting search forms and scraping results

会有一股神秘感。 submitted on 2019-12-04 20:38:11
I'm a beginner in web scraping and I'm not yet familiar with the nomenclature for the problems I'm trying to solve. Nevertheless, I've searched exhaustively for this specific problem without finding a solution. If it has already been answered elsewhere, I apologize in advance and thank you for your suggestions. Getting to it: I'm trying to build an R script that will:
1. search for specific keywords on a newspaper website;
2. give me the headlines, dates and contents for as many results/pages as I want.
I already know how to post the form for the search and scrape the results…
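If the newspaper's search form submits via GET, each results page is simply a URL with encoded query parameters, so the page URLs can be generated up front and then downloaded and parsed one by one. A minimal base-R sketch follows; the base URL and the parameter names `q` and `page` are hypothetical and would have to be read from the site's actual form.

```r
# Encode a named list of parameters as a query string: name=value&name=value.
# reserved = TRUE also percent-encodes characters such as & and = in values.
build_query <- function(params) {
  values <- vapply(params,
                   function(v) URLencode(as.character(v), reserved = TRUE),
                   character(1))
  paste(names(params), values, sep = "=", collapse = "&")
}

# One URL per results page; "q" and "page" are placeholder parameter names.
search_page_urls <- function(base_url, keyword, pages) {
  vapply(pages,
         function(p) paste0(base_url, "?", build_query(list(q = keyword, page = p))),
         character(1))
}
```

For example, `search_page_urls("http://www.example.com/search", "economia politica", 1:3)` yields three URLs ending in `page=1`, `page=2` and `page=3`; each can then be fetched (for instance with `RCurl::getURL()`) and the headlines, dates and contents extracted from the HTML.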

Reading a JSON file in R: lexical error: invalid char in json text

喜夏-厌秋 submitted on 2019-12-04 16:26:47
Question: Here is an example of the code I'm using:

library(jsonlite)
library(curl)
#url
url = "http://www.zillow.com/search/GetResults.htm?spt=homes&status=001000&lt=000000&ht=010000&pr=999999,10000001&mp=3779,37788&bd=0%2C&ba=0%2C&sf=,&lot=0%2C&yr=,1800&singlestory=0&hoa=0%2C&pho=0&pets=0&parking=0&laundry=0&income-restricted=0&pnd=0&red=0&zso=0&days=36m&ds=all&pmf=0&pf=0&sch=100111&zoom=6&rect=-91307373,29367814,-84759521,35554574&p=1&sort=globalrelevanceex&search=maplist&rid=4&rt=2&listright=true
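jsonlite's "lexical error: invalid char in json text" usually means that the text handed to the parser is not JSON at all, for instance an HTML page or an error message returned instead of the expected data. A small check before parsing makes that visible. This is a sketch using the curl and jsonlite packages already loaded in the question; both helper names are hypothetical.

```r
# TRUE if the first non-blank character is "{" or "[", i.e. the text can
# plausibly be a JSON object or array. (perl = TRUE for the \s and \[ syntax.)
looks_like_json <- function(txt) {
  grepl("^\\s*[\\[{]", txt, perl = TRUE)
}

# Download the body first, inspect it, and only then parse it.
parse_json_response <- function(url) {
  txt <- rawToChar(curl::curl_fetch_memory(url)$content)
  if (!looks_like_json(txt)) {
    stop("Response is not JSON; it starts with: ", substr(txt, 1, 80))
  }
  jsonlite::fromJSON(txt)
}
```

When the check fails, the printed start of the response usually shows what the server actually sent back (a login page, a block notice, and so on).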

Trying to download Google Trends data but date parameter is ignored?

◇◆丶佛笑我妖孽 submitted on 2019-12-04 14:13:35
I am trying to download Google Trends data in CSV format. For basic queries I have been successful (following a blog post by Christoph Riedl). Problem: by default, trends are returned starting from January 2004. I would prefer them to start from January 2011. However, when I add a date parameter to the URL request, it is completely ignored. I'm not sure how to overcome this. The following code will reproduce the issue.

# Just copy/paste this stuff - these are helper functions
require(RCurl)
# This gets the GALX cookie which we need to pass back with the login form
getGALX <-

Post request using cookies with cURL, RCurl and httr

白昼怎懂夜的黑 submitted on 2019-12-04 14:04:40
Question: In Windows cURL I can post a web request similar to this:

curl --dump-header cook.txt ^
  --data "RURL=http=//www.example.com/r&user=bob&password=hello" ^
  --user-agent "Mozilla/5.0" ^
  http://www.example.com/login

With type cook.txt I get a response similar to this:

HTTP/1.1 302 Found
Date: Thu, ******
Server: Microsoft-IIS/6.0
SERVER: ******
X-Powered-By: ASP.NET
X-AspNet-Version: 1.1.4322
Location: ******
Set-Cookie: Cookie1=; domain=******; expires=****** ****** ****** ******
Cache-Control:
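A sketch of the same request with httr, assuming the form fields are exactly those in the `--data` string. Redirects are switched off so that, as with the curl call, the 302 response and its Set-Cookie headers are what comes back. The function name is hypothetical.

```r
# Sketch: POST the login form, do not follow the 302, and return the status
# code plus the cookies httr recorded for this host.
login_cookies <- function(url = "http://www.example.com/login") {
  resp <- httr::POST(
    url,
    body = list(RURL = "http=//www.example.com/r",
                user = "bob",
                password = "hello"),
    encode = "form",                      # application/x-www-form-urlencoded, like --data
    httr::user_agent("Mozilla/5.0"),      # like --user-agent
    httr::config(followlocation = 0L)     # keep the 302 instead of following it
  )
  list(status = httr::status_code(resp), cookies = httr::cookies(resp))
}
```

httr keeps one handle per host, so cookies set by this response are normally sent automatically with later httr requests to the same host in the same R session.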

RCurl getForm: pass HTTP headers

*爱你&永不变心* submitted on 2019-12-04 12:35:49
Using RCurl's getForm function, which is the only nice way of passing in GET parameters, I need to alter some HTTP headers. In getURI, you just pass httpheader = c(Whatever='whatever',...) and it works. Unfortunately, that argument seems to be ignored by getForm. How do I set the HTTP headers in a getForm request?

Answer: Welcome to the confusing world of RCurl! You've discovered that its syntax makes no sense, which is not your fault. In getForm you pass headers as the second argument (the ...). See the usage section of ?getForm: getForm(uri, ..., .params = character(), .opts = list(), curl =
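An alternative that relies only on the `.opts` argument visible in the getForm usage line: GET parameters go in `.params`, and headers go in the `httpheader` curl option, the same option getURI accepts. This assumes getForm passes `.opts` through to libcurl; the wrapper name is hypothetical and the call is not run here.

```r
# Sketch: query parameters via .params, custom headers via the httpheader
# curl option inside .opts.
get_with_headers <- function(uri, params, headers) {
  RCurl::getForm(uri,
                 .params = params,                   # named character vector
                 .opts = list(httpheader = headers)) # e.g. c(Accept = "text/csv")
}

# Usage (not run):
# get_with_headers("http://www.example.com/search",
#                  params = c(q = "rcurl"),
#                  headers = c(Whatever = "whatever"))
```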

Using an API to calculate distance between two airports (two columns) within R?

流过昼夜 submitted on 2019-12-04 12:17:52
Question: I was wondering whether there is a way to calculate distances between airports (given as IATA codes). There are some scripts, but none of them use R. So I tried it with the API from developer.aero. Example data:

library(curl) # for curl post
departure <- c("DRS","TXL","STR","DUS","LEJ","FKB","LNZ")
arrival <- c("FKB","HER","BOJ","FUE","PMI","AYT","FUE")
flyID <- c(1,2,3,4,5,6,7)
df <- data.frame(departure,arrival,flyID)

  departure arrival flyID
1       DRS     FKB     1
2       TXL     HER     2
3       STR     BOJ     3
4       DUS     FUE     4
5       LEJ     PMI     5
6       FKB     AYT     6
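This is not the developer.aero API, but a local alternative: if latitude and longitude are available for each IATA code (for example from a public airport database, not provided here), the great-circle distance can be computed directly with the haversine formula on a spherical Earth. The sketch assumes a hypothetical data frame `airports` with columns `iata`, `lat` and `lon` in degrees; `haversine_km` and `add_distance` are illustrative names.

```r
# Great-circle distance (km) between points given in degrees, using the
# haversine formula with a mean Earth radius of 6371 km. Vectorised.
haversine_km <- function(lat1, lon1, lat2, lon2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))  # pmin guards against rounding above 1
}

# Look up both airports of each flight and add a distance_km column.
# `airports` is assumed to have columns iata, lat, lon.
add_distance <- function(df, airports) {
  dep <- airports[match(df$departure, airports$iata), ]
  arr <- airports[match(df$arrival, airports$iata), ]
  df$distance_km <- haversine_km(dep$lat, dep$lon, arr$lat, arr$lon)
  df
}
```

Codes missing from `airports` produce `NA` distances rather than an error, which makes gaps in the lookup table easy to spot.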

Send an expression to a website and return the dynamic result (picture)

纵饮孤独 submitted on 2019-12-04 06:21:18
I often use http://www.regexper.com to view a pictorial representation of regular expressions. Ideally, I would like a way to:
1. send a regular expression to the site;
2. open the site with that expression displayed.
For example, let's use the regex "\\s*foo[A-Z]\\d{2,3}". I'd go to the site and paste \s*foo[A-Z]\d{2,3} (note that the doubled backslashes become single ones), and it returns a visual diagram of the expression (the image is not included here). I'd like to do this from within R: by creating a wrapper function like view_regex("\\s*foo[A-Z]\\d{2,3}"), the page (http://www.regexper.com/#%5Cs*foo%5BA-Z%5D%5Cd%7B2%2C3%7D) with the visual diagram would be opened with the
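A sketch of the requested `view_regex()` wrapper in base R. The site takes the pattern percent-encoded in the URL fragment, and the encoder below mimics JavaScript's `encodeURIComponent()`, which reproduces the example URL exactly. Both function names are hypothetical.

```r
# Percent-encode like JavaScript's encodeURIComponent(): letters, digits and
# - _ . ! ~ * ' ( ) are kept; every other byte becomes %XX.
encode_uri_component <- function(x) {
  safe <- c(LETTERS, letters, 0:9, "-", "_", ".", "!", "~", "*", "'", "(", ")")
  chars <- strsplit(x, "")[[1]]
  encoded <- vapply(chars, function(ch) {
    if (ch %in% safe) return(ch)
    # Multi-byte (e.g. UTF-8) characters give one %XX per byte.
    paste0("%", toupper(as.character(charToRaw(ch))), collapse = "")
  }, character(1), USE.NAMES = FALSE)
  paste(encoded, collapse = "")
}

# Build the regexper URL and, in an interactive session, open it in the
# default browser. The URL is returned invisibly either way.
view_regex <- function(pattern, open = interactive()) {
  url <- paste0("http://www.regexper.com/#", encode_uri_component(pattern))
  if (open) utils::browseURL(url)
  invisible(url)
}
```

`view_regex("\\s*foo[A-Z]\\d{2,3}")` produces http://www.regexper.com/#%5Cs*foo%5BA-Z%5D%5Cd%7B2%2C3%7D. Base R's `URLencode()` is close but not identical here. With `reserved = TRUE` it also encodes `*` as `%2A`. With the default `reserved = FALSE` it leaves `[`, `]` and `,` unencoded.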

How to Convert “space” into “%20” with R

守給你的承諾、 submitted on 2019-12-04 04:01:05
Question: As the title says, I'm trying to figure out how to convert the spaces between words into %20. For example:

> y <- "I Love You"

How can I make y equal I%20Love%20You?

> y
[1] "I%20Love%20You"

Thanks a lot.

Answer 1: gsub() is one option:

R> gsub(pattern = " ", replacement = "%20", x = y)
[1] "I%20Love%20You"

Answer 2: Another option would be URLencode():

y <- "I love you"
URLencode(y)
[1] "I%20love%20you"

Answer 3: The function curlEscape() from the RCurl package gets the job done.

library('RCurl')
y <- "I love you"
curlEscape
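The answers agree on spaces but differ on other characters. `gsub()` touches only spaces, while `URLencode()` encodes reserved characters such as `&` only when `reserved = TRUE`. A quick base-R comparison:

```r
y <- "I Love You"
gsub(" ", "%20", y, fixed = TRUE)  # "I%20Love%20You": only spaces replaced
URLencode(y)                       # "I%20Love%20You"

z <- "rock & roll"
URLencode(z)                   # "rock%20&%20roll": & is left as-is
URLencode(z, reserved = TRUE)  # "rock%20%26%20roll": & is encoded too
```

For values placed inside a query string (where `&` separates parameters), the `reserved = TRUE` form is usually the safer choice.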

Password SSH authentication method in RCurl

六眼飞鱼酱① submitted on 2019-12-03 22:58:01
Question: I'm using the ftpUpload function in the RCurl package to upload files to an SFTP file server. I'm having difficulty working out the authentication call. Below is my call:

ftpUpload(what = "some-file.png",
          to = "sftp://some-ftp-server.com:22/path/to/some-file.png",
          verbose = TRUE,
          userpwd = "my_userid:my_password")

As a result I get:

* About to connect() to some-ftp-server.com port 22 (#0)
* Trying some-ftp-server.com...
* connected
* Connected to some-ftp-server.com (some ip address) port 22
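Two things worth checking, sketched under assumptions. First, `sftp://` URLs only work if the libcurl that RCurl is linked against was built with SSH support, which `curlVersion()` reports in its `protocols` element. Second, password authentication can be requested explicitly through libcurl's SSH auth-types option, assumed here to be exposed by RCurl as `ssh.auth.types` (libcurl's CURLSSH_AUTH_PASSWORD constant is 2). Whether the server needs this is also an assumption. The wrapper names are hypothetical.

```r
# TRUE if RCurl's libcurl build lists sftp among its supported protocols.
has_sftp <- function() {
  "sftp" %in% RCurl::curlVersion()$protocols
}

# Hypothetical wrapper: upload with user:password and ask libcurl to use
# password authentication only (ssh.auth.types = 2L is an assumption based
# on libcurl's CURLSSH_AUTH_PASSWORD value).
upload_sftp_password <- function(file, remote_url, userpwd) {
  if (!has_sftp()) stop("This libcurl build does not support sftp://")
  RCurl::ftpUpload(what = file, to = remote_url,
                   userpwd = userpwd,
                   ssh.auth.types = 2L,
                   verbose = TRUE)
}
```

If `has_sftp()` returns FALSE, no authentication option will help; RCurl would need to be linked against a libcurl built with libssh2.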