As a way of exploring how to make a package in R for the Denver RUG, I decided that it would be a fun little project to write an R wrapper around the datasciencetoolkit API.
I just wanted to point out that there seems to be an issue with passing a raw string via RCurl's postForm function. For example, if I use curl from the command line, I get the following:
$ curl -d "Archbishop Huxley" "http://www.datasciencetoolkit.org/text2people"
[{"gender":"u","first_name":"","title":"archbishop","surnames":"Huxley","start_index":0,"end_index":17,"matched_string":"Archbishop Huxley"}]
and in R I get
> api <- "http://www.datasciencetoolkit.org/text2people"
> postForm(api, a="Archbishop Huxley")
[1] "[{\"gender\":\"u\",\"first_name\":\"\",\"title\":\"archbishop\",\"surnames\":\"Huxley\",\"start_index\":44,\"end_index\":61,\"matched_string\":\"Archbishop Huxley\"},{\"gender\":\"u\",\"first_name\":\"\",\"title\":\"archbishop\",\"surnames\":\"Huxley\",\"start_index\":88,\"end_index\":105,\"matched_string\":\"Archbishop Huxley\"}]"
attr(,"Content-Type")
                 charset
"text/html"      "utf-8"
Note that it returns two elements in the JSON string, and neither one has the start_index or end_index that curl returned (0 and 17). Is this a problem with encoding or something?
With httr, this is just:
library(httr)
r <- POST("http://www.datasciencetoolkit.org/text2people",
          body = "Tim O'Reilly, Archbishop Huxley")
stop_for_status(r)
content(r, "parsed", "application/json")
From Duncan Temple Lang on the R-help list:
postForm() is using a different style (or specifically Content-Type) of submitting the form than the curl -d command. Switching to style = 'POST' uses the same type, but at a quick guess, the parameter name 'a' is causing confusion and the result is the empty JSON array - "[]".
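To make the difference concrete, here is a rough offline sketch of the request bodies involved. It assumes that curl -d sends its string verbatim as the body and that a keyed form submission in the 'POST' style sends URL-encoded key=value pairs; the variable names are made up for illustration, and the exact escaping RCurl applies may differ.

```r
# What `curl -d "Archbishop Huxley"` puts in the request body: the raw string.
raw_body <- "Archbishop Huxley"

# Roughly what a keyed, URL-encoded form body looks like for a = "Archbishop Huxley".
# (Assumption: spaces are percent-encoded; RCurl's own escaping may differ in detail.)
keyed_body <- paste0("a=", URLencode("Archbishop Huxley", reserved = TRUE))

raw_body
keyed_body

# The raw body is 17 characters long, which lines up with start_index 0 and
# end_index 17 in the curl response.
nchar(raw_body)
```

The default postForm() style, 'HTTPPOST', submits multipart/form-data instead, so the value travels inside a part with its own headers. That would be consistent with the shifted start_index values seen earlier, though that reading is a guess.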
A quick workaround is to use curlPerform() directly rather than postForm():
library(RCurl)
r = dynCurlReader()  # collects the response body as it arrives
curlPerform(postfields = 'Archbishop Huxley',  # sent as the raw request body
            url = 'http://www.datasciencetoolkit.org/text2people',
            verbose = TRUE, post = 1L, writefunction = r$update)
r$value()
This yields
[1] "[{\"gender\":\"u\",\"first_name\":\"\",\"title\":\"archbishop\",\"surnames\":\"Huxley\",\"start_index\":0,\"end_index\":17,\"matched_string\":\"Archbishop Huxley\"}]"
and you can use fromJSON() to transform it into data in R.
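As a minimal sketch of that last step, the response string can be parsed offline. This assumes the jsonlite package, which is only one of several packages that provide a fromJSON() (RJSONIO and rjson are others, and they return lists rather than a data frame); the variable names json and people are illustrative.

```r
library(jsonlite)  # assumption: jsonlite's fromJSON(); other packages behave differently

# Response body copied from the curlPerform() output shown earlier
json <- '[{"gender":"u","first_name":"","title":"archbishop","surnames":"Huxley","start_index":0,"end_index":17,"matched_string":"Archbishop Huxley"}]'

# With jsonlite, a JSON array of objects is simplified to a data frame
people <- fromJSON(json)
people$matched_string

# start_index is 0-based, so add 1 before using it with substr()
substr("Archbishop Huxley", people$start_index + 1, people$end_index)
```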
Generally, in those cases where you're trying to POST something that isn't keyed, you can just assign a dummy key to that value. For example:
> postForm("http://www.datasciencetoolkit.org/text2people", a="Archbishop Huxley")
[1] "[{\"gender\":\"u\",\"first_name\":\"\",\"title\":\"archbishop\",\"surnames\":\"Huxley\",\"start_index\":44,\"end_index\":61,\"matched_string\":\"Archbishop Huxley\"},{\"gender\":\"u\",\"first_name\":\"\",\"title\":\"archbishop\",\"surnames\":\"Huxley\",\"start_index\":88,\"end_index\":105,\"matched_string\":\"Archbishop Huxley\"}]"
attr(,"Content-Type")
                 charset
"text/html"      "utf-8"
It would work the same if I'd used b="Archbishop Huxley", etc.
Enjoy RCurl - it's probably my favorite R package. If you get adventurous, upgrading to ~ libcurl 7.21 exposes some new methods via curl (including SMTP, etc.).
The simplePostToHost function in the httpRequest package might do what you are looking for here.