I have been asked to extract info by an academic colleague from a website where I need to link the content of a webpage in a table - not too hard with the contents of a text fil
This is somewhat tricky, and not fully integrated in R, but some system()-fiddling will get you started.
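var page = require('webpage').create();
page.open('http://www.menne-biomed.de/uni/JavaButton.html', function (status) {
    if (status !== 'success') {
        console.log('Unable to access network');
    } else {
        // page.evaluate runs inside the page, so the page's own JavaScript variables are visible.
        var ua = page.evaluate(function () {
            var t = document.getElementById('tk1').href;
            // Capture the expression between the parentheses of doPostBack(...).
            // A regex literal keeps the backslashes that a plain string would drop.
            var re = /\((.*)\)/;
            // eval resolves the captured variable name to its value in the page.
            return eval(re.exec(t)[1]);
        });
        console.log(ua); // Outputs http://cran.at.r-project.org/
    }
    phantom.exit();
});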
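The regex step can be tried on its own, outside PhantomJS. The following is only a sketch: extractCallArgument is a hypothetical helper name, not part of PhantomJS, and the href string is copied from the test page. It also shows why the pattern needs either a regex literal or doubled backslashes when passed as a string.

```javascript
// Hypothetical helper (not a PhantomJS function): returns the text between
// the first "(" and the last ")" of a javascript: href, or null if absent.
function extractCallArgument(href) {
    var match = /\((.*)\)/.exec(href);
    return match ? match[1] : null;
}

// The href of the link on the test page.
var href = 'javascript:doPostBack(inaccesibleJavascriptVar)';
var argument = extractCallArgument(href);
console.log(argument);

// Pitfall: inside a plain string, "\(" collapses to "(", so this pattern
// contains no literal parentheses and captures the whole href instead.
var broken = new RegExp('\((.*)\)');
console.log(broken.exec(href)[1] === href);
```

In the PhantomJS script the captured name is then passed to eval inside the page, where the variable is defined. Here it is only printed.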
Save the script as javabutton.js and, with phantomjs on the PATH, call
phantomjs javabutton.js
The link is printed to the console. From R, capture that output, for example with system("phantomjs javabutton.js", intern = TRUE), and pass it on to RCurl.
Not elegant, but maybe someone will wrap phantomjs into R one day. In case the link to JavaButton.html gets lost, here is the page as code.
<!DOCTYPE html>
<html>
<head>
<script>
inaccesibleJavascriptVar = 'http://' + 'cran.at.r-project.org/';
function doPostBack(myref)
{
window.location.href= myref;
return false;
}
</script>
</head>
<body>
<a id="tk1" href="javascript:doPostBack(inaccesibleJavascriptVar)" >Click here</a>
</body>
</html>
Have a look at the RCurl package:
http://www.omegahat.org/RCurl/