I want to fetch all the comments on CNN whose comment system is Disqus. As an example, http://edition.cnn.com/2013/02/25/tech/innovation/google-glass-privacy-andrew-keen/index.h
As an addition: to get the URL of the Disqus comments on any page that embeds them, run this JavaScript code in the web browser console:
var visit = (function () {
    // The Disqus embed is an iframe inside div#disqus_thread.
    var url = document.querySelector('div#disqus_thread iframe').src;
    // Upgrade an http:// URL to https://; leave any other URL untouched.
    // (No need to patch String.prototype.startsWith for this check.)
    if (url.indexOf('http://') === 0) return 'https://' + url.slice(7);
    return url;
})();
The URL is now stored in the variable 'visit', so you can print it:
console.log(visit);
I have extracted all of the data into UTF-8 JSON and saved it as a .txt file, which can be found at this link. The JSON contains several variables, but the one you need is 'data', which is a JavaScript array.
Iterate over its elements and split each one at 'x==x'. The 'x==x' separator was added so that the userid of each commenter is captured too. If an element has a name instead of a numeric userid, that account is no longer active.
To view a user's profile, append the userid to https://disqus.com/users/, for example https://disqus.com/users/106222183, where 106222183 is the userid.
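The processing described above could be sketched roughly like this. It is only an illustration: the sample entries are made up, the helper name parseEntry is hypothetical, and the field order (comment text first, userid or name last) is an assumption that should be checked against the real file.

```javascript
// Made-up sample standing in for the contents of the .txt file.
// Assumption: each entry looks like "<comment text>x==x<userid or name>".
var text = JSON.stringify({
    data: ['Great article!x==x106222183', 'I disagree.x==xSome Name']
});

// Hypothetical helper: split one entry at the 'x==x' separator.
function parseEntry(entry) {
    var parts = entry.split('x==x');
    var user = parts[parts.length - 1];
    // A purely numeric value is treated as a userid; a name means the
    // account is no longer active, so no profile URL is built for it.
    var isUserId = /^\d+$/.test(user);
    return {
        comment: parts.slice(0, -1).join('x==x'),
        user: user,
        profile: isUserId ? 'https://disqus.com/users/' + user : null
    };
}

// Only the 'data' array is needed from the parsed JSON.
var comments = JSON.parse(text).data.map(parseEntry);
console.log(comments);
```

With the real file you would replace the sample string with the file's contents; everything after JSON.parse stays the same.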