How to download search results from Google Scholar using R?

Submitted by 旧巷老猫 on 2019-12-03 12:18:35

Question


I would like to extract the first 100 results (say) of a Google Scholar search using R. Does anyone know how to do it?

To be precise, I just need the title of each paper, its authors, and its citation count.

P.S. Would this be legal?


Answer 1:


Please consider the updated biobucket post:

http://thebiobucket.blogspot.com/2011/11/r-function-google-scholar-webscraper.html




Answer 2:


There are some Python and Perl scrapers out there that you might be able to adapt, linked at http://bmb-common.blogspot.com/2011/02/does-google-scholar-suck-or-am-i-just.html




Answer 3:


I can't speak to the legalities of your task, but there are a few ways you can go about this. While I am not strong in XPath, it might be the best way. I believe that you can use the XML package to retrieve the page contents and use XPath to extract the data from the elements you need.

For instance, I use Chrome as my browser, and when I inspected the page with Developer Tools, there does appear to be a structure to the page, with the data "hidden" inside various tags that you should be able to exploit quite easily using XPath.

Check out this link for an example of using XPath.
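
To make the XPath idea concrete, here is a minimal sketch using the XML package. It parses a small inline HTML string rather than a live page, and the markup and class names (`result`, `title`, `authors`, `cites`) are invented for illustration. They are not Google Scholar's actual tags, so you would need to inspect the real page and adjust the XPath expressions.

```r
library(XML)

# Hypothetical markup standing in for a results page. The class names
# are made up for this example and are NOT Google Scholar's real ones.
html <- '<html><body>
<div class="result"><h3 class="title">Paper A</h3><div class="authors">Smith J, Lee K</div><div class="cites">Cited by 12</div></div>
<div class="result"><h3 class="title">Paper B</h3><div class="authors">Garcia M</div><div class="cites">Cited by 3</div></div>
</body></html>'

# Parse the string as HTML (asText = TRUE means "this is content, not a file path").
doc <- htmlParse(html, asText = TRUE)

# One XPath query per field; xmlValue returns the text inside each matched node.
titles  <- xpathSApply(doc, "//div[@class='result']/h3[@class='title']", xmlValue)
authors <- xpathSApply(doc, "//div[@class='result']/div[@class='authors']", xmlValue)
cites   <- xpathSApply(doc, "//div[@class='result']/div[@class='cites']", xmlValue)

# Combine into a data frame, stripping the "Cited by " prefix from the counts.
results <- data.frame(
  title     = titles,
  authors   = authors,
  citations = as.integer(sub("Cited by ", "", cites, fixed = TRUE)),
  stringsAsFactors = FALSE
)

print(results)
```

Note that querying each field separately assumes every result has all three elements. If some results lack a citation count, the vectors will differ in length, and it would be safer to loop over the result nodes and extract the fields per node.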

HTH and Good Luck




Answer 4:


You can definitely retrieve the HTML content of the page using RCurl and parse it using the XML package, as suggested by Btibert3. The only issue you might face is that Google won't allow you to run queries in a "robotic" way. After something like 200 queries in a short period of time, it stops returning results. Maybe that's different with Google Scholar, but I doubt it...
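
As a rough sketch of what paging through results with a pause between requests could look like: the helper names `scholar_urls` and `fetch_politely` are hypothetical, and the `q` and `start` query parameters, the 10-results-per-page step, and the 5-second delay are all assumptions rather than documented behaviour. Nothing here guarantees the requests won't be blocked.

```r
# Build result-page URLs. The `q` and `start` query parameters and the
# 10-results-per-page step are assumptions about the site's URL scheme.
scholar_urls <- function(query, n_results = 100, per_page = 10) {
  starts <- seq(0, n_results - 1, by = per_page)
  paste0("https://scholar.google.com/scholar?q=",
         URLencode(query, reserved = TRUE),
         "&start=", starts)
}

# Fetch each URL in turn, pausing `delay` seconds between requests.
# `fetch` defaults to RCurl::getURL; pass another function to try it offline.
fetch_politely <- function(urls, delay = 5, fetch = RCurl::getURL) {
  pages <- character(length(urls))
  for (i in seq_along(urls)) {
    pages[i] <- fetch(urls[i])
    if (i < length(urls)) Sys.sleep(delay)  # no need to wait after the last one
  }
  pages
}

# Ten URLs covering results 1 to 100 for an example query.
urls <- scholar_urls("species distribution model")

# Uncomment to actually download (requires RCurl and network access):
# pages <- fetch_politely(urls)
```

Each element of `pages` would then be an HTML string that can be handed to `htmlParse(..., asText = TRUE)` from the XML package.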




Answer 5:


A solution was recently published here:

http://thebiobucket.blogspot.com/2011/11/visually-examine-google-scholar-search.html



Source: https://stackoverflow.com/questions/5005989/how-to-download-search-results-on-google-scholar-using-r
