Can anybody share a simple example of using Mathematica and Google Scholar to extract academic research information?

Submitted by 放荡痞女 on 2019-12-03 16:27:52

Google Scholar is not well suited to this goal because, AFAIK, it doesn't have a formal API. It also doesn't provide results in a structured format (e.g. XML). So we have to resort to a quick (and very, very fragile!) text pattern-matching hack like:

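 (* Returns the hit count Scholar reports for an author query, as a string *)
 searchGoogleScholarAuthor[author_String] := 
  First[StringCases[
    (* Build the query: one author: field per word, joined with "+" *)
    Import["http://scholar.google.com/scholar?start=0&num=1&q=" <> 
      StringDrop[
       StringJoin @@ ("author:" <> # <> "+" & /@ 
          StringSplit[author]), -1] <> "&hl=en&as_sdt=1,5"], 
    (* Capture the run of digits and commas that follows "of about" *)
    ___ ~~ "Results" ~~ ___ ~~ "of about" ~~ Shortest[___] ~~ 
      p : Longest[(DigitCharacter | ",") ..] ~~ ___ ~~ "." ~~ ___ ~~ 
      "(" ~~ ___ :> p]]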

In[191]:= searchGoogleScholarAuthor["A Einstein"]

Out[191]= "6,400"

In[190]:= searchGoogleScholarAuthor["Einstein"]

Out[190]= "9,400"

In[192]:= searchGoogleScholarAuthor["Wizard"]

Out[192]= "197"

In[193]:= searchGoogleScholarAuthor["Vries"]

Out[193]= "70,700"

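Add ToExpression if you don't like the string result, but strip the thousands separators first (e.g. with StringReplace[..., "," -> ""]), since a string like "6,400" is not a single number as written. If you want to restrict the publication years, you can add &as_ylo=2011&as_yhi=2011 to the search string and change the start and end years appropriately.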
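A minimal sketch of both tweaks, for illustration only. The helper names hitCountToInteger and yearRestriction are made up here and are not part of the code above; the first assumes the hit count contains only digits and commas.

```mathematica
(* Hypothetical helper: turn a hit-count string such as "6,400" into an integer.
   Assumes the string holds only digits and comma separators. *)
hitCountToInteger[s_String] := ToExpression[StringReplace[s, "," -> ""]]

(* Hypothetical helper: the year-restriction fragment to append to the search URL. *)
yearRestriction[yearLow_Integer, yearHigh_Integer] :=
 "&as_ylo=" <> ToString[yearLow] <> "&as_yhi=" <> ToString[yearHigh]

hitCountToInteger["6,400"]     (* 6400 *)
yearRestriction[2011, 2011]    (* "&as_ylo=2011&as_yhi=2011" *)
```

The year fragment would go right after "&hl=en&as_sdt=1,5" in the URL string passed to Import.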

Please note that authors with popular names will generate lots of spurious hits as there is no way to uniquely identify a single author. Additionally, Scholar returns a diversity of hits, including citations, books, reprints and more. So, really, this ain't very useful for counting.

A bit of explanation:

Scholar splits the initials and names of authors and co-authors over several author: fields combined with a +. The StringDrop[StringJoin @@ ("author:" <> # <> "+" & /@ StringSplit[author]), -1] part of the code takes care of that. The StringDrop removes the last +.
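To see what that expression does, it can be evaluated one step at a time. This is just an illustration with the name "A Einstein"; the variable names words, fields and joined are made up for this walkthrough.

```mathematica
(* Split the author string into initials and names *)
words = StringSplit["A Einstein"]                    (* {"A", "Einstein"} *)

(* Wrap every piece in an author: field, each followed by a "+" *)
fields = ("author:" <> # <> "+" & /@ words)          (* {"author:A+", "author:Einstein+"} *)

(* Concatenate the fields into one string *)
joined = StringJoin @@ fields                        (* "author:A+author:Einstein+" *)

(* Drop the trailing "+" *)
StringDrop[joined, -1]                               (* "author:A+author:Einstein" *)
```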

The StringCases part contains a large text pattern that searches for the line Scholar places at the top of each results page, which contains the number of hits. This number is then isolated and returned.
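The pattern can be tried offline on a fixed string, which also makes it easier to adjust when the page layout changes. The sample text below is invented to resemble the kind of header the pattern expects; it is not captured Scholar output, and the real page surrounds this text with a lot of other content.

```mathematica
(* Invented sample text, shaped like the header line the pattern looks for *)
sample = "Scholar Results 1 - 1 of about 6,400. (0.05 sec)";

(* Same pattern as in searchGoogleScholarAuthor: skip to "of about", then
   capture the longest run of digits and commas as p *)
StringCases[sample,
 ___ ~~ "Results" ~~ ___ ~~ "of about" ~~ Shortest[___] ~~
   p : Longest[(DigitCharacter | ",") ..] ~~ ___ ~~ "." ~~ ___ ~~
   "(" ~~ ___ :> p]
(* {"6,400"} *)
```

StringCases returns a list of matches, which is why the function above wraps it in First. If the header text is missing, the list is empty and First fails, which is one reason the hack is so fragile.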
