Question
I would like to scrape publications from a Google Scholar profile with SimpleHtmlDom.
I have a script for scraping the projects, but the problem is that I can only scrape the projects that are shown on the page.
When I use a URL like this
$html->load_file("http://scholar.google.se/citations?user=Sx4G9YgAAAAJ");
only 20 projects are shown. I can increase the number by changing the URL
$html->load_file("https://scholar.google.se/citations?user=Sx4G9YgAAAAJ&hl=&view_op=list_works&pagesize=100");
and setting the "pagesize" parameter. But the problem is that 100 is the maximum number of publications the page will show. Is there a way to scrape all the projects from a profile?
Answer 1:
You cannot get all of the projects at once, but you can get 100 projects at a time, then the next 100, and so on. Here is the URL:
https://scholar.google.com/citations?user=Sx4G9YgAAAAJ&hl=&view_op=list_works&cstart=100&pagesize=100
In the above URL, focus on the cstart parameter. Say you have already grabbed the first 100 projects: you then set cstart=100 and grab the next 100, then cstart=200, and so on until you have all of the publications.
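The paging described above could be sketched roughly as follows. This is only a minimal sketch, not a tested scraper: scholarPageUrl, fetchAllPublications, the $fetchPage callback and the $maxPages safety cap are hypothetical names introduced here, and the "tr.gsc_a_tr" selector in the commented example is an assumption about the page markup that may not match what Google Scholar actually serves.

```php
<?php
// Build the URL for one page of a profile's publication list.
// cstart is the offset of the first publication, pagesize the page length.
function scholarPageUrl(string $user, int $cstart, int $pageSize = 100): string
{
    return 'https://scholar.google.com/citations?' . http_build_query([
        'user'     => $user,
        'view_op'  => 'list_works',
        'cstart'   => $cstart,
        'pagesize' => $pageSize,
    ]);
}

// Collect rows page by page. $fetchPage is a hypothetical callback that
// takes a URL and returns an array of the publication rows on that page.
function fetchAllPublications(
    string $user,
    callable $fetchPage,
    int $pageSize = 100,
    int $maxPages = 50 // assumed safety cap so the loop always terminates
): array {
    $all = [];
    for ($page = 0; $page < $maxPages; $page++) {
        $rows = $fetchPage(scholarPageUrl($user, $page * $pageSize, $pageSize));
        $all = array_merge($all, $rows);
        // A short (or empty) page means there is nothing left to fetch.
        if (count($rows) < $pageSize) {
            break;
        }
    }
    return $all;
}

// Example callback using SimpleHtmlDom (the selector is an assumption):
// $fetchPage = function (string $url): array {
//     $html = new simple_html_dom();
//     $html->load_file($url);
//     return $html->find('tr.gsc_a_tr');
// };
// $publications = fetchAllPublications('Sx4G9YgAAAAJ', $fetchPage);
```

Passing the fetch step in as a callback keeps the cstart arithmetic separate from the HTML parsing, so the loop can be checked with a fake callback before it is pointed at the real page.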
Hope this helps
Source: https://stackoverflow.com/questions/49283574/google-scholar-profile-scrape-php