Google Scholar profile scrape PHP

Submitted by 一个人想着一个人 on 2019-12-11 10:28:12

Question


I would like to scrape publications from a Google Scholar profile with SimpleHtmlDom.

I have a script for scraping the publications, but the problem is that I can only scrape the ones that are shown on the page.
When I use a URL like this

$html->load_file("http://scholar.google.se/citations?user=Sx4G9YgAAAAJ");

only 20 publications are shown. I can increase the number by changing the URL

$html->load_file("https://scholar.google.se/citations?user=Sx4G9YgAAAAJ&hl=&view_op=list_works&pagesize=100");

to set the "pagesize" parameter. But the problem is that 100 is the maximum number of publications the page will show. Is there a way to scrape all the publications from a profile?


Answer 1:


You cannot get all of the publications at once, but you can fetch 100 at a time, then the next 100, and so on. Here is the URL:

https://scholar.google.com/citations?user=Sx4G9YgAAAAJ&hl=&view_op=list_works&cstart=100&pagesize=100

In the URL above, note the cstart parameter. Say you have already grabbed the first 100 publications: set cstart=100 to fetch the next 100, then cstart=200, and so on until you have all of the publications.
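
A rough sketch of that loop in PHP, not a tested scraper. The function names scholarPageUrl and scrapeAllPublications are made up for illustration, and $fetchPage is a hypothetical callable that you supply: it takes a URL and returns an array with one entry per publication on that page. The sketch also assumes that the first page is cstart=0 and that a page with fewer than pagesize entries is the last one.

```php
<?php
// Build the list_works URL for one page of results.
// The query parameters are the ones used in the URLs above.
function scholarPageUrl(string $user, int $cstart, int $pageSize = 100): string
{
    return 'https://scholar.google.com/citations?' . http_build_query([
        'user'     => $user,
        'hl'       => '',
        'view_op'  => 'list_works',
        'cstart'   => $cstart,
        'pagesize' => $pageSize,
    ]);
}

// Collect all publications by advancing cstart one page at a time.
// $fetchPage is a hypothetical callable: URL in, array of publications out.
function scrapeAllPublications(string $user, callable $fetchPage, int $pageSize = 100): array
{
    $all = [];
    for ($cstart = 0; ; $cstart += $pageSize) {
        $items = $fetchPage(scholarPageUrl($user, $cstart, $pageSize));
        $all = array_merge($all, $items);
        // Assumption: a short or empty page means there is nothing left.
        if (count($items) < $pageSize) {
            break;
        }
    }
    return $all;
}
```

With SimpleHtmlDom, $fetchPage could call $html->load_file($url) as in the question and then collect the publication rows with $html->find(); the selector to use depends on the page's current markup and is not given here.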

Hope this helps



Source: https://stackoverflow.com/questions/49283574/google-scholar-profile-scrape-php
