How to find the 100 largest GitHub repositories for a past date?

余生颓废 submitted on 2020-01-23 05:13:00

Question


I am trying to understand the evolution of the 100 largest repositories on GitHub. I can easily find the 100 largest repositories as of today (measured by total number of contributors, stars, forks, or LOC) using the GitHub search function or GithubArchive.org.

However, I would like to look at the 100 largest repositories at a given date in history (say, 1 April 2011), so that I can track their growth (or decline) from that point on. How can I identify the 100 largest repositories on GitHub (measured by stars, forks, or LOC) for a date in the past?


Answer 1:


I think the GitHub archive project can be of help: http://www.githubarchive.org/

It stores all the public events from the GitHub timeline and exposes them for processing. The events contain info about the repositories, so you should be able to pull the data out of there to fit your use-case.

For example, I've just used the following query in the BigQuery console ( https://bigquery.cloud.google.com/?pli=1 ) to find out the number of forks of the joyent/node repository for the date 2012-03-15:

SELECT repository_forks, created_at FROM [publicdata:samples.github_timeline] WHERE (repository_url = "https://github.com/joyent/node") AND (created_at CONTAINS "2012-03-15") LIMIT 1

And here are the results:

Row forks   created_at   
1   1579    2012-03-15 07:49:54  

Obviously, you would use the BigQuery API to do something similar programmatically (extract the data you want, fetch data for a range of dates, etc.).
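As a rough illustration of the "range of dates" idea, here is a minimal Python sketch that only builds the query text shown above, once per day. The helper names forks_query and date_range are made up for this example, and actually sending the queries to BigQuery is left out. It assumes the repository URL comes from a trusted source, since the value is pasted straight into the query string.

```python
from datetime import date, timedelta

# Table name taken from the query above; the helpers below are hypothetical.
TABLE = "[publicdata:samples.github_timeline]"


def forks_query(repo_url: str, day: date) -> str:
    """Build the fork-count query from the answer for one repository and one day."""
    return (
        "SELECT repository_forks, created_at "
        f"FROM {TABLE} "
        f'WHERE (repository_url = "{repo_url}") '
        f'AND (created_at CONTAINS "{day.isoformat()}") '
        "LIMIT 1"
    )


def date_range(start: date, end: date):
    """Yield every day from start to end, inclusive."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


# One query string per day; these would then be submitted through the BigQuery API.
queries = [
    forks_query("https://github.com/joyent/node", day)
    for day in date_range(date(2012, 3, 15), date(2012, 3, 17))
]
for query in queries:
    print(query)
```

The first generated string is identical to the query used above for 2012-03-15; the other two differ only in the date.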

And here is a query for fetching the single largest repository (by forks) for a given date:

SELECT repository_forks, repository_url FROM [publicdata:samples.github_timeline] WHERE (created_at CONTAINS "2012-03-15") ORDER BY repository_forks DESC LIMIT 1

Result:

Row forks   repository_url   
1   6341    https://github.com/octocat/Spoon-Knife   

And here is the query to fetch the top 100 repositories by forks for a given date:

SELECT MAX(repository_forks) as forks, repository_url FROM [publicdata:samples.github_timeline] WHERE (created_at CONTAINS "2012-03-15") GROUP BY repository_url ORDER BY forks DESC LIMIT 100

Result:

Row forks   repository_url   
1   6341    https://github.com/octocat/Spoon-Knife   
2   4452    https://github.com/twitter/bootstrap     
3   3647    https://github.com/mxcl/homebrew     
4   2888    https://github.com/rails/rails
...
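The reason for MAX(...) with GROUP BY in the last query is that a repository can appear in many events on the same day, each carrying the fork count at that moment, so the rows have to be collapsed to one per repository before ranking. The following Python sketch mimics that logic on a handful of in-memory rows. The function name top_repositories is made up for this example, and the event rows are hypothetical sample values, not real query output.

```python
# Hypothetical (repository_url, repository_forks) rows for a single day.
# A repository shows up once per event, so duplicates are expected.
events = [
    ("https://github.com/octocat/Spoon-Knife", 6339),
    ("https://github.com/twitter/bootstrap", 4452),
    ("https://github.com/octocat/Spoon-Knife", 6341),
    ("https://github.com/mxcl/homebrew", 3647),
    ("https://github.com/twitter/bootstrap", 4450),
]


def top_repositories(rows, limit=100):
    """Return up to `limit` (url, max_forks) pairs, largest fork count first."""
    max_forks = {}
    for url, forks in rows:
        # Equivalent of MAX(repository_forks) ... GROUP BY repository_url
        if url not in max_forks or forks > max_forks[url]:
            max_forks[url] = forks
    # Equivalent of ORDER BY forks DESC LIMIT <limit>
    ranked = sorted(max_forks.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


for url, forks in top_repositories(events, limit=2):
    print(forks, url)
```

Without the grouping step, the same repository could occupy several of the top 100 slots.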


Source: https://stackoverflow.com/questions/13745285/how-to-find-the-100-largest-github-repositories-for-a-past-date
