Question
I am trying to understand the evolution of the 100 largest repositories on GitHub. I can easily find the 100 largest repositories as of today (as measured by total number of contributors, stars, forks or LOC) using the GitHub search function or GithubArchive.org.
However, I would like to look at the 100 largest repositories at a given date in history (say, 1st of April 2011), so that I can track their growth (or decline) from that point on. How can I identify the 100 largest repositories on GitHub (as measured by stars, forks, or LOC) for a date in the past?
Answer 1:
I think the GitHub archive project can be of help: http://www.githubarchive.org/
It stores all the public events from the GitHub timeline and exposes them for processing. The events contain info about the repositories, so you should be able to pull the data out of there to fit your use-case.
For example, I've just used the following query in the BigQuery console ( https://bigquery.cloud.google.com/?pli=1 ) to find out the number of forks of the joyent/node repository for the date 2012-03-15:
SELECT repository_forks, created_at FROM [publicdata:samples.github_timeline] WHERE (repository_url = "https://github.com/joyent/node") AND (created_at CONTAINS "2012-03-15") LIMIT 1
And here are the results:
Row forks created_at
1 1579 2012-03-15 07:49:54
Obviously, you would use the BigQuery API to do something similar (extract the data you want, fetch data for a range of dates, etc.).
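To cover a range of dates, the per-day query above can be generated programmatically. What follows is only a rough sketch: the helper names `date_range` and `fork_count_query` are made up for illustration, and the code just builds the legacy SQL strings without sending anything to BigQuery.

```python
from datetime import date, timedelta

# Table name copied from the query above.
TABLE = "[publicdata:samples.github_timeline]"


def date_range(start, end):
    """Yield every date from start to end, inclusive."""
    day = start
    while day <= end:
        yield day
        day += timedelta(days=1)


def fork_count_query(repo_url, day):
    """Build the fork-count query shown above for one repository and one day."""
    return (
        "SELECT repository_forks, created_at "
        f"FROM {TABLE} "
        f'WHERE (repository_url = "{repo_url}") '
        f'AND (created_at CONTAINS "{day.isoformat()}") '
        "LIMIT 1"
    )


# One query per day for three consecutive days.
queries = [
    fork_count_query("https://github.com/joyent/node", d)
    for d in date_range(date(2012, 3, 15), date(2012, 3, 17))
]
```

Each string in `queries` can then be pasted into the BigQuery console or submitted through whichever BigQuery client you use.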
And here is a query for fetching the single largest repository (by forks) for a given date:
SELECT repository_forks, repository_url FROM [publicdata:samples.github_timeline] WHERE (created_at CONTAINS "2012-03-15") ORDER BY repository_forks DESC LIMIT 1
Result:
Row forks repository_url
1 6341 https://github.com/octocat/Spoon-Knife
And here is the query to fetch the top 100 repositories by forks for a given date:
SELECT MAX(repository_forks) as forks, repository_url FROM [publicdata:samples.github_timeline] WHERE (created_at CONTAINS "2012-03-15") GROUP BY repository_url ORDER BY forks DESC LIMIT 100
Result:
Row forks repository_url
1 6341 https://github.com/octocat/Spoon-Knife
2 4452 https://github.com/twitter/bootstrap
3 3647 https://github.com/mxcl/homebrew
4 2888 https://github.com/rails/rails
...
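The same top-N query can be parameterized by date, so it can be re-run for the date in the question or any other snapshot. This is a minimal sketch under assumptions: `top_repos_query` is a hypothetical helper, it only builds the query text, and whether the sample table actually contains events for a given date still has to be checked.

```python
from datetime import date

# Table name copied from the queries above.
TABLE = "[publicdata:samples.github_timeline]"


def top_repos_query(day, limit=100):
    """Build the top-repositories-by-forks query for one day."""
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    return (
        "SELECT MAX(repository_forks) as forks, repository_url "
        f"FROM {TABLE} "
        f'WHERE (created_at CONTAINS "{day.isoformat()}") '
        "GROUP BY repository_url "
        "ORDER BY forks DESC "
        f"LIMIT {int(limit)}"
    )


# Same query as above, and a variant for the date mentioned in the question.
query_2012 = top_repos_query(date(2012, 3, 15))
query_2011 = top_repos_query(date(2011, 4, 1), limit=10)
```

Ranking by another measure would mean swapping `repository_forks` for a different column; check the table schema for the exact column names before doing so.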
Source: https://stackoverflow.com/questions/13745285/how-to-find-the-100-largest-github-repositories-for-a-past-date