According to the V2 documentation, you can list all commits for a branch with:
commits/list/:user_id/:repository/:branch
I am not seeing th
I have encountered the exact same problem. I did manage to acquire all the commits for all branches within a repository (probably not that efficient due to the API).
As you mentioned, first you gather all the branches:
# https://api.github.com/repos/:user/:repo/branches
https://api.github.com/repos/twitter/bootstrap/branches
The key point you are missing is that in API v3 the call that lists commits on a repository works from a reference commit, passed as the sha parameter. So when you collect the branches, make sure you also pick up each branch's latest sha:
[
{
"commit": {
"url": "https://api.github.com/repos/twitter/bootstrap/commits/8b19016c3bec59acb74d95a50efce70af2117382",
"sha": "8b19016c3bec59acb74d95a50efce70af2117382"
},
"name": "gh-pages"
},
{
"commit": {
"url": "https://api.github.com/repos/twitter/bootstrap/commits/d335adf644b213a5ebc9cee3f37f781ad55194ef",
"sha": "d335adf644b213a5ebc9cee3f37f781ad55194ef"
},
"name": "master"
}
]
As we can see, the two branches here have different sha values; these are the latest commit sha on each branch. What you can do now is iterate through each branch, starting from its latest sha:
# With the sha parameter set to the branch's latest sha
# https://api.github.com/repos/:user/:repo/commits
https://api.github.com/repos/twitter/bootstrap/commits?per_page=100&sha=d335adf644b213a5ebc9cee3f37f781ad55194ef
The above API call lists the last 100 commits of the master branch of twitter/bootstrap. To get the next 100 commits, you have to tell the API which sha to continue from. We can use the sha of the last commit returned (which is 7a8d6b19767a92b1c4ea45d88d4eedc2b29bf1fa in the current example) as input for the next API call:
# Next API call for commits (use the last commit's sha)
# https://api.github.com/repos/:user/:repo/commits
https://api.github.com/repos/twitter/bootstrap/commits?per_page=100&sha=7a8d6b19767a92b1c4ea45d88d4eedc2b29bf1fa
This process is repeated until the sha of the last commit returned is the same as the sha parameter of the API call.
That is it for one branch. Now you apply the same approach for the other branch (work from the latest sha).
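The loop described above can be sketched roughly as follows. This is only a sketch under assumptions: fetchPage is a hypothetical helper (not part of any library) that resolves to the array of commits returned by the commits call for a given sha, so the HTTP client is left to you.

```javascript
// Walk one branch backwards from its latest sha, one page at a time.
// fetchPage(sha) is a hypothetical helper: it must resolve to the commit
// array from GET /repos/:user/:repo/commits?per_page=100&sha=<sha>.
async function walkBranch(headSha, fetchPage) {
  const commits = [];
  let sha = headSha;
  while (true) {
    const page = await fetchPage(sha);
    if (page.length === 0) break;
    // Every page after the first starts with the commit we asked for,
    // which was already collected as the last item of the previous page.
    const fresh = commits.length === 0 ? page : page.slice(1);
    commits.push(...fresh);
    const lastSha = page[page.length - 1].sha;
    // Stop once the last commit returned is the one we asked for.
    if (lastSha === sha) break;
    sha = lastSha;
  }
  return commits;
}
```

Dropping the first item of each later page avoids storing the boundary commit twice.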
There is one large issue with this approach: since branches share many identical commits, you will see the same commits over and over again as you move to another branch.
I can imagine that there is a much more efficient way to accomplish this, but this worked for me.
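One way to cope with the repeated commits is to merge the per-branch results by sha after collecting them. A minimal sketch, assuming the commits have already been gathered into an object keyed by branch name (the commitsByBranch shape and the uniqueCommits name are my own, not something the API returns):

```javascript
// commitsByBranch is assumed to look like:
//   { master: [{ sha: "..." }, ...], "gh-pages": [{ sha: "..." }, ...] }
function uniqueCommits(commitsByBranch) {
  const bySha = new Map();
  for (const [branch, commits] of Object.entries(commitsByBranch)) {
    for (const commit of commits) {
      // Keep the first copy of each commit and record every branch it is on.
      if (!bySha.has(commit.sha)) {
        bySha.set(commit.sha, { commit: commit, branches: [] });
      }
      bySha.get(commit.sha).branches.push(branch);
    }
  }
  return Array.from(bySha.values());
}
```

This does not reduce the number of API calls; it only removes the duplicates from the final result.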
I asked GitHub support this same question, and they answered:
GETing /repos/:owner/:repo/commits should do the trick. You can pass the branch name in the sha parameter. For example, to get the first page of commits from the '3.0.0-wip' branch of the twitter/bootstrap repository, you would use the following curl request:
curl https://api.github.com/repos/twitter/bootstrap/commits?sha=3.0.0-wip
The docs also describe how to use pagination to get the remaining commits for this branch.
As long as you are making authenticated requests, you can make up to 5,000 requests per hour.
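For what it's worth, here is a rough sketch of the branch-name approach with page-based pagination. The per_page and page query parameters are the REST API's standard pagination parameters; fetchJson is a hypothetical helper that resolves to the parsed JSON body for a URL, and stopping on the first empty page is my own simplification.

```javascript
// Build the commits URL for a branch, passing the branch name as sha.
function commitsUrl(owner, repo, branch, page) {
  const url = new URL(`https://api.github.com/repos/${owner}/${repo}/commits`);
  url.searchParams.set('sha', branch);
  url.searchParams.set('per_page', '100');
  url.searchParams.set('page', String(page));
  return url.toString();
}

// Request page 1, 2, 3, ... until a page comes back empty.
// fetchJson(url) is a hypothetical helper returning the parsed JSON array.
async function listBranchCommits(owner, repo, branch, fetchJson) {
  const commits = [];
  for (let page = 1; ; page++) {
    const batch = await fetchJson(commitsUrl(owner, repo, branch, page));
    if (batch.length === 0) break;
    commits.push(...batch);
  }
  return commits;
}
```

Authentication headers would go inside whatever fetchJson you supply.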
I used the GitHub API gem from https://github.com/peter-murach/github in my Rails app as follows:
# owner, repo_name, start_date and end_date are assumed to be defined by the caller
git_connection = Github.new :client_id => 'your_id', :client_secret => 'your_secret', :oauth_token => 'your_oauth_token'
branches_info = {}
commits_list = []
all_branches = git_connection.repos.list_branches owner, repo_name
all_branches.body.each do |branch|
  branches_info[branch.name] = branch.commit.url
end
branches_info.keys.each do |branch_name|
  # Pass the branch name as :sha; the date range goes in the API's since/until parameters
  commits_list.push(git_connection.repos.commits.list(owner, repo_name, :sha => branch_name, :since => start_date, :until => end_date))
end
You can use the GraphQL API v4 to optimize downloading commits per branch. With the following method, I've managed to download 1900 commits in a single request (100 commits per branch in 19 different branches), which drastically reduces the number of requests compared to using the REST API.
You first have to get all branches, going through pagination if you have more than 100 branches:
Query :
query($owner:String!, $name:String!, $branchCursor: String!) {
repository(owner: $owner, name: $name) {
refs(first: 100, refPrefix: "refs/heads/",after: $branchCursor) {
totalCount
edges {
node {
name
target {
...on Commit {
history(first:0){
totalCount
}
}
}
}
}
pageInfo {
endCursor
hasNextPage
}
}
}
}
variables :
{
"owner": "google",
"name": "gson",
"branchCursor": ""
}
Note that the branchCursor variable is only needed when you have more than 100 branches; in that case it takes the value of pageInfo.endCursor from the previous request.
There is a limit on how many nodes a single query may request, so we can't fetch too many commits at once. In my testing, we couldn't go over 19*100 commits in a single query.
Note that if the repo has fewer than 19 branches, you don't need to worry about this.
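To stay under that limit, the branch list can be split into groups of 19 before building the commit queries. A small sketch (chunk is my own helper name; 19 is just the value from my testing above):

```javascript
// Split an array into consecutive groups of at most `size` items,
// without modifying the original array.
function chunk(items, size) {
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}
```

Each group then gets its own series of commit requests.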
You can then create your query dynamically for getting the 100 next commits on all branches. An example with 2 branches :
query ($owner: String!, $name: String!) {
repository(owner: $owner, name: $name) {
branch0: ref(qualifiedName: "JsonArrayImplementsList") {
target {
... on Commit {
history(first: 100) {
...CommitFragment
}
}
}
}
branch1: ref(qualifiedName: "master") {
target {
... on Commit {
history(first: 100) {
...CommitFragment
}
}
}
}
}
}
fragment CommitFragment on CommitHistoryConnection {
totalCount
nodes {
oid
message
committedDate
author {
name
email
}
}
pageInfo {
hasNextPage
endCursor
}
}
The variables are owner for the repo's owner & name for the name of the repo. You can see that pageInfo.hasNextPage & pageInfo.endCursor are used to go through pagination for each branch. The pagination takes place in history(first: 100), with the last cursor encountered passed as the after argument. For instance, the next request will have history(first: 100, after: "6e2fcdcaf252c54a151ce6a4441280e4c54153ae 99"). For each branch, we have to update the request with the last endCursor value to query the next 100 commits.
When pageInfo.hasNextPage is false, there are no more pages for this branch, so we don't include it in the next request.
When the last branch has pageInfo.hasNextPage set to false, we have retrieved all commits.
Here is a sample implementation in NodeJS using github-graphql-client. The same method could be implemented in any other language. The following also stores the commits in commitsX.json files:
var client = require('github-graphql-client');
var fs = require("fs");
const owner = "google";
const repo = "gson";
const accessToken = "YOUR_ACCESS_TOKEN";
const branchQuery = `
query($owner:String!, $name:String!, $branchCursor: String!) {
repository(owner: $owner, name: $name) {
refs(first: 100, refPrefix: "refs/heads/",after: $branchCursor) {
totalCount
edges {
node {
name
target {
...on Commit {
history(first:0){
totalCount
}
}
}
}
}
pageInfo {
endCursor
hasNextPage
}
}
}
}`;
function buildCommitQuery(branches){
var query = `
query ($owner: String!, $name: String!) {
repository(owner: $owner, name: $name) {`;
for (var key in branches) {
if (branches.hasOwnProperty(key) && branches[key].hasNextPage) {
query+=`
${key}: ref(qualifiedName: "${branches[key].name}") {
target {
... on Commit {
history(first: 100, after: ${branches[key].cursor ? '"' + branches[key].cursor + '"': null}) {
...CommitFragment
}
}
}
}`;
}
}
query+=`
}
}`;
query+= commitFragment;
return query;
}
const commitFragment = `
fragment CommitFragment on CommitHistoryConnection {
totalCount
nodes {
oid
message
committedDate
author {
name
email
}
}
pageInfo {
hasNextPage
endCursor
}
}`;
function doRequest(query, variables) {
return new Promise(function (resolve, reject) {
client({
token: accessToken,
query: query,
variables: variables
}, function (err, res) {
if (!err) {
resolve(res);
} else {
console.log(JSON.stringify(err, null, 2));
reject(err);
}
});
});
}
function buildBranchObject(branch){
var refs = {};
for (var i = 0; i < branch.length; i++) {
console.log("branch " + branch[i].node.name);
refs["branch" + i] = {
name: branch[i].node.name,
totalCount: branch[i].node.target.history.totalCount,
cursor: null,
hasNextPage : true,
commits: []
};
}
return refs;
}
async function requestGraphql() {
var iterateBranch = true;
var branches = [];
var cursor = "";
// get all branches
while (iterateBranch) {
let res = await doRequest(branchQuery,{
"owner": owner,
"name": repo,
"branchCursor": cursor
});
iterateBranch = res.data.repository.refs.pageInfo.hasNextPage;
cursor = res.data.repository.refs.pageInfo.endCursor;
branches = branches.concat(res.data.repository.refs.edges);
}
//split the branch array into smaller array of 19 items
var refChunk = [], size = 19;
while (branches.length > 0){
refChunk.push(branches.splice(0, size));
}
for (var j = 0; j < refChunk.length; j++) {
//1) store branches in a format that makes it easy to concat commit when receiving the query result
var refs = buildBranchObject(refChunk[j]);
//2) query commits while there are some pages existing. Note that branches that don't have pages are not
//added in subsequent request. When there are no more page, the loop exit
var hasNextPage = true;
var count = 0;
while (hasNextPage) {
var commitQuery = buildCommitQuery(refs);
console.log("request : " + count);
let commitResult = await doRequest(commitQuery, {
"owner": owner,
"name": repo
});
hasNextPage = false;
for (var key in refs) {
if (refs.hasOwnProperty(key) && commitResult.data.repository[key]) {
let history = commitResult.data.repository[key].target.history;
refs[key].commits = refs[key].commits.concat(history.nodes);
refs[key].cursor = (history.pageInfo.hasNextPage) ? history.pageInfo.endCursor : '';
refs[key].hasNextPage = history.pageInfo.hasNextPage;
console.log(key + " : " + refs[key].commits.length + "/" + refs[key].totalCount + " : " + refs[key].hasNextPage + " : " + refs[key].cursor + " : " + refs[key].name);
if (refs[key].hasNextPage){
hasNextPage = true;
}
}
}
count++;
console.log("------------------------------------");
}
for (var key in refs) {
if (refs.hasOwnProperty(key)) {
console.log(refs[key].totalCount + " : " + refs[key].commits.length + " : " + refs[key].name);
}
}
//3) write commits chunk (up to 19 branches) in a single json file
fs.writeFile("commits" + j + ".json", JSON.stringify(refs, null, 4), "utf8", function(err){
if (err){
console.log(err);
return;
}
console.log("done");
});
}
}
requestGraphql().catch(function (err) {
console.log(err);
});
This also works with repos that have a lot of branches, for instance this one, which has more than 700 branches.
Note that while it is true that GraphQL lets you perform fewer requests, it won't necessarily improve your rate limit, as the rate limit is based on points and not on the number of requests: check GraphQL API rate limit
Pure JS implementation without an access token (unauthenticated usage)
const base_url = 'https://api.github.com';
function httpGet(theUrl, return_headers) {
var xmlHttp = new XMLHttpRequest();
xmlHttp.open("GET", theUrl, false); // false for synchronous request
xmlHttp.send(null);
if (return_headers) {
return xmlHttp;
}
return xmlHttp.responseText;
}
function get_all_commits_count(owner, repo, sha) {
let first_commit = get_first_commit(owner, repo);
let compare_url = base_url + '/repos/' + owner + '/' + repo + '/compare/' + first_commit + '...' + sha;
let commit_req = httpGet(compare_url);
let commit_count = JSON.parse(commit_req)['total_commits'] + 1;
console.log('Commit Count: ', commit_count);
return commit_count;
}
function get_first_commit(owner, repo) {
let url = base_url + '/repos/' + owner + '/' + repo + '/commits';
let req = httpGet(url, true);
let first_commit_hash = '';
if (req.getResponseHeader('Link')) {
let page_url = req.getResponseHeader('Link').split(',')[1].split(';')[0].split('<')[1].split('>')[0];
let req_last_commit = httpGet(page_url);
let first_commit = JSON.parse(req_last_commit);
first_commit_hash = first_commit[first_commit.length - 1]['sha'];
} else {
let first_commit = JSON.parse(req.responseText);
first_commit_hash = first_commit[first_commit.length - 1]['sha'];
}
return first_commit_hash;
}
let owner = 'getredash';
let repo = 'redash';
let sha = 'master';
get_all_commits_count(owner, repo, sha);
Credits - https://gist.github.com/yershalom/a7c08f9441d1aadb13777bce4c7cdc3b
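The snippet above picks the last page out of the Link header by position (the second entry). A slightly more defensive variant looks the entries up by their rel value instead. This is only a sketch: parseLinkHeader is my own helper name, and it assumes the URLs in the header contain no commas.

```javascript
// Turn a Link header such as
//   <https://...&page=2>; rel="next", <https://...&page=34>; rel="last"
// into an object like { next: "https://...", last: "https://..." }.
function parseLinkHeader(header) {
  const links = {};
  if (!header) return links;
  for (const part of header.split(',')) {
    const match = part.match(/<([^>]+)>\s*;\s*rel="([^"]+)"/);
    if (match) {
      links[match[2]] = match[1];
    }
  }
  return links;
}
```

With this, get_first_commit could request parseLinkHeader(req.getResponseHeader('Link')).last rather than relying on the order of the entries.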