GitHub API: Retrieve all commits for all branches for a repo

时光取名叫无心 2020-11-30 20:12

According to the V2 documentation, you can list all commits for a branch with:

commits/list/:user_id/:repository/:branch

I am not seeing th

4 Answers
  • 2020-11-30 20:57

    I ran into exactly the same problem. I did manage to retrieve all the commits for all branches of a repository, although the approach is probably not very efficient because of how the API works.

    Approach to retrieve all commits for all branches in a repository

    As you mentioned, first you gather all the branches:

    # https://api.github.com/repos/:user/:repo/branches
    https://api.github.com/repos/twitter/bootstrap/branches
    

    The key point you are missing is that API v3 lists commits starting from a reference commit, given by the sha parameter of the call that lists commits on a repository. So when you collect the branches, make sure you also pick up each branch's latest sha:

    Trimmed result of branch API call for twitter/bootstrap

    [
      {
        "commit": {
          "url": "https://api.github.com/repos/twitter/bootstrap/commits/8b19016c3bec59acb74d95a50efce70af2117382",
          "sha": "8b19016c3bec59acb74d95a50efce70af2117382"
        },
        "name": "gh-pages"
      },
      {
        "commit": {
          "url": "https://api.github.com/repos/twitter/bootstrap/commits/d335adf644b213a5ebc9cee3f37f781ad55194ef",
          "sha": "d335adf644b213a5ebc9cee3f37f781ad55194ef"
        },
        "name": "master"
      }
    ]
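
    A minimal sketch of picking up each branch's latest sha from that response. The helper name branchHeads is hypothetical, and the input is assumed to be the parsed JSON array shown above.

```javascript
// Hypothetical helper: reduce the parsed /branches response to { name: headSha }.
function branchHeads(branches) {
  const heads = {};
  for (const branch of branches) {
    heads[branch.name] = branch.commit.sha;
  }
  return heads;
}

// Sample input copied from the trimmed twitter/bootstrap response above.
const sampleBranches = [
  { name: "gh-pages", commit: { sha: "8b19016c3bec59acb74d95a50efce70af2117382" } },
  { name: "master", commit: { sha: "d335adf644b213a5ebc9cee3f37f781ad55194ef" } }
];

console.log(branchHeads(sampleBranches));
```

    Each value in the resulting object is the sha to start walking that branch from.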
    

    Working with last commit's sha

    As we can see, the two branches have different sha values; these are the latest commit sha on each branch. You can now walk each branch starting from its latest sha:

    # With the sha parameter set to the branch's latest sha
    # https://api.github.com/repos/:user/:repo/commits
    https://api.github.com/repos/twitter/bootstrap/commits?per_page=100&sha=d335adf644b213a5ebc9cee3f37f781ad55194ef
    

    The above API call lists the last 100 commits of the master branch of twitter/bootstrap. To get the next 100 commits, you have to specify the sha to continue from. We can use the sha of the last commit in the response (7a8d6b19767a92b1c4ea45d88d4eedc2b29bf1fa in the current example) as input for the next API call:

    # Next API call for commits (use the last commit's sha)
    # https://api.github.com/repos/:user/:repo/commits
    https://api.github.com/repos/twitter/bootstrap/commits?per_page=100&sha=7a8d6b19767a92b1c4ea45d88d4eedc2b29bf1fa
    

    Repeat this until the sha of the last commit in the response is the same as the sha parameter of the call, which means there are no older commits left.
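
    The loop can be sketched as follows. This is only an illustration under stated assumptions: addPage, walkBranch and fakeFetchPage are hypothetical names, the HTTP call is replaced by a synchronous stand-in that serves a made-up linear history two commits per page, and a real implementation would request per_page=100 asynchronously.

```javascript
// Hypothetical helper: fold one page of results into `commits` and return the
// sha to request next, or null once the walk is finished.
function addPage(commits, page, requestedSha) {
  if (page.length === 0) return null;
  // A follow-up page starts with the commit that ended the previous page; skip it.
  const tail = commits.length > 0 ? commits[commits.length - 1].sha : null;
  for (const commit of page) {
    if (commit.sha !== tail) commits.push(commit);
  }
  const lastSha = page[page.length - 1].sha;
  // Stop when the last commit returned is the one we asked for.
  return lastSha === requestedSha ? null : lastSha;
}

// Stand-in for the HTTP call: a linear five-commit history served two per page.
const fakeHistory = ["c5", "c4", "c3", "c2", "c1"].map((sha) => ({ sha }));
function fakeFetchPage(sha) {
  const start = fakeHistory.findIndex((commit) => commit.sha === sha);
  return fakeHistory.slice(start, start + 2);
}

// Walk one branch from its head sha until no older commits are returned.
function walkBranch(headSha, fetchPage) {
  const commits = [];
  let sha = headSha;
  while (sha !== null) {
    sha = addPage(commits, fetchPage(sha), sha);
  }
  return commits;
}

console.log(walkBranch("c5", fakeFetchPage).map((commit) => commit.sha));
```

    Because every follow-up request starts at the commit that ended the previous page, that commit is returned twice; the sketch drops the repeat before appending.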

    Next branch

    That is it for one branch. Now apply the same approach to each of the other branches, working from its latest sha.


    There is one large issue with this approach: since branches share many identical commits, you will see the same commits over and over again as you move from one branch to the next.
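
    One way to cope with the repeats is to merge the per-branch results by sha afterwards. A minimal sketch, assuming the commits were collected into an object keyed by branch name; mergeByCommit and the short sha values are made up for illustration.

```javascript
// Hypothetical helper: merge per-branch commit lists, keeping one entry per sha
// and recording every branch the commit was seen on.
function mergeByCommit(commitsByBranch) {
  const bySha = new Map();
  for (const [branch, commits] of Object.entries(commitsByBranch)) {
    for (const commit of commits) {
      if (!bySha.has(commit.sha)) {
        bySha.set(commit.sha, { sha: commit.sha, branches: [] });
      }
      bySha.get(commit.sha).branches.push(branch);
    }
  }
  return [...bySha.values()];
}

// Two branches that share the commit "c1".
const mergedCommits = mergeByCommit({
  master: [{ sha: "c3" }, { sha: "c2" }, { sha: "c1" }],
  "gh-pages": [{ sha: "p1" }, { sha: "c1" }]
});

console.log(mergedCommits.length); // 4 unique commits out of 5 listed
```

    This removes the duplicates from the result but not from the traffic: every branch is still downloaded in full.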

    I can imagine that there is a much more efficient way to accomplish this, but this worked for me.

  • 2020-11-30 20:57

    I asked GitHub support this same question, and they answered:

    GETing /repos/:owner/:repo/commits should do the trick. You can pass the branch name in the sha parameter. For example, to get the first page of commits from the '3.0.0-wip' branch of the twitter/bootstrap repository, you would use the following curl request:

    curl https://api.github.com/repos/twitter/bootstrap/commits?sha=3.0.0-wip
    

    The docs also describe how to use pagination to get the remaining commits for this branch.
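
    GitHub exposes the pagination URLs in the Link response header. A minimal sketch of reading it, assuming the usual `<url>; rel="name"` layout; parseLinkHeader is a hypothetical helper and the page numbers in the sample header are made up.

```javascript
// Hypothetical helper: turn a Link response header into { rel: url }.
// Assumes the URLs themselves contain no commas.
function parseLinkHeader(header) {
  const links = {};
  if (!header) return links;
  for (const part of header.split(",")) {
    const match = part.match(/<([^>]+)>\s*;\s*rel="([^"]+)"/);
    if (match) links[match[2]] = match[1];
  }
  return links;
}

// Illustrative header value; the page numbers are invented.
const sampleLink =
  '<https://api.github.com/repos/twitter/bootstrap/commits?sha=3.0.0-wip&page=2>; rel="next", ' +
  '<https://api.github.com/repos/twitter/bootstrap/commits?sha=3.0.0-wip&page=34>; rel="last"';

console.log(parseLinkHeader(sampleLink).next);
```

    Keep requesting the next URL until the header no longer contains a next entry.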

    As long as you are making authenticated requests, you can make up to 5,000 requests per hour.

    I used the github gem (https://github.com/peter-murach/github) in my Rails app as follows:

    require 'github_api'

    # owner, repo_name, start_date and end_date are assumed to be defined elsewhere.
    github_connection = Github.new :client_id => 'your_id', :client_secret => 'your_secret', :oauth_token => 'your_oauth_token'
    branches_info = {}
    commits_list = []
    all_branches = github_connection.repos.list_branches owner, repo_name
    all_branches.body.each do |branch|
        branches_info[branch.name] = branch.commit.url
    end
    branches_info.keys.each do |branch_name|
        # Pass the branch name as the sha parameter; since/until are the REST API's date filters.
        commits_list.push(github_connection.repos.commits.list(owner, repo_name, :since => start_date, :until => end_date, :sha => branch_name))
    end
    
  • 2020-11-30 21:01

    Using GraphQL API v4

    You can use the GraphQL API v4 to optimize downloading commits per branch. With the following method, I managed to download 1900 commits in a single request (100 commits per branch across 19 branches), which drastically reduces the number of requests compared to the REST API.

    1 - Get all branches

    You have to get all branches first, paginating if there are more than 100 of them:

    Query:

    query($owner:String!, $name:String!, $branchCursor: String!) {
      repository(owner: $owner, name: $name) {
        refs(first: 100, refPrefix: "refs/heads/",after: $branchCursor) {
          totalCount
          edges {
            node {
              name
              target {
                ...on Commit {
                  history(first:0){
                    totalCount
                  }
                }
              }
            }
          }
          pageInfo {
            endCursor
            hasNextPage
          }
        }
      }
    }
    

    Variables:

    {
      "owner": "google",
      "name": "gson",
      "branchCursor": ""
    }
    


    Note that the branchCursor variable is only needed when you have more than 100 branches; in that case it takes the value of pageInfo.endCursor from the previous response.

    2 - Split the branches array into arrays of at most 19 branches

    There is a limit on how much a single query can request, which prevents us from asking for too many nodes at once. Some testing I performed showed that we can't go over 19*100 commits in a single query.

    Note that for a repo with 19 branches or fewer, you don't need to worry about this.
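
    The split itself can be sketched as a small helper that leaves the original array untouched. The chunk function and the placeholder branch names are hypothetical; 19 is the empirical limit mentioned above.

```javascript
// Hypothetical helper: split an array into consecutive chunks without mutating it.
function chunk(items, size) {
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// 45 placeholder branch names split into chunks of at most 19.
const placeholderBranches = Array.from({ length: 45 }, (_, i) => "branch" + i);
console.log(chunk(placeholderBranches, 19).map((c) => c.length)); // [ 19, 19, 7 ]
```

    Each chunk then becomes one dynamically built query.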

    3 - Query commits in chunks of 100 for each branch

    You can then build your query dynamically to get the next 100 commits on all branches. An example with 2 branches:

    query ($owner: String!, $name: String!) {
      repository(owner: $owner, name: $name) {
        branch0: ref(qualifiedName: "JsonArrayImplementsList") {
          target {
            ... on Commit {
              history(first: 100) {
                ...CommitFragment
              }
            }
          }
        }
        branch1: ref(qualifiedName: "master") {
          target {
            ... on Commit {
              history(first: 100) {
                ...CommitFragment
              }
            }
          }
        }
      }
    }
    
    fragment CommitFragment on CommitHistoryConnection {
      totalCount
      nodes {
        oid
        message
        committedDate
        author {
          name
          email
        }
      }
      pageInfo {
        hasNextPage
        endCursor
      }
    }
    


    • The variables used are owner for the repo's owner and name for the name of the repo.
    • A fragment is used to avoid duplicating the commit history field definition.

    pageInfo.hasNextPage and pageInfo.endCursor are used to paginate each branch. The pagination takes place in history(first: 100) by specifying the last cursor encountered. For instance, the next request will have history(first: 100, after: "6e2fcdcaf252c54a151ce6a4441280e4c54153ae 99"). For each branch, we have to update the request with the last endCursor value to query the next 100 commits.

    When pageInfo.hasNextPage is false, there are no more pages for this branch, so we don't include it in the next request.

    When pageInfo.hasNextPage is false for every branch, we have retrieved all commits.

    Sample implementation

    Here is a sample implementation in Node.js using github-graphql-client. The same method could be implemented in any other language. The following also stores the commits in files named commitsX.json:

    var client = require('github-graphql-client');
    var fs = require("fs");
    
    const owner = "google";
    const repo = "gson";
    const accessToken = "YOUR_ACCESS_TOKEN";
    
    const branchQuery = `
    query($owner:String!, $name:String!, $branchCursor: String!) {
      repository(owner: $owner, name: $name) {
        refs(first: 100, refPrefix: "refs/heads/",after: $branchCursor) {
          totalCount
          edges {
            node {
              name
              target {
                ...on Commit {
                  history(first:0){
                    totalCount
                  }
                }
              }
            }
          }
          pageInfo {
            endCursor
            hasNextPage
          }
        }
      }
    }`;
    
    function buildCommitQuery(branches){
        var query = `
            query ($owner: String!, $name: String!) {
              repository(owner: $owner, name: $name) {`;
        for (var key in branches) {
            if (branches.hasOwnProperty(key) && branches[key].hasNextPage) {
              query+=`
                ${key}: ref(qualifiedName: "${branches[key].name}") {
                  target {
                    ... on Commit {
                      history(first: 100, after: ${branches[key].cursor ? '"' + branches[key].cursor + '"': null}) {
                        ...CommitFragment
                      }
                    }
                  }
                }`;
            }
        }
        query+=`
              }
            }`;
        query+= commitFragment;
        return query;
    }
    
    const commitFragment = `
    fragment CommitFragment on CommitHistoryConnection {
      totalCount
      nodes {
        oid
        message
        committedDate
        author {
          name
          email
        }
      }
      pageInfo {
        hasNextPage
        endCursor
      }
    }`;
    
    function doRequest(query, variables) {
      return new Promise(function (resolve, reject) {
        client({
            token: accessToken,
            query: query,
            variables: variables
        }, function (err, res) {
          if (!err) {
            resolve(res);
          } else {
            console.log(JSON.stringify(err, null, 2));
            reject(err);
          }
        });
      });
    }
    
    function buildBranchObject(branch){
        var refs = {};
    
        for (var i = 0; i < branch.length; i++) {
            console.log("branch " + branch[i].node.name);
            refs["branch" + i] = {
                name: branch[i].node.name,
                totalCount: branch[i].node.target.history.totalCount,
                cursor: null,
                hasNextPage : true,
                commits: []
            };
        }
        return refs;
    }
    
    async function requestGraphql() {
        var iterateBranch = true;
        var branches = [];
        var cursor = "";
    
        // get all branches
        while (iterateBranch) {
            let res = await doRequest(branchQuery,{
              "owner": owner,
              "name": repo,
              "branchCursor": cursor
            });
            iterateBranch = res.data.repository.refs.pageInfo.hasNextPage;
            cursor = res.data.repository.refs.pageInfo.endCursor;
            branches = branches.concat(res.data.repository.refs.edges);
        }
    
        //split the branch array into smaller array of 19 items
        var refChunk = [], size = 19;
    
        while (branches.length > 0){
            refChunk.push(branches.splice(0, size));
        }
    
        for (var j = 0; j < refChunk.length; j++) {
    
            //1) store branches in a format that makes it easy to concat commit when receiving the query result
            var refs = buildBranchObject(refChunk[j]);
    
            //2) query commits while there are some pages existing. Note that branches that don't have pages are not 
            //added in subsequent request. When there are no more page, the loop exit
            var hasNextPage = true;
            var count = 0;
    
            while (hasNextPage) {
                var commitQuery = buildCommitQuery(refs);
                console.log("request : " + count);
                let commitResult = await doRequest(commitQuery, {
                  "owner": owner,
                  "name": repo
                });
                hasNextPage = false;
                for (var key in refs) {
                    if (refs.hasOwnProperty(key) && commitResult.data.repository[key]) {
                        let history = commitResult.data.repository[key].target.history;
                        refs[key].commits = refs[key].commits.concat(history.nodes);
                        refs[key].cursor = (history.pageInfo.hasNextPage) ? history.pageInfo.endCursor : '';
                        refs[key].hasNextPage = history.pageInfo.hasNextPage;
                        console.log(key + " : " + refs[key].commits.length + "/" + refs[key].totalCount + " : " + refs[key].hasNextPage + " : " + refs[key].cursor + " : " + refs[key].name);
                        if (refs[key].hasNextPage){
                            hasNextPage = true;
                        }
                    }
                }
                count++;
                console.log("------------------------------------");
            }
            for (var key in refs) {
                if (refs.hasOwnProperty(key)) {
                    console.log(refs[key].totalCount + " : " + refs[key].commits.length + " : " + refs[key].name);
                }
            }
    
            //3) write commits chunk (up to 19 branches) in a single json file
            fs.writeFile("commits" + j + ".json", JSON.stringify(refs, null, 4), "utf8", function(err){
                if (err){
                    console.log(err);
                }
                console.log("done");
            });
        }
    }
    
    requestGraphql();
    

    This also works for repos with a lot of branches, for instance one with more than 700 branches.

    Rate Limit

    Note that while GraphQL lets you perform fewer requests, it won't necessarily improve your rate limit, because the GraphQL rate limit is based on points rather than on a number of requests; see the GraphQL API rate limit documentation.
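
    To see what a query actually costs, the rateLimit field can be added to the query's top-level selection. The sketch below is an assumption-laden illustration: canContinue is a hypothetical helper, and the sample payload values are made up.

```javascript
// Fields to append inside the top-level `query { ... }` selection,
// next to `repository(...)`.
const rateLimitFields = `
  rateLimit {
    cost
    remaining
    resetAt
  }`;

// Hypothetical helper: check whether another request of similar cost still fits
// into the remaining budget reported by the previous response.
function canContinue(rateLimit) {
  return rateLimit.remaining >= rateLimit.cost;
}

// Example payload shape; the numbers are invented for illustration.
const sampleRateLimit = { cost: 20, remaining: 4980, resetAt: "2020-11-30T22:00:00Z" };
console.log(canContinue(sampleRateLimit)); // true
```

    When the check fails, waiting until resetAt before sending the next request avoids hitting the limit.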

  • 2020-11-30 21:04

    Pure JS implementation without an access token (unauthenticated usage)

    const base_url = 'https://api.github.com';

    // Synchronous GET; returns the XMLHttpRequest itself when the headers are needed.
    function httpGet(theUrl, return_headers) {
        var xmlHttp = new XMLHttpRequest();
        xmlHttp.open("GET", theUrl, false); // false for synchronous request
        xmlHttp.send(null);
        if (return_headers) {
            return xmlHttp;
        }
        return xmlHttp.responseText;
    }

    // Count commits by comparing the repository's first commit with the given sha.
    function get_all_commits_count(owner, repo, sha) {
        let first_commit = get_first_commit(owner, repo);
        let compare_url = base_url + '/repos/' + owner + '/' + repo + '/compare/' + first_commit + '...' + sha;
        let commit_req = httpGet(compare_url);
        let commit_count = JSON.parse(commit_req)['total_commits'] + 1;
        console.log('Commit Count: ', commit_count);
        return commit_count;
    }

    // Find the oldest commit: the last entry on the last page of the commit list.
    function get_first_commit(owner, repo) {
        let url = base_url + '/repos/' + owner + '/' + repo + '/commits';
        let req = httpGet(url, true);
        let first_commit_hash = '';
        if (req.getResponseHeader('Link')) {
            // The second entry of the Link header is taken as the last page's URL.
            let page_url = req.getResponseHeader('Link').split(',')[1].split(';')[0].split('<')[1].split('>')[0];
            let req_last_commit = httpGet(page_url);
            let first_commit = JSON.parse(req_last_commit);
            first_commit_hash = first_commit[first_commit.length - 1]['sha'];
        } else {
            let first_commit = JSON.parse(req.responseText);
            first_commit_hash = first_commit[first_commit.length - 1]['sha'];
        }
        return first_commit_hash;
    }

    let owner = 'getredash';
    let repo = 'redash';
    let sha = 'master';
    get_all_commits_count(owner, repo, sha);
    

    Credits - https://gist.github.com/yershalom/a7c08f9441d1aadb13777bce4c7cdc3b
