How to get data from all pages in GitHub API with Python?

借酒劲吻你 2021-02-05 17:55

I'm trying to export a repo list, but it always returns information about the first page only. I could extend the number of items per page using URL+"?per_page=100" but it's not

6 answers
  • 2021-02-05 18:34

    From the GitHub docs:

    Response:

    Status: 200 OK
    Link: <https://api.github.com/resource?page=2>; rel="next",
          <https://api.github.com/resource?page=5>; rel="last"
    X-RateLimit-Limit: 5000
    X-RateLimit-Remaining: 4999
    

    You get the links to the next and the last page of that organization. Just check the headers.

    With Python Requests, you can access the response headers with:

    response.headers
    

    It is a dictionary containing the response headers. If link is present, then there are more pages and it will contain related information. It is recommended to traverse using those links instead of building your own.

    You can try something like this:

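    import requests

    # The original URL had an unformatted 'page{0}' placeholder; request page 1 explicitly
    url = 'https://api.github.com/orgs/xxxxxxx/repos?page=1&per_page=100'
    response = requests.get(url)
    # Header lookup is case-insensitive in requests
    link = response.headers.get('link', None)
    if link is not None:
        print(link)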
    

    If link is not None it will be a string containing the relevant links for your resource.
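To make the contents of that string concrete, here is a minimal sketch that splits a Link header into a dictionary of rel name to URL. The name `parse_link_header` is hypothetical and the sample header is the one quoted from the docs above. Requests already exposes a parsed form as `response.links`, so this is only meant to show what the string holds.

```python
import re

# Matches one entry of a Link header: <URL>; rel="name"
_LINK_ENTRY = re.compile(r'<([^>]*)>\s*;\s*rel="([^"]*)"')


def parse_link_header(value):
    """Map each rel name ("next", "last", ...) to its URL.

    Hypothetical helper for illustration; returns {} when the header is absent.
    """
    if not value:
        return {}
    return {rel: url for url, rel in _LINK_ENTRY.findall(value)}


# Sample header taken from the GitHub docs excerpt above
header = ('<https://api.github.com/resource?page=2>; rel="next", '
          '<https://api.github.com/resource?page=5>; rel="last"')
links = parse_link_header(header)
print(links.get("next"))
```

If `links` has no "next" key, the current response is the last page.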

  • 2021-02-05 18:37

    Extending the answers above, here is a recursive function that handles GitHub pagination. It iterates through all pages, appending each page's items to the list on every recursive call, and returns the complete list when there are no more pages to retrieve. An optional failsafe returns early once the list holds more than 500 items.

    import requests
    
    api_get_users = 'https://api.github.com/users'
    
    
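    def call_api(apicall, **kwargs):
        # Items accumulated by earlier calls are passed along as 'page'
        data = kwargs.get('page', [])

        resp = requests.get(apicall)
        data += resp.json()

        # Failsafe: stop once more than 500 items have been collected
        if len(data) > 500:
            return data

        # requests parses the Link header into resp.links
        if 'next' in resp.links:
            return call_api(resp.links['next']['url'], page=data)

        return data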
    
    
    data = call_api(api_get_users)
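
The same traversal can also be written as a loop, which avoids deep recursion when there are many pages. This is only a sketch: `fetch_all`, `get`, and `max_items` are hypothetical names, and `get` is assumed to be a callable such as `requests.get` whose return value offers `.json()` and `.links`.

```python
def fetch_all(url, get, max_items=500):
    """Follow rel="next" links iteratively and return the combined list.

    `get` is any callable that takes a URL and returns a response object
    with .json() and .links (for example requests.get).
    """
    items = []
    while url:
        resp = get(url)
        items.extend(resp.json())
        # Same failsafe as the recursive version: stop once past max_items
        if len(items) > max_items:
            break
        # .links is empty on the last page, so url becomes None and the loop ends
        url = resp.links.get("next", {}).get("url")
    return items
```

With Requests installed, a call might look like `data = fetch_all('https://api.github.com/users', requests.get)`.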
    
  • 2021-02-05 18:42

    From my understanding, link will be None if only a single page of data is returned, otherwise link will be present even when going beyond the last page. In this case link will contain previous and first links.

    Here is some sample Python that simply returns the link for the next page, or None if there is no next page, so it can be used as the condition of a loop.

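    # get_next_link is an illustrative name; r is a requests response
    def get_next_link(r):
        # .get() returns None instead of raising KeyError when the header is absent
        link = r.headers.get('link')
        if link is None:
            return None

        # Should be a comma separated string of links
        links = link.split(',')

        for link in links:
            # If there is a 'next' link return the URL between the angle brackets, or None
            if 'rel="next"' in link:
                return link[link.find("<")+1:link.find(">")]
        return None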
    
  • 2021-02-05 18:45
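    import requests

    # Placeholder: a personal access token, sent as "token <value>"
    git_token = "token YOUR_PERSONAL_ACCESS_TOKEN"
    headers = {"Authorization": git_token}

    url = "https://api.github.com/XXXX?simple=yes&per_page=100&page=1"
    res = requests.get(url, headers=headers)
    repos = res.json()
    # requests parses the Link header into res.links; loop while a "next" link exists
    while 'next' in res.links:
        res = requests.get(res.links['next']['url'], headers=headers)
        repos.extend(res.json())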
    

    If you aren't making a full-blown app, use a "Personal Access Token":

    https://github.com/settings/tokens

  • 2021-02-05 18:48

    First, print the Link header of the response:

    print(a.headers.get('link'))
    

    This shows the pagination links, from which you can read the number of pages. The output looks similar to this:

    <https://api.github.com/organizations/xxxx/repos?page=2&type=all>; rel="next", 
    
    <https://api.github.com/organizations/xxxx/repos?page=8&type=all>; rel="last"
    

    From this you can see that we are currently on the first page: rel="next" says the next page is 2, and rel="last" says the last page is 8.

    Once you know the number of pages, pass the page number with page= in each request and loop until the last page number, not until len(repo), which is 100 for every full page. For example:

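    import requests

    last_page = 8  # taken from the rel="last" link above
    for page in range(1, last_page + 1):
        r = requests.get('https://api.github.com/orgs/xxxx/repos?page={0}&type=all'.format(page),
                         auth=('My_user', 'My_passwd'))
        # r.json() is a list of repository dicts, so iterate over the items directly
        for repo in r.json():
            print(repo['full_name'])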
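
Rather than hardcoding the last page number (8 in the loop above), it can be read from the rel="last" entry of the Link header. The sketch below assumes the header has the shape shown earlier in this answer; `last_page_number` is a hypothetical helper name. It parses the query string instead of searching for "page=", so a `per_page=` parameter in the URL does not confuse it.

```python
from urllib.parse import urlparse, parse_qs


def last_page_number(link_header):
    """Return the page number of the rel="last" link, or None if there is none."""
    for part in (link_header or "").split(","):
        if 'rel="last"' in part:
            # The URL sits between the angle brackets
            url = part[part.find("<") + 1:part.find(">")]
            return int(parse_qs(urlparse(url).query)["page"][0])
    return None


# Sample header in the shape shown above
header = ('<https://api.github.com/organizations/xxxx/repos?page=2&type=all>; rel="next", '
          '<https://api.github.com/organizations/xxxx/repos?page=8&type=all>; rel="last"')
print(last_page_number(header))  # prints 8
```

A result of None means the header is absent or has no rel="last" entry, which is the case when everything fits on one page, so the caller can treat it as a single page.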
    
  • 2021-02-05 18:52
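    from urllib.parse import urlparse, parse_qs

    # next_page_number is an illustrative name; res is a requests response
    def next_page_number(res):
        link = res.headers.get('link', None)
        if link is None:
            return None
        link_next = [l for l in link.split(',') if 'rel="next"' in l]
        if len(link_next) > 0:
            url = link_next[0][link_next[0].find("<")+1:link_next[0].find(">")]
            # Parse the query string so "per_page=" or trailing parameters do not break int()
            return int(parse_qs(urlparse(url).query)['page'][0])
        return None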
    