Reading a GitHub file using Python returns HTML tags

情书的邮戳 2020-12-18 07:40

I am trying to read a text file stored on GitHub using the requests package. Here is the Python code I am using:

    import requests
    url = 'https://github.co


        
5 Answers
  • 2020-12-18 07:59

    You could first clone the repo, either from the shell or with a Python library such as GitPython, and then open and read the file locally.
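
    A minimal sketch of the clone-then-read approach, using the `git` command line through `subprocess` rather than GitPython. The helper names `clone_repo` and `read_repo_file` are made up for illustration, and the sketch assumes `git` is installed and on the PATH.

```python
import subprocess
from pathlib import Path


def clone_repo(clone_url, target_dir):
    """Shallow-clone clone_url into target_dir (assumes `git` is on the PATH)."""
    subprocess.run(
        ["git", "clone", "--depth", "1", clone_url, str(target_dir)],
        check=True,  # raise CalledProcessError if the clone fails
    )
    return Path(target_dir)


def read_repo_file(repo_dir, relative_path, encoding="utf-8"):
    """Read a file from a local checkout and return its text."""
    return (Path(repo_dir) / relative_path).read_text(encoding=encoding)
```

    With these, something like `read_repo_file(clone_repo(clone_url, some_empty_dir), 'README.md')` would return the file's text, where `clone_url` and `some_empty_dir` are values you supply.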

  • 2020-12-18 08:03

    Thank you, @dasdachs, for your answer. However, I got an error when executing the following line:

    content = base64.decodestring(req['content'])
    

    The error I got was:

    /usr/lib/python3.6/base64.py in _input_type_check(s)
        511     except TypeError as err:
        512         msg = "expected bytes-like object, not %s" % s.__class__.__name__
    --> 513         raise TypeError(msg) from err
        514     if m.format not in ('c', 'b', 'B'):
        515         msg = ("expected single byte elements, not %r from %s" %
    
    TypeError: expected bytes-like object, not str
    

    So I replaced it with the line below:

    content = base64.b64decode(req['content'])
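
    The difference comes down to input types. A small sketch with a made-up sample value (not real API output): `base64.b64decode` accepts an ASCII `str`, while `base64.decodebytes`, the current name for the deprecated `decodestring` alias (removed in Python 3.9), only accepts bytes-like objects.

```python
import base64

# Stand-in for the API's 'content' field: base64 text held in a str
encoded_text = base64.b64encode(b"hello world").decode("ascii")

# b64decode() accepts ASCII str as well as bytes and returns bytes
decoded = base64.b64decode(encoded_text)

# decodebytes() insists on a bytes-like object and rejects str
try:
    base64.decodebytes(encoded_text)
    error_message = None
except TypeError as err:
    error_message = str(err)

print(decoded)
print(error_message)
```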
    

    Sharing my working snippet below (executing in Python 3):

    import base64
    import json

    import requests


    def constructURL(user="404", repo_name="404", path_to_file="404", url="404"):
        # Substitute each placeholder in the URL template
        url = url.replace("{user}", user)
        url = url.replace("{repo_name}", repo_name)
        url = url.replace("{path_to_file}", path_to_file)
        return url


    user = '<provide value>'
    repo_name = '<provide value>'
    path_to_file = '<provide value>'
    json_url = 'https://api.github.com/repos/{user}/{repo_name}/contents/{path_to_file}'

    json_url = constructURL(user, repo_name, path_to_file, json_url)  # forms the full URL
    response = requests.get(json_url)  # fetch the file's metadata and content

    if response.status_code == requests.codes.ok:
        jsonResponse = response.json()  # the API response is a JSON object
        # The 'content' field is base64-encoded, so decode it to bytes
        content = base64.b64decode(jsonResponse['content'])
        # Convert the bytes to a string, then parse the file itself as JSON
        jsonString = content.decode('utf-8')
        finalJson = json.loads(jsonString)
        # Only iterate when the request succeeded, so finalJson is always defined here
        for key, value in finalJson.items():
            print("The key and value are ({}) = ({})".format(key, value))
    else:
        print('Content was not found.')
    
  • 2020-12-18 08:06

    Expanding on @Patrick's answer, here is the code to do that:

    import requests
    url = 'https://raw.githubusercontent.com/...'
    page = requests.get(url)
    print(page.text)  # print() is a function in Python 3
    
  • 2020-12-18 08:14

    You can access a text version by changing the beginning of your link to

    https://raw.githubusercontent.com/
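
    Besides the host name, the `/blob/` segment of the page URL is dropped in the raw form. A rough sketch of the conversion, assuming the usual `github.com/<user>/<repo>/blob/<branch>/<path>` layout; `to_raw_url` is a hypothetical helper and the URL below is only an illustrative input.

```python
from urllib.parse import urlsplit


def to_raw_url(page_url):
    """Convert a github.com file-page URL to its raw.githubusercontent.com form."""
    parts = urlsplit(page_url)
    if parts.netloc != "github.com":
        raise ValueError("expected a github.com URL")
    segments = parts.path.strip("/").split("/")
    # Expected layout: <user>/<repo>/blob/<branch>/<path...>
    if len(segments) < 5 or segments[2] != "blob":
        raise ValueError("expected /<user>/<repo>/blob/<branch>/<path>")
    user, repo = segments[0], segments[1]
    rest = segments[3:]  # branch followed by the file path
    return "https://raw.githubusercontent.com/" + "/".join([user, repo] + rest)


print(to_raw_url("https://github.com/octocat/Hello-World/blob/master/README"))
```

    The result can then be passed to `requests.get(...)` and read through `.text`.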
    
  • 2020-12-18 08:16

    There are some good solutions already, but if you are using requests, you can simply use GitHub's API.

    The endpoint for all content is

    GET /repos/:owner/:repo/contents/:path
    

    Keep in mind that, by default, GitHub's API returns the content base64-encoded.
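
    To try the decoding step without calling the API, here is an offline sketch. The `sample` dict is hand-built and only imitates the `encoding` and `content` fields of a response, and `decode_contents_payload` is a hypothetical helper. The base64 text may be wrapped across several lines, which `base64.b64decode` tolerates.

```python
import base64


def decode_contents_payload(payload):
    """Return the decoded text of a contents-style payload (hypothetical helper)."""
    if payload.get("encoding") != "base64":
        raise ValueError("unexpected encoding: {!r}".format(payload.get("encoding")))
    # b64decode() skips the newlines used to wrap long base64 text
    return base64.b64decode(payload["content"]).decode("utf-8")


# Hand-built sample, not real API output; encodebytes() wraps its
# output with newlines, similar to line-wrapped base64 content
sample = {
    "encoding": "base64",
    "content": base64.encodebytes(b"first line\nsecond line\n").decode("ascii"),
}
text = decode_contents_payload(sample)
print(text)
```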

    In your case you would do the following:

    #!/usr/bin/env python3
    import base64
    import requests


    # Replace {user}, {repo_name} and {path_to_file} with real values
    url = 'https://api.github.com/repos/{user}/{repo_name}/contents/{path_to_file}'
    req = requests.get(url)
    if req.status_code == requests.codes.ok:
        req = req.json()  # the response is a JSON
        # req is now a dict with keys: name, encoding, url, size ...
        # and content. But it is encoded with base64.
        # b64decode() accepts the str value; decodestring() raises TypeError on str in Python 3
        content = base64.b64decode(req['content'])
    else:
        print('Content was not found.')
    