Reading a github file using python returns HTML tags

后端未结

关注

 5  1575

I am trying to read a text file saved in github using requests package. Here is the python code I am using:

    import requests
    url = \'https://github.co


                      
              相关标签:


      
      
        
          5条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  一个人的身影        
                
              
                            
                2020-12-18 07:59
              
            
            
                                                                       
You could first clone the repo, either via bash, or using a python library like GitPython. Then just open and read the file locally. 
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  佛祖请我去吃肉        
                
              
                            
                2020-12-18 08:03
              
            
            
                                                                       
Thank you @dasdachs for your answer. However I was getting an error when executing the following line:

content = base64.decodestring(req['content'])


The error I got was:

/usr/lib/python3.6/base64.py in _input_type_check(s)
    511     except TypeError as err:
    512         msg = "expected bytes-like object, not %s" % s.__class__.__name__
--> 513         raise TypeError(msg) from err
    514     if m.format not in ('c', 'b', 'B'):
    515         msg = ("expected single byte elements, not %r from %s" %

TypeError: expected bytes-like object, not str


Hence I replaced it with the below snippet:

content = base64.b64decode(json['content'])


Sharing my working snippet below (executing in Python 3):

import requests
import base64
import json


def constructURL(user = "404",repo_name= "404",path_to_file= "404",url= "404"):
  url = url.replace("{user}",user)
  url = url.replace("{repo_name}",repo_name)
  url = url.replace("{path_to_file}",path_to_file)
  return url

user = '<provide value>'
repo_name = '<provide value>'
path_to_file = '<provide value>'
json_url ='https://api.github.com/repos/{user}/{repo_name}/contents/{path_to_file}'

json_url = constructURL(user,repo_name,path_to_file,json_url) #forms the correct URL
response = requests.get(json_url) #get data from json file located at specified URL 

if response.status_code == requests.codes.ok:
    jsonResponse = response.json()  # the response is a JSON
    #the JSON is encoded in base 64, hence decode it
    content = base64.b64decode(jsonResponse['content'])
    #convert the byte stream to string
    jsonString = content.decode('utf-8')
    finalJson = json.loads(jsonString)
else:
    print('Content was not found.')

for key, value in finalJson.items():
    print("The key and value are ({}) = ({})".format(key, value))

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  梦毁少年i        
                
              
                            
                2020-12-18 08:06
              
            
            
                                                                       
Expanding on @Patrick's answer, I'm going to show you my code for how to do that.

import requests
url = 'https://raw.githubusercontent.com/...'
page = requests.get(url)
print page.text

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  慢半拍i        
                
              
                            
                2020-12-18 08:14
              
            
            
                                                                       
You can access a text version by changing the beginning of your link to 

https://raw.githubusercontent.com/

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  刺人心        
                
              
                            
                2020-12-18 08:16
              
            
            
                                                                       
There are some good solutions already, but if you use requests just follow Github's API.

The endpoint for all content is

GET /repos/:owner/:repo/contents/:path


But keep in mind that the default behavior of Github's API is to encode the content using base64.

In your case you would do the following:

#!/usr/bin/env python3
import base64
import requests


url = 'https://api.github.com/repos/{user}/{repo_name}/contents/{path_to_file}'
req = requests.get(url)
if req.status_code == requests.codes.ok:
    req = req.json()  # the response is a JSON
    # req is now a dict with keys: name, encoding, url, size ...
    # and content. But it is encoded with base64.
    content = base64.decodestring(req['content'])
else:
    print('Content was not found.')

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复