What is the proper way to determine if an object is a bytes-like object in Python?

慢半拍i 2021-01-30 09:40

I have code that expects str but will handle the case of being passed bytes in the following way:

if isinstance(data, bytes):
    data = data.decode()


      
      
        
7 answers

别那么骄傲 (OP) 2021-01-30 10:20

This code is not correct unless you know something we don't:

if isinstance(data, bytes):
    data = data.decode()


You do not (appear to) know the encoding of data.  You are assuming it's UTF-8, but that could very well be wrong.  Since you do not know the encoding, you do not have text.  You have bytes, which could have any meaning under the sun.

The good news is that most random sequences of bytes are not valid UTF-8, so when this breaks, it will break loudly (errors='strict' is the default) instead of silently doing the wrong thing.  The even better news is that most of those random sequences that happen to be valid UTF-8 are also valid ASCII, which (nearly) everyone agrees on how to parse anyway.
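A small illustration of the "break loudly" behaviour may help. The byte values below are made up for this example (they do not come from the question): Latin-1 encoded text is typically not valid UTF-8, so a strict decode raises instead of returning garbage.

```python
# Illustrative data only: "café" encoded as Latin-1 gives b'caf\xe9'.
latin1_bytes = "café".encode("latin-1")

try:
    # decode() defaults to encoding='utf-8', errors='strict'
    latin1_bytes.decode()
    decoded_ok = True
except UnicodeDecodeError:
    # The lone 0xE9 byte is not a complete UTF-8 sequence, so we land here.
    decoded_ok = False

# Pure ASCII bytes decode to the same text under both UTF-8 and ASCII.
ascii_bytes = b"hello"
same_text = ascii_bytes.decode("utf-8") == ascii_bytes.decode("ascii")
```

Note that a failed decode only tells you the guess was wrong. A successful decode does not prove the guess was right.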

The bad news is that there is no reasonable way to fix this.  There is a standard way of providing encoding information: use str instead of bytes.  If some third-party code handed you a bytes or bytearray object without any further context or information, the only correct action is to fail.
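One way to act on "the only correct action is to fail" is to refuse to guess. The sketch below is only an illustration: `to_text` and its `encoding` parameter are hypothetical names, not part of the question's code. It also shows the usual way to test for both `bytes` and `bytearray` in one `isinstance` call.

```python
def to_text(data, encoding=None):
    """Return data as str; never guess an encoding for bytes-like input.

    Hypothetical helper for illustration only.
    """
    if isinstance(data, str):
        return data
    # A tuple of types checks for either bytes or bytearray.
    if isinstance(data, (bytes, bytearray)):
        if encoding is None:
            # No encoding information means we do not have text: fail.
            raise TypeError("bytes-like input requires an explicit encoding")
        return data.decode(encoding)
    raise TypeError("Unknown type: " + repr(type(data)))
```

This check deliberately leaves out other buffer types such as `memoryview`, which has no `decode` method.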



Now, assuming you do know the encoding, you can use functools.singledispatch here:

import functools

@functools.singledispatch
def foo(data, *other_arguments):
    # Fallback for any type that has no registered handler
    raise TypeError('Unknown type: ' + repr(type(data)))

@foo.register(str)
def _(data, *other_arguments):
    # data is a str; the real work goes here
    ...

@foo.register(bytes)
@foo.register(bytearray)
def _(data, *other_arguments):
    # 'encoding' is a placeholder: substitute the encoding you actually know
    data = data.decode('encoding')
    # explicit is better than implicit; don't leave the encoding out for UTF-8
    return foo(data, *other_arguments)


This doesn't work on methods, and data has to be the first argument.  If those restrictions don't work for you, use one of the other answers instead.
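If the method restriction is the problem and you are on Python 3.8 or later, `functools.singledispatchmethod` may be worth a look. A minimal sketch, assuming the encoding is known when the object is created; the `Parser` class, its `feed` method, and the `upper()` call are hypothetical stand-ins for real work:

```python
import functools


class Parser:
    """Hypothetical class used only to illustrate dispatch on a method."""

    def __init__(self, encoding):
        # Assumption: the caller knows the encoding and states it explicitly.
        self.encoding = encoding

    @functools.singledispatchmethod
    def feed(self, data):
        # Fallback for any type that has no registered handler
        raise TypeError("Unknown type: " + repr(type(data)))

    @feed.register(str)
    def _(self, data):
        # Placeholder for the real str handling
        return data.upper()

    @feed.register(bytes)
    @feed.register(bytearray)
    def _(self, data):
        # Decode with the known encoding, then reuse the str handler
        return self.feed(data.decode(self.encoding))
```

Usage would look like `Parser("utf-8").feed(b"abc")`, which decodes the bytes and then goes through the `str` handler. Dispatch still happens on the first argument after `self`.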
    
             
                                                        
            
            
              
                