What is the proper way to determine if an object is a bytes-like object in Python?


I have code that expects str but will handle the case of being passed bytes in the following way:

if isinstance(data, bytes):
    data = data.decode()


                      
7 Answers

别那么骄傲 (2021-01-30 10:20)

This code is not correct unless you know something we don't:

if isinstance(data, bytes):
    data = data.decode()


You do not (appear to) know the encoding of data.  You are assuming it's UTF-8, but that could very well be wrong.  Since you do not know the encoding, you do not have text.  You have bytes, which could have any meaning under the sun.

The good news is that most random sequences of bytes are not valid UTF-8, so when this breaks, it will break loudly (errors='strict' is the default) instead of silently doing the wrong thing.  The even better news is that most of those random sequences that happen to be valid UTF-8 are also valid ASCII, which (nearly) everyone agrees on how to parse anyway.

The bad news is that there is no reasonable way to fix this.  There is a standard way of providing encoding information: use str instead of bytes.  If some third-party code handed you a bytes or bytearray object without any further context or information, the only correct action is to fail.
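
To make the "breaks loudly" point concrete, here is a minimal sketch. The helper name `try_decode` and the sample strings are illustrative assumptions, not part of the question's code.

```python
def try_decode(raw, encoding="utf-8"):
    """Return decoded text, or None if raw is not valid in the given encoding."""
    try:
        # errors='strict' is the default, so invalid input raises
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# Hypothetical input: text that was actually encoded as Latin-1, not UTF-8
latin1_bytes = "café".encode("latin-1")  # b'caf\xe9'

print(try_decode(latin1_bytes))             # None: not valid UTF-8
print(try_decode(latin1_bytes, "latin-1"))  # café
print(try_decode(b"plain ascii"))           # plain ascii
```

The same bytes decode cleanly once the right encoding is supplied, which is exactly the information a bare bytes object cannot carry on its own.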



Now, assuming you do know the encoding, you can use functools.singledispatch here:

import functools

@functools.singledispatch
def foo(data, *other_arguments):
    # Fallback for any type without a registered handler
    raise TypeError('Unknown type: ' + repr(type(data)))

@foo.register(str)
def _(data, *other_arguments):
    # data is a str; the real work goes here
    ...

@foo.register(bytes)
@foo.register(bytearray)
def _(data, *other_arguments):
    # 'encoding' is a placeholder for the codec you actually know
    # explicit is better than implicit; don't leave the encoding out for UTF-8
    data = data.decode('encoding')
    return foo(data, *other_arguments)


This doesn't work on methods, and data has to be the first argument.  If those restrictions don't work for you, use one of the other answers instead.
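
For methods, Python 3.8 added functools.singledispatchmethod, which dispatches on the first argument after self. The following is a minimal sketch assuming Python 3.8 or later; the class `Parser`, its `encoding` attribute, and the `strip()` body are hypothetical stand-ins for real work.

```python
import functools


class Parser:
    def __init__(self, encoding):
        # The caller must state the encoding; nothing is guessed
        self.encoding = encoding

    @functools.singledispatchmethod
    def parse(self, data):
        # Fallback for any type without a registered handler
        raise TypeError("Unknown type: " + repr(type(data)))

    @parse.register(str)
    def _(self, data):
        # data is a str; placeholder for the real work
        return data.strip()

    @parse.register(bytes)
    @parse.register(bytearray)
    def _(self, data):
        # Decode with the known encoding, then re-dispatch as str
        return self.parse(data.decode(self.encoding))


parser = Parser("utf-8")
print(parser.parse(b"  hello  "))           # hello
print(parser.parse(bytearray(b"hello  ")))  # hello
```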
                                                                        
                                                        
            
            
              
                
礼貌的吻别 (2021-01-30 10:26)

You can use:

isinstance(data, (bytes, bytearray))


Both types have to be listed because they have different base classes.

>>> bytes.__base__
<type 'basestring'>
>>> bytearray.__base__
<type 'object'>


Checking bytes against basestring works:

>>> by = bytes()
>>> isinstance(by, basestring)
True


However, the same check fails for bytearray:

>>> buf = bytearray()
>>> isinstance(buf, basestring)
False


The code above was tested under Python 2.7.

Unfortunately, under Python 3.4 both types have the same base class:

>>> bytes.__base__
<class 'object'>
>>> bytearray.__base__
<class 'object'>
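
On Python 3, then, the base class offers no shortcut, and the tuple form of isinstance is the direct check. A minimal sketch follows; the function name `to_text` and the default encoding are assumptions made for illustration.

```python
def to_text(data, encoding="utf-8"):
    """Return data as str, decoding bytes and bytearray with the given encoding."""
    if isinstance(data, (bytes, bytearray)):
        return data.decode(encoding)
    if isinstance(data, str):
        return data
    # Anything else is rejected rather than guessed at
    raise TypeError("expected str, bytes or bytearray, got " + type(data).__name__)


print(to_text(b"abc"))             # abc
print(to_text(bytearray(b"abc")))  # abc
print(to_text("abc"))              # abc
```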

                                                                        
                                                        
            
            
              
                
旧巷少年郎 (2021-01-30 10:27)

There are a few approaches you could use here.

Duck typing

Since Python is duck typed, you could simply do as follows (which seems to be the way usually suggested):

try:
    data = data.decode()
except (UnicodeDecodeError, AttributeError):
    pass


You could use hasattr as you describe, however, and it'd probably be fine. This is, of course, assuming the .decode() method for the given object returns a string, and has no nasty side effects.

I personally recommend either the exception or hasattr method, but whatever you use is up to you.
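
The hasattr variant might look like the sketch below (the name `decode_if_possible` is hypothetical). Unlike the try/except version, it lets a UnicodeDecodeError propagate instead of silently leaving the bytes undecoded.

```python
def decode_if_possible(data, encoding="utf-8"):
    """Decode anything that offers a .decode() method; return other objects unchanged."""
    if hasattr(data, "decode"):
        return data.decode(encoding)
    # In Python 3, str has no .decode(), so text passes through untouched
    return data


print(decode_if_possible(b"abc"))             # abc
print(decode_if_possible(bytearray(b"abc")))  # abc
print(decode_if_possible("abc"))              # abc
```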

Use str()

This approach is uncommon, but is possible:

data = str(data, "utf-8")


Other encodings are permissible, just as with the .decode() method of bytes and bytearray. You can also pass a third parameter to specify error handling.
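
A short sketch of how str() behaves with different inputs; the sample values are made up for illustration. When an encoding is given, str() accepts bytes-like objects in general, including memoryview, and rejects an object that is already a str.

```python
# Three bytes-like inputs holding the same data
samples = [b"abc", bytearray(b"abc"), memoryview(b"abc")]
decoded = [str(sample, "utf-8") for sample in samples]
print(decoded)  # ['abc', 'abc', 'abc']

# The third parameter selects the error handler
lenient = str(b"caf\xe9", "utf-8", "replace")
print(lenient == "caf\ufffd")  # True

# str() refuses to "decode" something that is already text
try:
    str("abc", "utf-8")
    already_text_rejected = False
except TypeError:
    already_text_rejected = True
print(already_text_rejected)  # True
```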

Single-dispatch generic functions (Python 3.4+)

Python 3.4 and above include a nifty feature called single-dispatch generic functions, via functools.singledispatch. This is a bit more verbose, but it's also more explicit:

import functools

@functools.singledispatch
def func(data):
    # This is the generic implementation
    data = data.decode()
    ...

@func.register(str)
def _(data):
    # data will already be a string
    ...


You could also make special handlers for bytearray and bytes objects if you so chose.

Beware: single-dispatch functions only work on the first argument! This is an intentional feature, see PEP 443.
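
A small sketch of the first-argument rule, with separate handling registered for bytes and bytearray. The function name `kind` and its return strings are hypothetical.

```python
import functools


@functools.singledispatch
def kind(data, other=None):
    # Generic implementation, used when no registered type matches
    return "generic"


@kind.register(str)
def _(data, other=None):
    return "str"


@kind.register(bytes)
@kind.register(bytearray)
def _(data, other=None):
    return "bytes-like"


print(kind(b"x", "y"))        # bytes-like
print(kind("x", b"y"))        # str: the second argument is ignored
print(kind(bytearray(b"x")))  # bytes-like
print(kind(42, b"y"))         # generic
```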
                                                                        
                                                        
            
            
              
                
慢半拍i (2021-01-30 10:32)

>>> content = b"hello"
>>> text = "hello"
>>> type(content)
<class 'bytes'>
>>> type(text)
<class 'str'>
>>> type(text) is str
True
>>> type(content) is bytes
True

                                                                        
                                                        
            
            
              
                
生来不讨喜 (2021-01-30 10:32)

It depends on what you want to solve. If you want the same code to convert both cases to a string, you can simply convert the object to bytes first, and then decode. That makes it a one-liner:

#!python3

b1 = b'123456'
b2 = bytearray(b'123456')

print(type(b1))
print(type(b2))

s1 = bytes(b1).decode('utf-8')
s2 = bytes(b2).decode('utf-8')

print(s1)
print(s2)


This way, the answer for you may be:

data = bytes(data).decode()


In any case, I suggest passing 'utf-8' to decode explicitly, even though it costs a few extra characters. The next time you or someone else reads the source code, the intent will be more apparent.
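
One caveat, shown as a hedged sketch (the wrapper name `decode_via_bytes` is hypothetical): bytes(data) copies any bytes-like object, but it raises TypeError for a str given without an encoding, and it turns an int into that many zero bytes. The one-liner is therefore only safe when data is already known to be bytes-like.

```python
def decode_via_bytes(data, encoding="utf-8"):
    """Copy data into a bytes object, then decode it."""
    return bytes(data).decode(encoding)


print(decode_via_bytes(bytearray(b"123456")))   # 123456
print(decode_via_bytes(memoryview(b"123456")))  # 123456

# A str is rejected: bytes() needs an encoding to convert text
try:
    decode_via_bytes("123456")
    str_rejected = False
except TypeError:
    str_rejected = True
print(str_rejected)  # True

# An int does not fail; it silently becomes that many NUL bytes
zero_filled = bytes(3)
print(zero_filled)  # b'\x00\x00\x00'
```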
                                                                        
                                                        
            
            
              
                
感动是毒 (2021-01-30 10:36)

Tests such as isinstance(data, bytes) or type(data) == bytes don't work in Python 2, where a simple ASCII string passes the test for <bytes>! Because I use both Python 2 and Python 3, I work around this with the following check:
if str(type(data)).find("bytes") != -1: print("It's <bytes>")

It's a little ugly, but it does the job the question asks and it always works, in the simplest way.
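
An alternative sketch that avoids matching on the type's printed name relies on the fact that in Python 2 bytes is an alias of str, while in Python 3 the two are distinct types. The function name `is_py3_bytes` is hypothetical, and the results shown in the comments are for Python 3 only.

```python
def is_py3_bytes(data):
    """True only for a bytes object that is not also a str."""
    # In Python 2, bytes is str, so the second test rules plain strings out
    return isinstance(data, bytes) and not isinstance(data, str)


print(is_py3_bytes(b"abc"))             # True
print(is_py3_bytes("abc"))              # False
print(is_py3_bytes(bytearray(b"abc")))  # False
```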
                                                                        
                                                        
            
            
              
                