Python: confusion between types and dtypes

Suppose I enter:

a = uint8(200)
a*2

Then the result is 400, and it is recast to be of type uint16.

However:

a = arr


                      
3 Answers

时光说笑, 2021-01-11 21:09
              
            
            
                                                                       
A NumPy array contains elements of a single type, so np.array([200], dtype=uint8) is an array holding one value of type uint8. np.uint8(200) is not an array, only a single value. This makes a huge difference.

When you perform an operation on an array, the dtype stays the same regardless of whether an individual value overflows. Arrays are not upcast automatically, because widening one element would mean changing the size of the whole array; that only happens when the user explicitly asks for it. A single value, by contrast, can be upcast without affecting any other values.
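To make the point about array size concrete, here is a small sketch (the variable names are mine, not from the question). It assumes a two-element uint8 array and shows that multiplying keeps the dtype and wraps the overflowing element, while an explicit astype allocates a wider copy:

```python
import numpy as np

# Two uint8 elements; only the first one overflows when doubled.
arr = np.array([200, 10], dtype=np.uint8)

# The array keeps its dtype: 400 does not fit, so it wraps to 400 % 256.
doubled = arr * 2

# Upcasting has to be requested explicitly and produces a new, larger buffer.
widened = arr.astype(np.uint16) * 2

print(doubled, doubled.dtype)
print(widened, widened.dtype)
print(arr.nbytes, widened.nbytes)  # bytes used by the original and the wider copy
```

The nbytes comparison is the reason NumPy does not widen silently: every element would have to grow, not just the one that overflowed.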
                                                                        
                                                        
            
            
              
                

一整个雨季, 2021-01-11 21:29
              
            
            
                                                                       
The simple, high-level answer is that NumPy layers a second type system atop Python's type system. 

When you ask for the type of a NumPy object, you get the type of the container, something like numpy.ndarray. But when you ask for the dtype, you get the (NumPy-managed) type of the elements.



>>> from numpy import *
>>> arr = array([1.0, 4.0, 3.14])
>>> type(arr)
<class 'numpy.ndarray'>
>>> arr.dtype
dtype('float64')


Sometimes, as when using the default float type, the element data type (dtype) is equivalent to a Python type. But that's equivalent, not identical:



>>> arr.dtype == float
True
>>> arr.dtype is float
False
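The same distinction can be checked in a script rather than at the prompt. This is only a sketch with variable names of my own choosing; it relies on np.float64 being the scalar class behind the default float dtype:

```python
import numpy as np

arr = np.array([1.0, 4.0, 3.14])

container_type = type(arr)   # the Python class of the container
element_dtype = arr.dtype    # a numpy.dtype instance describing the elements

print(container_type is np.ndarray)        # the container is an ndarray
print(element_dtype == float)              # equivalent to Python's float...
print(element_dtype is float)              # ...but not the same object
print(element_dtype.type is np.float64)    # scalar class behind the dtype
print(isinstance(arr[0], float))           # np.float64 also subclasses float
```

The comparison with == works because NumPy converts the Python type to a dtype before comparing; the identity check fails because a dtype object and a Python class are different things.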


In other cases there is no equivalent Python type, for example the uint8 you specified. Python can manage such values and types, but unlike in C, Rust, and other "systems languages," working with values that map directly onto machine data types (uint8 corresponds closely to unsigned-byte computation) is not the common use case for Python.

So the big story is that NumPy provides containers like arrays and matrices that operate under its own type system. And it provides a bunch of highly useful, well-optimized routines to operate on those containers (and their elements). You can mix-and-match NumPy and normal Python computations, if you use care. 

There is no built-in Python type uint8. There is a NumPy constructor named uint8 which, when called, returns a NumPy scalar:



>>> u = uint8(44)
>>> u
44
>>> u.dtype
dtype('uint8')
>>> type(u)
<class 'numpy.uint8'>


So "can I create an array of type (not dtype) uint8...?" No, you can't. There is no such animal.

You can, however, do computations constrained to uint8 rules without NumPy arrays, by using NumPy scalar values. E.g.:



>>> uint8(44 + 1000)          # NumPy 1.x wraps to 1044 % 256; NumPy 2 raises OverflowError
20
>>> uint8(44) + uint8(1000)   # same caveat applies to uint8(1000)
20


But if you want to compute values mod 256, it's probably easier to use Python's mod operator:



>>> (44 + 1000) % 256
20


Driving data values larger than 255 into uint8 data types and then doing arithmetic is a rather backdoor way to get mod-256 arithmetic. If you're not careful, you'll either cause Python to "upgrade" your values to full integers (killing your mod-256 scheme), or trigger overflow exceptions (because tricks that work great in C and machine language are often flagged by higher level languages).
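If you do want the wraparound behaviour for many values at once, here is a sketch that compares the two routes. The sample values are invented for illustration, and 232 stands in for 1000 % 256 so that every input already fits in uint8:

```python
import numpy as np

values = np.array([44, 200, 255], dtype=np.uint8)
offsets = np.array([232, 100, 1], dtype=np.uint8)  # 232 == 1000 % 256

# uint8 + uint8 stays uint8, so each sum wraps around at 256.
wrapped = values + offsets

# The same result spelled out with wide integers and an explicit modulo.
expected = (values.astype(np.int64) + offsets.astype(np.int64)) % 256

print(wrapped, wrapped.dtype)
print(expected, expected.dtype)
```

Both routes give the same numbers; the explicit modulo just makes the intent visible and does not depend on the storage width.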
                                                                        
                                                        
            
            
              
                

清歌不尽, 2021-01-11 21:35
              
            
            
                                                                       
The type of a NumPy array is numpy.ndarray; this is just the type of Python object it is (similar to how type("hello") is str for example). 

A dtype just defines how the bytes in memory are interpreted for a scalar (i.e. a single number) or an array, for example as an integer or a float. For that reason you don't change the type of an array or scalar, just its dtype.

As you observe, if you multiply two scalars, the resulting datatype is the smallest "safe" type to which both values can be cast. However, multiplying an array and a scalar will simply return an array of the same datatype. The documentation for the function np.result_type is clear about when a particular scalar or array object's dtype is changed:


  Type promotion in NumPy works similarly to the rules in languages like C++, with some slight differences. When both scalars and arrays are used, the array's type takes precedence and the actual value of the scalar is taken into account.


The documentation continues:


  If there are only scalars or the maximum category of the scalars is higher than the maximum category of the arrays, the data types are combined with promote_types to produce the return value.


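So for np.uint8(200) * 2, two scalars, the resulting datatype will be the type returned by np.promote_types (this is the NumPy 1.x behaviour; since NumPy 2.0 a Python int adopts the dtype of the NumPy scalar, so the result stays uint8):

>>> import numpy as np
>>> np.promote_types(np.uint8, int)  # int maps to int64 on most platforms, int32 on some
dtype('int64')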


For np.array([200], dtype=np.uint8) * 2 the array's datatype takes precedence over the scalar int and a np.uint8 datatype is returned.
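To get a feel for what np.promote_types returns, here is a short sketch over a few dtype pairs. I use fixed-width dtypes rather than Python's int so the results do not depend on the platform's default integer:

```python
import numpy as np

same = np.promote_types(np.uint8, np.uint8)           # nothing to widen
wider_unsigned = np.promote_types(np.uint8, np.uint16)
mixed_sign = np.promote_types(np.uint8, np.int8)      # needs a signed type that holds 0..255
with_int64 = np.promote_types(np.uint8, np.int64)
with_float = np.promote_types(np.uint8, np.float32)

for name, dt in [("same", same), ("wider_unsigned", wider_unsigned),
                 ("mixed_sign", mixed_sign), ("with_int64", with_int64),
                 ("with_float", with_float)]:
    print(name, dt)
```

In each case the result is the smallest dtype that can represent every value of both inputs.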

To address your final question about retaining the dtype of a scalar during operations, you'll have to restrict the datatypes of any other scalars you use to avoid NumPy's automatic dtype promotion:

>>> np.uint8(200) * np.uint8(2)  # both operands are uint8, so 400 wraps to 144
144


The alternative, of course, is to simply wrap the single value in a NumPy array (and then NumPy won't cast it in operations with scalars of different dtype).

To promote the type of an array during an operation, you could wrap any scalars in an array first:

>>> np.array([200], dtype=np.uint8) * np.array([2])
array([400])
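Putting the two options side by side, as a sketch only (the names kept and promoted are mine, and I pin the second operand to int64 so the outcome does not depend on the platform's default integer):

```python
import numpy as np

arr = np.array([200], dtype=np.uint8)

# Matching dtypes: uint8 * uint8 stays uint8, so 400 wraps to 144.
kept = arr * np.uint8(2)

# Wider array operand: uint8 * int64 promotes to int64, so 400 is preserved.
promoted = arr * np.array([2], dtype=np.int64)

print(kept, kept.dtype)
print(promoted, promoted.dtype)
```

Which one you want depends on whether the wraparound is intended; if it is not, widen one operand before multiplying.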

                                                                        
                                                        
            
            
              
                