Numpy getting in the way of int -> float type casting

無奈伤痛 2021-01-14 12:27

Apologies in advance - I seem to be having a very fundamental misunderstanding that I can't clear up. I have a fourvector class with variables for ct and the position vecto

1 answer
  • 2021-01-14 13:10

    All your problems are indeed related.

    A numpy array stores its elements efficiently by requiring them all to be of the same type, such as strings (of equal length), integers, or floats. Because every element has the same size, numpy can easily calculate how much space each element needs and how many bytes it must "jump" to access the next element (we call these the "strides").
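
    The element size and strides can be inspected directly on any array. The following is a small sketch; the array names and the explicit dtypes are chosen only for illustration, so that the numbers do not depend on the platform.

```python
import numpy as np

a = np.array([1, 2, 3], dtype=np.int64)            # 8 bytes per element
b = np.arange(6, dtype=np.float32).reshape(2, 3)   # 4 bytes per element, 2 rows of 3

print(a.itemsize, a.strides)   # 8 (8,)     -> jump 8 bytes to reach the next element
print(b.itemsize, b.strides)   # 4 (12, 4)  -> 12 bytes to the next row, 4 to the next column
```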

    When you create an array from a list, numpy will try to determine a suitable data type ("dtype") from that list, to ensure all elements can be represented well. It only skips this educated guess when you specify the dtype explicitly.
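
    As a quick illustration of the difference between an inferred and an explicit dtype, here is a minimal sketch (the variable names are made up for this example):

```python
import numpy as np

inferred = np.array([55, 2, 3])               # all integers -> platform default integer dtype
explicit = np.array([55, 2, 3], dtype=float)  # same list, but forced to float64

print(inferred.dtype)   # an integer dtype such as int64 or int32, depending on the platform
print(explicit.dtype)   # float64
print(explicit)         # the integers were converted to floats on creation
```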

    Consider the following example:

    >>> import numpy as np
    >>> integer_array = np.array([1,2,3])  # pass in a list of integers
    >>> integer_array
    array([1, 2, 3])
    >>> integer_array.dtype
    dtype('int64')
    

    As you can see, on my system it returns a data type of int64, which represents integers using 8 bytes. It chooses this because:

    1. numpy recognizes that all elements of the list are integers
    2. my system is a 64-bit system (the default integer type is platform dependent; for example, NumPy versions before 2.0 on Windows default to int32)

    Now consider an attempt at changing that array:

    >>> integer_array[0] = 2.4  # attempt to put a float in an array with dtype int
    >>> integer_array # it is automatically converted to an int!
    array([2, 2, 3])
    

    As you can see, once an array's datatype has been set, any value assigned to it is automatically cast to that datatype (here 2.4 is truncated to 2). Let's now consider what happens when you pass in a list that has at least one float:

    >>> float_array = np.array([1., 2,3])
    >>> float_array
    array([ 1.,  2.,  3.])
    >>> float_array.dtype
    dtype('float64')
    

    Once again, numpy determines a suitable datatype for this array.
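
    The promotion rule behind this can also be queried without building an array. The sketch below uses np.result_type for that; the variable names are only illustrative.

```python
import numpy as np

mixed = np.array([55, 2., 3])                     # one float among integers
promoted = np.result_type(np.int64, np.float64)   # smallest dtype that can hold both kinds

print(mixed.dtype)   # float64
print(promoted)      # float64
print(mixed)         # every element is stored as a float, not only the second one
```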

    Blindly attempting to change the datatype of an array is not wise:

    >>> integer_array.dtype = np.float32
    >>> integer_array
    array([  2.80259693e-45,   0.00000000e+00,   2.80259693e-45,
             0.00000000e+00,   4.20389539e-45,   0.00000000e+00], dtype=float32)
    

    Those numbers look like gibberish. That's because numpy does not convert the values; it reinterprets the same memory as 4-byte floats. The three 8-byte integers occupy 24 bytes, which is why you now see six 4-byte elements. (With some skill you can convert these numbers to their binary representation and recover the original integer values from there.)
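
    The same reinterpretation can be reproduced without modifying the original array by using view, which is a sketch of the idea rather than what the session above did. The dtype is fixed to int64 here so the sizes are the same on every platform; the exact float values depend on byte order, so they are not shown.

```python
import numpy as np

ints = np.array([2, 2, 3], dtype=np.int64)   # 3 elements * 8 bytes = 24 bytes
as_f32 = ints.view(np.float32)               # the same 24 bytes read as 6 four-byte floats
back = as_f32.view(np.int64)                 # reinterpreting again recovers the integers

print(as_f32.shape)                      # (6,)
print(np.shares_memory(ints, as_f32))    # True: no data was copied or converted
print(back)                              # [2 2 3]
```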

    If you want to cast, you'll have to do it explicitly and numpy will return a new array:

    >>> integer_array.dtype = np.int64 # go back to the previous interpretation
    >>> integer_array
    array([2, 2, 3])
    >>> integer_array.astype(np.float32)
    array([ 2.,  2.,  3.], dtype=float32)
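
    Because astype returns a new array, the result has to be assigned to a name for the conversion to have any lasting effect. A minimal sketch, with r standing in for any integer array:

```python
import numpy as np

r = np.array([55, 2, 3], dtype=np.int64)

r.astype(float)        # the converted copy is discarded; r itself is unchanged
still_int = r.dtype    # int64

r = r.astype(float)    # rebind the name to the converted copy
r[0] = 2.4             # now stored as 2.4 instead of being truncated to 2

print(still_int, r.dtype, r[0])   # int64 float64 2.4
```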
    

    Now, to address your specific questions:

    1a) If I instantiate a with a = FourVector(ct=5,r=[55,2.,3]), then type(a._r[0]) returns numpy.float64 as opposed to numpy.int32. What is going on here? I expected just a._r[1] to be a float, and instead it changes the type of the whole list?

    That's because numpy has to determine a datatype for the entire array (unless you use a structured array), ensuring all elements fit in that datatype. Only then can numpy iterate over the elements of that array efficiently.

    1b) How do I get the above behaviour (The whole list being floats), without having to instantiate the variables as floats? I read up on the documentation and have tried various methods, like using astype(float), but everything I do seems to keep it as an int. Again, thinking this is the mutable/immutable problem I'm having.

    Specify the dtype when you are creating the array. In your code, that would be:

    self._r = np.array(r, dtype=float)  # np.float was removed in NumPy 1.24; use float or np.float64
    

    2) I had thought, in the tempx=... line, multiplying by 1.0 would convert it to a float, as it appears this is the reason ct converts to a float, but for some reason it doesn't. Perhaps the same reason as the others?

    That is true. Try printing the datatype of tempx; it should be a float. However, later on you reinsert that value into the array self._r, which has an integer dtype. As you saw previously, that casts the float back to an integer.
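
    Putting the pieces together, here is a rough sketch of how the class could keep its floats. The full class from the question is not shown, so FourVector below is a hypothetical stand-in, and scale_x is an invented method that only mimics the "compute tempx, then write it back" pattern.

```python
import numpy as np

class FourVector:
    """Hypothetical stand-in for the class in the question."""

    def __init__(self, ct=0, r=(0, 0, 0)):
        self.ct = float(ct)
        # dtype=float makes the whole array float64, even for all-integer input
        self._r = np.array(r, dtype=float)

    def scale_x(self, factor):
        # invented method: compute a float, then store it back into _r
        tempx = factor * self._r[0]
        self._r[0] = tempx   # no truncation, because _r holds floats

a = FourVector(ct=5, r=[55, 2, 3])
a.scale_x(0.5)
print(a._r.dtype, a._r[0])   # float64 27.5
```

    Had _r been created without dtype=float, the same assignment would have stored 27 instead of 27.5.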
