Prevent strings being truncated when replacing values in a numpy array

半阙折子戏 2021-01-07 12:24

Let's say I have arrays a and b:

import numpy as np
a = np.array([1,2,3])
b = np.array(['red','red','red'])

If I were to apply s

3 answers
  • 2021-01-07 12:39

    You can handle variable length strings by setting the dtype of b to be "object":

    import numpy as np
    a = np.array([1,2,3])
    b = np.array(['red','red','red'], dtype="object")
    
    b[a<3] = "blue"
    
    print(b)
    

    This outputs:

    ['blue' 'blue' 'red']
    

    The object dtype can hold strings of any length, or any other Python objects. The trade-off is that under the hood the array stores pointers to Python objects rather than the characters themselves, so don't expect the performance you get with a native fixed-width dtype.
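A minimal sketch of what "an array of pointers" means in practice. It assumes a typical build where np.intp has the same size as a pointer (true on common 64-bit and 32-bit platforms); the long replacement string is just an illustrative value.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array(['red', 'red', 'red'], dtype=object)

# Assign a string far longer than 'red': nothing is truncated, because
# each slot only holds a reference to a Python str object.
b[a < 3] = "a much longer replacement"
print(b)

# Every slot is pointer-sized, regardless of how long the string it refers to is.
print(b.dtype.itemsize)
print(b.dtype.itemsize == np.dtype(np.intp).itemsize)
```

Because every element is an indirection to a separate Python object, vectorized operations on such an array run at roughly Python-object speed rather than native speed.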

  • 2021-01-07 12:40

    If you construct such an array, its dtype looks like this:

    >>> b
    array(['red', 'red', 'red'], dtype='<U3')

    The '<U3' dtype means each element holds a Unicode string of at most 3 characters. If you assign a longer string, it is silently truncated to fit.
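A minimal sketch reproducing the truncation, using the same a and b as in the question and "blue" as the replacement value:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array(['red', 'red', 'red'])
# NumPy sizes the fixed-width field to the longest input string: '<U3'
print(b.dtype)

# "blue" has 4 characters but each slot holds only 3, so it is cut to "blu"
b[a < 3] = "blue"
print(b)
```

No warning or error is raised; the assignment simply stores the first 3 characters.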

    You can convert the array to a dtype with a larger maximum length, for example:

    b2 = b.astype('<U10')
    

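    Now b2 can store strings of up to 10 characters (astype returns a copy, so b itself is still '<U3'). Note, however, that a larger maximum length also makes the array larger: every element reserves space for the full width, at 4 bytes per character for '<U', whether or not the string uses it.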
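A short sketch of the widening step and its memory cost, again assuming the question's a and b:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array(['red', 'red', 'red'])   # dtype '<U3'
b2 = b.astype('<U10')                 # widened copy; b keeps its '<U3' dtype

b2[a < 3] = "blue"                    # fits now, no truncation
print(b2)

# '<U' stores each character as a 4-byte code point, so the
# per-element size grows linearly with the maximum length.
print(b.dtype.itemsize, b2.dtype.itemsize)   # 12 40
print(b.nbytes, b2.nbytes)                   # 36 120
```

Picking a width much larger than the longest string you actually need wastes that space on every element.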

  • 2021-01-07 12:41

    A marginal improvement on your current approach (which is potentially very wasteful in space):

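    import numpy as np
    
    a = np.array([1,2,3])
    b = np.array(['red','red','red'])
    
    replacement = "blue"
    # Width of b's dtype in characters ('<U' uses 4 bytes per character)
    current_width = b.dtype.itemsize // np.dtype('U1').itemsize
    b = b.astype('<U{}'.format(max(len(replacement), current_width)))
    b[a<3] = replacement
    print(b)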
    

    This accounts for strings already in the array, so the allocated space only increases if the replacement is longer than all existing strings in the array.
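The same idea can be wrapped in a small reusable function. This is a sketch: assign_without_truncation is a hypothetical helper name, and it assumes the input is a fixed-width Unicode ('<U') array.

```python
import numpy as np


def assign_without_truncation(arr, mask, value):
    """Hypothetical helper: arr[mask] = value, widening a '<U' array only if needed.

    Returns a new, wider array when widening is required; otherwise it
    modifies arr in place and returns it.
    """
    if arr.dtype.kind != 'U':
        raise TypeError("expected a fixed-width Unicode ('<U') array")
    # Characters the current dtype can hold: '<U' stores 4 bytes per character.
    current_width = arr.dtype.itemsize // np.dtype('U1').itemsize
    if len(value) > current_width:
        arr = arr.astype('<U{}'.format(len(value)))
    arr[mask] = value
    return arr


a = np.array([1, 2, 3])
b = np.array(['red', 'red', 'red'])

b = assign_without_truncation(b, a < 3, "blue")   # widens '<U3' to '<U4'
print(b, b.dtype)

b = assign_without_truncation(b, a == 3, "no")    # fits in '<U4', no copy made
print(b, b.dtype)
```

Because the helper may return a new array, callers must always rebind the result (b = ...), as in the usage lines above.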
