Let's say I have arrays a and b:
a = np.array([1,2,3])
b = np.array(['red','red','red'])
If I were to apply s
You can handle variable-length strings by setting the dtype of b to "object":
import numpy as np
a = np.array([1,2,3])
b = np.array(['red','red','red'], dtype="object")
b[a<3] = "blue"
print(b)
This outputs:
['blue' 'blue' 'red']
This dtype will handle strings or any other Python objects. It also means that, under the hood, you have a NumPy array of pointers, so don't expect the performance you get with a primitive data type.
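As a small illustration of the point above, here is a sketch (the values assigned are made up for the example) showing that an object array keeps whatever Python object you store in it, at full length:

```python
import numpy as np

b = np.array(['red', 'red', 'red'], dtype=object)

b[0] = 'a much longer string'  # stored whole, no truncation
b[1] = 42                      # any Python object is accepted, not only strings

print(b.dtype)     # object
print(b.itemsize)  # size of one pointer (typically 8 on 64-bit builds)
print(b)
```

Each element is just a reference to a Python object, which is why vectorised operations on such an array tend to be slower than on a fixed-width dtype.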
If you construct the array without specifying a dtype, the type looks like:
>>> b
array(['red', 'red', 'red'], dtype='<U3')
This means that each string can hold at most 3 characters. If you assign a longer string, it is silently truncated.
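A minimal sketch of the truncation, reusing the arrays from the question (the name truncated is only for illustration):

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array(['red', 'red', 'red'])  # dtype is inferred as a 3-character string type

# 'blue' has 4 characters, so it is cut down to fit the 3-character slots
b[a < 3] = 'blue'

truncated = b.tolist()
print(b.dtype, truncated)
```

NumPy raises no error or warning here; the extra character is simply dropped.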
You can change the data type to make the maximum length longer, for example:
b2 = b.astype('<U10')
So now we have an array that can store strings of up to 10 characters. Note, however, that a larger maximum length increases the memory used by the array, because every element reserves space for the full width.
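To get a rough idea of the cost, you can compare itemsize and nbytes before and after the conversion. This is only a sketch; it relies on NumPy storing each Unicode character in 4 bytes:

```python
import numpy as np

b = np.array(['red', 'red', 'red'])
b2 = b.astype('<U10')

# itemsize is the number of bytes per element: 4 bytes * maximum width
print(b.itemsize, b.nbytes)    # 3 characters wide
print(b2.itemsize, b2.nbytes)  # 10 characters wide
```

With three elements this is 36 bytes versus 120 bytes, regardless of how long the stored strings actually are.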
A marginal improvement on your current approach (which is potentially very wasteful in space):
import numpy as np
a = np.array([1,2,3])
b = np.array(['red','red','red'])
replacement = "blue"
# Current width of b in characters (NumPy uses 4 bytes per Unicode character)
b = b.astype('<U{}'.format(max(len(replacement), b.dtype.itemsize // 4)))
b[a<3] = replacement
print(b)
This accounts for strings already in the array, so the allocated space only increases if the replacement is longer than all existing strings in the array.
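If you do this in more than one place, the same idea could be wrapped in a small function. The helper below, assign_with_widening, is a hypothetical name and not part of NumPy; it assumes the input is a fixed-width Unicode array (dtype kind 'U'):

```python
import numpy as np

def assign_with_widening(arr, mask, replacement):
    """Hypothetical helper: return a copy of arr with arr[mask] set to
    replacement, widening the string dtype only when it is too narrow."""
    # Width of the existing dtype in characters (itemsize is in bytes)
    current_width = arr.dtype.itemsize // np.dtype('U1').itemsize
    width = max(current_width, len(replacement))
    out = arr.astype('U{}'.format(width))  # astype returns a copy by default
    out[mask] = replacement
    return out

a = np.array([1, 2, 3])
b = np.array(['red', 'red', 'red'])

shorter = assign_with_widening(b, a < 3, 'no')    # fits, width stays at 3
longer = assign_with_widening(b, a < 3, 'blue')   # widened to 4 characters
print(shorter, shorter.dtype)
print(longer, longer.dtype)
```

Returning a copy keeps the original array untouched, at the cost of one extra allocation per call.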