Prevent strings being truncated when replacing values in a numpy array

前端未结


Let's say I have arrays a and b:

a = np.array([1,2,3])
b = np.array(['red','red','red'])

If I were to apply s


3 Answers

感情败类

2021-01-07 12:39
You can handle variable length strings by setting the dtype of b to be "object":
```
import numpy as np
a = np.array([1,2,3])
b = np.array(['red','red','red'], dtype="object")

b[a<3] = "blue"

print(b)
```
This outputs:
```
['blue' 'blue' 'red']
```
This dtype will handle strings, or other general Python objects. This also necessarily means that under the hood you'll have a numpy array of pointers, so don't expect the performance you get when using a primitive datatype.
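If the object array is only needed while the values are being edited, one option is to convert back to a fixed-width string dtype afterwards. The snippet below is a minimal sketch of that idea; the variable name `packed` is just for illustration and is not part of the original answer.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array(['red', 'red', 'red'], dtype="object")

# Each element is a reference to an ordinary Python str,
# so a longer replacement is stored in full.
b[a < 3] = "blue"
print(b.dtype)        # object

# Convert back to a fixed-width unicode array once editing is done.
# NumPy picks a width large enough for the longest string present.
packed = b.astype(str)
print(packed.dtype.kind, packed.dtype.itemsize)   # U 16
print(packed)         # ['blue' 'blue' 'red']
```

Whether the round trip is worth it depends on how large the array is and how often it is modified.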
清歌不尽

2021-01-07 12:40
If you construct such an array, its dtype looks like this:
```
>>> b
array(['red', 'red', 'red'], dtype='<U3')
```
This means that the strings have a length of at most 3 characters. In case you assign longer strings, these strings are truncated.
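A quick way to see the truncation for yourself, using the same array as in the question (a small sketch, nothing beyond plain NumPy indexing is assumed):

```python
import numpy as np

b = np.array(['red', 'red', 'red'])
print(b.dtype.kind, b.dtype.itemsize)   # U 12  (3 characters x 4 bytes)

# The array only has room for 3 characters per element,
# so the fourth character is dropped without any warning.
b[0] = "blue"
print(b[0])           # blu
```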

You can change the data type to make the maximum length longer, for example:
```
b2 = b.astype('<U10')
```
Now the array can store strings of up to 10 characters. Note, however, that a larger maximum length increases the size of the array, because every element reserves space for the full width.
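To make the size cost concrete, you can compare `itemsize` and `nbytes` before and after the cast. This is a small sketch using the arrays from this answer; NumPy's unicode dtype uses 4 bytes per character.

```python
import numpy as np

b = np.array(['red', 'red', 'red'])
b2 = b.astype('<U10')

# 3 characters x 4 bytes = 12 bytes per element, 3 elements in total
print(b.itemsize, b.nbytes)     # 12 36

# 10 characters x 4 bytes = 40 bytes per element, even for short strings
print(b2.itemsize, b2.nbytes)   # 40 120

# The wider array now keeps the longer string intact.
b2[0] = "blue"
print(b2)                       # ['blue' 'red' 'red']
```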
长情又很酷

2021-01-07 12:41
A marginal improvement on your current approach (which is potentially very wasteful in space):
```
import numpy as np

a = np.array([1,2,3])
b = np.array(['red','red','red'])

replacement = "blue"
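# Current width of b in characters: a '<U' dtype uses 4 bytes per character.
current_len = b.dtype.itemsize // np.dtype('U1').itemsize
b = b.astype('<U{}'.format(max(len(replacement), current_len)))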
b[a<3] = replacement
print(b)
```
This accounts for strings already in the array, so the allocated space only increases if the replacement is longer than all existing strings in the array.
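If you do this in more than one place, the same idea can be wrapped in a small function. The helper below, `assign_str`, is a hypothetical name introduced here for illustration and is not a NumPy function; it assumes the input has a unicode (`U`) dtype and returns a new array rather than modifying the input.

```python
import numpy as np

def assign_str(arr, mask, replacement):
    """Return a copy of arr with arr[mask] set to replacement.

    Hypothetical helper: widens the dtype only when the replacement
    is longer than the current maximum width.
    """
    if arr.dtype.kind != 'U':
        raise TypeError("expected an array with a unicode string dtype")
    char_size = np.dtype('U1').itemsize            # bytes per character
    current_len = arr.dtype.itemsize // char_size  # current width in characters
    needed = max(current_len, len(replacement), 1)
    out = arr.astype('U{}'.format(needed))         # astype returns a copy
    out[mask] = replacement
    return out

a = np.array([1, 2, 3])
b = np.array(['red', 'red', 'red'])

widened = assign_str(b, a < 3, "blue")
print(widened)                   # ['blue' 'blue' 'red']
print(widened.dtype.itemsize)    # 16

# A shorter replacement leaves the width unchanged.
same_width = assign_str(np.array(['yellow']), np.array([True]), "red")
print(same_width.dtype.itemsize) # 24
```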