I want to find and replace multiple values in a 1D array / list with new ones.
For example, for a list
a=[2, 3, 2, 5, 4, 4, 1, 2]
I woul
The numpy_indexed package (disclaimer: I am its author) provides an elegant and efficient vectorized solution to this type of problem:
import numpy_indexed as npi
remapped_a = npi.remap(a, val_old, val_new)
The method implemented is based on searchsorted, like that of swenzel, and should have similarly good performance, but it is more general. For instance, the items of the array do not need to be ints; they can be of any type, even nd-subarrays.
If all values in 'a' are expected to be present in 'val_old', you can set the optional 'missing' kwarg to 'raise' (default is 'ignore'). Performance will be slightly better, and you will get a KeyError if that assumption is not satisfied.
To replace values in a list using two other lists as key:value pairs, there are several approaches. All of them use list comprehensions.
Using list.index():
a=[2, 3, 2, 5, 4, 4, 1, 2]
val_old=[1, 2, 3, 4, 5]
val_new=[2, 3, 4, 5, 1]
a_new=[val_new[val_old.index(x)] for x in a]
Using your special case:
a=[2, 3, 2, 5, 4, 4, 1, 2]
a_new=[x % 5 + 1 for x in a]
Try this for your expected output; it works even if some elements are not in val_old.
>>> [val_new[val_old.index(i)] if i in val_old else i for i in a]
[3, 4, 3, 1, 5, 5, 2, 3]
Assuming that your val_old array is sorted (which is the case here, but if later on it's not, then don't forget to sort val_new along with it!), you can use numpy.searchsorted and then index val_new with the results.
This does not work if a number has no mapping; you have to provide one-to-one mappings in that case.
In [1]: import numpy as np
In [2]: a = np.array([2, 3, 2, 5, 4, 4, 1, 2])
In [3]: old_val = np.array([1, 2, 3, 4, 5])
In [4]: new_val = np.array([2, 3, 4, 5, 1])
In [5]: a_new = np.array([3, 4, 3, 1, 5, 5, 2, 3])
In [6]: i = np.searchsorted(old_val,a)
In [7]: a_replaced = new_val[i]
In [8]: all(a_replaced == a_new)
Out[8]: True
50k numbers? No problem!
In [23]: def timed():
   ....:     t0 = time.time()
   ....:     i = np.searchsorted(old_val, a)
   ....:     a_replaced = new_val[i]
   ....:     t1 = time.time()
   ....:     print('%s Seconds'%(t1-t0))
   ....:
In [24]: a = np.random.choice(old_val, 50000)
In [25]: timed()
0.00288081169128 Seconds
500k? You won't notice the difference!
In [26]: a = np.random.choice(old_val, 500000)
In [27]: timed()
0.019248008728 Seconds
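The searchsorted approach above relies on val_old being sorted and on every value in a having a mapping. The following is a minimal sketch of one way to relax both assumptions; the helper name remap_sorted and the extra value 9 in the sample data are made up for illustration, and values without a mapping are simply passed through unchanged.

```python
import numpy as np

def remap_sorted(a, val_old, val_new):
    """Hypothetical helper: map val_old -> val_new in `a`, leave other values as-is."""
    a = np.asarray(a)
    val_old = np.asarray(val_old)
    val_new = np.asarray(val_new)
    # Sort the keys and reorder the replacement values the same way.
    order = np.argsort(val_old)
    sorted_old = val_old[order]
    sorted_new = val_new[order]
    # Position of each element of `a` in the sorted keys.
    idx = np.searchsorted(sorted_old, a)
    # searchsorted can return len(sorted_old) for values above the largest key.
    idx = np.clip(idx, 0, len(sorted_old) - 1)
    # Only replace where the key at that position really equals the element.
    found = sorted_old[idx] == a
    return np.where(found, sorted_new[idx], a)

a = np.array([2, 3, 2, 5, 4, 4, 1, 2, 9])   # 9 has no mapping
val_old = np.array([5, 1, 3, 2, 4])         # deliberately unsorted
val_new = np.array([1, 2, 4, 3, 5])
print(remap_sorted(a, val_old, val_new))    # [3 4 3 1 5 5 2 3 9]
```

The extra argsort and comparison cost a little time, so when the keys are known to be sorted and complete, the plain two-line version is probably still the faster choice.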
In vanilla Python, without the speed of numpy or pandas, this is one way:
a = [2, 3, 2, 5, 4, 4, 1, 2]
val_old = [1, 2, 3, 4, 5]
val_new = [2, 3, 4, 5, 1]
expected_a_new = [3, 4, 3, 1, 5, 5, 2, 3]
d = dict(zip(val_old, val_new))
a_new = [d.get(e, e) for e in a]
print(a_new)  # [3, 4, 3, 1, 5, 5, 2, 3]
print(a_new == expected_a_new)  # True
The average time complexity for this algorithm is O(M + N), where M is the length of your "translation list" and N is the length of list a.
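To get a rough feel for that complexity claim, here is a small sketch that compares the dict lookup with the list.index() comprehension, which scans val_old for every element and so needs roughly O(M * N) work. The sizes (M = 1000, N = 5000) and the function names are arbitrary choices for illustration; actual timings depend on the machine, so no numbers are shown.

```python
import random
import timeit

# Illustrative sizes only: M = 1000 keys, N = 5000 values to translate.
val_old = list(range(1000))
val_new = [v + 1 for v in val_old]
random.seed(0)
a = [random.choice(val_old) for _ in range(5000)]

def with_index():
    # Linear scan of val_old for every element of a.
    return [val_new[val_old.index(x)] for x in a]

def with_dict():
    # Build the dict once (O(M)), then one average O(1) lookup per element.
    d = dict(zip(val_old, val_new))
    return [d.get(x, x) for x in a]

# Both approaches should give the same result on fully mapped input.
assert with_index() == with_dict()

print('list.index: %.4f s' % timeit.timeit(with_index, number=3))
print('dict:       %.4f s' % timeit.timeit(with_dict, number=3))
```

On short lists like the one in the question, the difference is unlikely to matter; it should only become visible as the translation list grows.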
>>> arr = np.empty(a.max() + 1, dtype=val_new.dtype)
>>> arr[val_old] = val_new
>>> arr[a]
array([3, 4, 3, 1, 5, 5, 2, 3])
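The lookup-table snippet above assumes that a, val_old and val_new are NumPy arrays of non-negative integers and that every value in a has a mapping, since np.empty leaves the other slots uninitialized. A possible variant, sketched here under those same integer assumptions, starts from np.arange so that unmapped values map to themselves; the name lookup and the extra value 7 are made up for illustration.

```python
import numpy as np

a = np.array([2, 3, 2, 5, 4, 4, 1, 2, 7])   # 7 has no mapping
val_old = np.array([1, 2, 3, 4, 5])
val_new = np.array([2, 3, 4, 5, 1])

# The table must cover the largest value in both a and val_old.
size = max(a.max(), val_old.max()) + 1
# Identity table: lookup[k] == k, so unmapped values stay unchanged.
lookup = np.arange(size)
# Overwrite the entries that do have a replacement.
lookup[val_old] = val_new
a_new = lookup[a]
print(a_new)  # [3 4 3 1 5 5 2 3 7]
```

The table takes memory proportional to the largest value, so this is mainly attractive when the values are small integers.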