Find and replace multiple values in Python

攒了一身酷 2021-02-15 03:59

I want to find and replace multiple values in a 1D array / list with new ones.

For example, for the list

a=[2, 3, 2, 5, 4, 4, 1, 2]

I woul

6 Answers
  • 2021-02-15 04:38

    The numpy_indexed package (disclaimer: I am its author) provides an elegant and efficient vectorized solution to this type of problem:

    import numpy_indexed as npi
    remapped_a = npi.remap(a, val_old, val_new)
    

    The method is based on searchsorted, like swenzel's answer, and should perform similarly well, but it is more general. For instance, the items of the array do not need to be ints; they can be of any type, even nd-subarrays.

    If all values in 'a' are expected to be present in 'val_old', you can set the optional 'missing' kwarg to 'raise' (default is 'ignore'). Performance will be slightly better, and you will get a KeyError if that assumption is not satisfied.
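
    To make the searchsorted idea with "ignore"-style handling of missing values concrete, here is a minimal sketch in plain NumPy. It is not the package's actual implementation; the helper name remap_sketch is hypothetical, and the sketch assumes val_old is non-empty and contains no duplicates.

```python
import numpy as np

def remap_sketch(a, val_old, val_new):
    """Hypothetical helper: map val_old[k] -> val_new[k], leave other values unchanged."""
    a = np.asarray(a)
    val_old = np.asarray(val_old)
    val_new = np.asarray(val_new)

    # Sort the keys once (and reorder the values the same way)
    # so that searchsorted can be used for the lookup.
    order = np.argsort(val_old)
    sorted_old = val_old[order]
    sorted_new = val_new[order]

    # Candidate position of every element of a among the sorted keys.
    idx = np.searchsorted(sorted_old, a)
    # Elements larger than every key get idx == len(keys); clip to stay in bounds.
    idx = np.clip(idx, 0, len(sorted_old) - 1)

    # Replace only where the key at that position really matches.
    found = sorted_old[idx] == a
    return np.where(found, sorted_new[idx], a)

a = [2, 3, 2, 5, 4, 4, 1, 2, 7]  # 7 has no mapping and is kept as-is
result = remap_sketch(a, [1, 2, 3, 4, 5], [2, 3, 4, 5, 1])
print(result)
```

    The extra comparison against the sorted keys is what allows unmapped values to pass through unchanged.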

  • 2021-02-15 04:43

    To replace values in a list using two other lists as key:value pairs, there are several approaches. All of them use a list comprehension.

    Using list.index():

    a=[2, 3, 2, 5, 4, 4, 1, 2]
    val_old=[1, 2, 3, 4, 5] 
    val_new=[2, 3, 4, 5, 1]
    a_new=[val_new[val_old.index(x)] for x in a]
    

    Using your special case:

    a=[2, 3, 2, 5, 4, 4, 1, 2]
    a_new=[x % 5 + 1 for x in a]
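
    One caveat about the list.index() version, shown as a small sketch: list.index() raises ValueError when an element of a has no entry in val_old, and each call scans val_old from the start, so the cost grows with len(a) * len(val_old). The input below is made up for illustration.

```python
val_old = [1, 2, 3, 4, 5]
val_new = [2, 3, 4, 5, 1]
a = [2, 3, 7]  # 7 has no entry in val_old (illustrative input)

try:
    a_new = [val_new[val_old.index(x)] for x in a]
except ValueError as exc:
    # list.index() raises ValueError for a value that is not in the list.
    a_new = None
    print("lookup failed:", exc)
```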
    
  • 2021-02-15 04:47

    Try this for your expected output. It works even if some elements of a are not in val_old.

    >>> [val_new[val_old.index(i)] if i in val_old else i for i in a]
    [3, 4, 3, 1, 5, 5, 2, 3]
    
  • 2021-02-15 04:50

    Assuming that your val_old array is sorted (which is the case here; if it is not, sort it and reorder val_new in the same way), you can use numpy.searchsorted and then index val_new with the results.
    This does not work if a number has no mapping; in that case you have to add explicit 1-to-1 mappings for such numbers.

    In [1]: import numpy as np
    
    In [2]: a = np.array([2, 3, 2, 5, 4, 4, 1, 2])
    
    In [3]: old_val = np.array([1, 2, 3, 4, 5])
    
    In [4]: new_val = np.array([2, 3, 4, 5, 1])
    
    In [5]: a_new = np.array([3, 4, 3, 1, 5, 5, 2, 3])
    
    In [6]: i = np.searchsorted(old_val,a)
    
    In [7]: a_replaced = new_val[i]
    
    In [8]: all(a_replaced == a_new)
    Out[8]: True
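
    The limitation for numbers without a mapping can be seen in a short sketch. A missing value either silently picks up a neighbour's replacement or causes an out-of-bounds index. The helper has_full_mapping is a hypothetical name, and using np.isin for the check is one possible choice, not part of the approach above.

```python
import numpy as np

old_val = np.array([1, 2, 3, 4, 5])
new_val = np.array([2, 3, 4, 5, 1])

# 0 is not in old_val; searchsorted returns insertion index 0,
# so 0 is silently treated as if it were old_val[0] == 1.
i = np.searchsorted(old_val, np.array([0, 2]))
silent = new_val[i]

# 9 is larger than every key; its insertion index is len(old_val), out of bounds.
j = np.searchsorted(old_val, np.array([9]))
try:
    new_val[j]
    raised = False
except IndexError:
    raised = True

def has_full_mapping(a, old_val):
    """Hypothetical check: True if every element of a occurs in old_val."""
    return bool(np.isin(a, old_val).all())

print(silent, raised, has_full_mapping(np.array([0, 2]), old_val))
```

    Running such a check before the lookup turns a silent wrong answer into an explicit decision.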
    

    50k numbers? No problem!

    In [22]: import time

    In [23]: def timed():
       ....:     t0 = time.time()
       ....:     i = np.searchsorted(old_val, a)
       ....:     a_replaced = new_val[i]
       ....:     t1 = time.time()
       ....:     print('%s Seconds' % (t1 - t0))
       ....: 
    
    In [24]: a = np.random.choice(old_val, 50000)
    
    In [25]: timed()
    0.00288081169128 Seconds
    

    500k? You won't notice the difference!

    In [26]: a = np.random.choice(old_val, 500000)
    
    In [27]: timed()
    0.019248008728 Seconds
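
    A single time.time() measurement can be noisy. As a hedged alternative, the same lookup could be timed with the standard timeit module; the repeat counts and the seeded generator below are arbitrary choices for this sketch, and the numbers will differ from machine to machine.

```python
import timeit
import numpy as np

old_val = np.array([1, 2, 3, 4, 5])
new_val = np.array([2, 3, 4, 5, 1])
rng = np.random.default_rng(0)      # seeded only to make the input reproducible
a = rng.choice(old_val, 500_000)

def replace():
    # Same lookup as above: position in old_val, then index into new_val.
    return new_val[np.searchsorted(old_val, a)]

# Best of 5 repeats, 10 calls each, reported per call.
best = min(timeit.repeat(replace, number=10, repeat=5)) / 10
print('%.6f seconds per call' % best)
```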
    
  • 2021-02-15 04:52

    In vanilla Python, without the speed of numpy or pandas, this is one way:

    a = [2, 3, 2, 5, 4, 4, 1, 2]
    val_old = [1, 2, 3, 4, 5]
    val_new = [2, 3, 4, 5, 1]
    expected_a_new = [3, 4, 3, 1, 5, 5, 2, 3]
    d = dict(zip(val_old, val_new))
    a_new = [d.get(e, e) for e in a]
    print(a_new)  # [3, 4, 3, 1, 5, 5, 2, 3]
    print(a_new == expected_a_new)  # True
    

    The average time complexity for this algorithm is O(M + N) where M is the length of your "translation list" and N is the length of list a.

  • 2021-02-15 04:55
    >>> arr = np.empty(a.max() + 1, dtype=val_new.dtype)
    >>> arr[val_old] = val_new
    >>> arr[a]
    array([3, 4, 3, 1, 5, 5, 2, 3])
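
    The snippet above builds a lookup table indexed by the old values, which assumes that a, val_old and val_new are NumPy arrays of non-negative integers. Because np.empty leaves unmapped slots uninitialized, values of a without a mapping would come back as garbage. A possible variation, sketched below, starts from an identity table so that such values pass through unchanged; the name lookup and the sample input are illustrative only.

```python
import numpy as np

a = np.array([2, 3, 2, 5, 4, 4, 1, 2, 7])  # 7 has no mapping
val_old = np.array([1, 2, 3, 4, 5])
val_new = np.array([2, 3, 4, 5, 1])

# Identity table: lookup[v] == v for every value that can occur in a or val_old.
size = max(a.max(), val_old.max()) + 1
lookup = np.arange(size, dtype=np.result_type(a, val_new))

# Overwrite only the slots that have a replacement.
lookup[val_old] = val_new

result = lookup[a]
print(result)
```

    The table has max value + 1 entries, so this is only practical when the values are small non-negative integers.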
    