numpy replace groups of elements with integers incrementally


花落未央 2021-01-13 02:53

import numpy as np
data = np.array(['b','b','b','a','a','a','a','c','c','d','d','d'])

I need to replace each group of strin

2 answers

走了就别回头了 (OP)

2021-01-13 03:09

EDIT: This doesn't always work:

>>> a,b,c = np.unique(data, return_index=True, return_inverse=True)
>>> c # almost!!!
array([1, 1, 1, 0, 0, 0, 0, 2, 2, 3, 3, 3])
>>> np.argsort(b)[c]
array([0, 0, 0, 1, 1, 1, 1, 2, 2, 3, 3, 3], dtype=int64)
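A small counterexample may make the failure clearer. np.argsort(b) maps rank to group, while the lookup needs the reverse (group to rank), which is the inverse permutation. The two agree in the example above only because that permutation happens to be its own inverse. The sketch below uses a hypothetical three-element input, and, as one possible fix that is not part of the original answer, applies argsort twice to get the ranks:

```python
import numpy as np

# Hypothetical input chosen so that order of appearance differs from sorted order.
data = np.array(['c', 'a', 'b'])

uniques, first_idx, inverse = np.unique(data, return_index=True, return_inverse=True)

# argsort(first_idx) lists the groups in order of appearance (rank -> group),
# so indexing it with `inverse` gives the wrong labels for this input.
wrong = np.argsort(first_idx)[inverse]

# Applying argsort twice yields the rank of each group (group -> rank).
ranks = np.argsort(np.argsort(first_idx))
right = ranks[inverse]

print(wrong.tolist())  # [1, 2, 0]
print(right.tolist())  # [0, 1, 2]
```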

But this does work:

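def replace_groups(data):
    # b: index of each unique value's first occurrence; c: inverse mapping
    _, b, c = np.unique(data, return_index=True, return_inverse=True)
    # relabel by first-occurrence index so labels follow order of appearance
    _, ret = np.unique(b[c], return_inverse=True)
    return ret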
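For reference, here is a self-contained sketch of the same function applied to the array from the question, plus a second, hypothetical input whose groups repeat out of order. The variable names first_idx, inverse and labels are my own, not from the original post:

```python
import numpy as np

def replace_groups(data):
    # First-occurrence index and inverse mapping for each sorted unique value.
    _, first_idx, inverse = np.unique(data, return_index=True, return_inverse=True)
    # Replace every element by the position where its value first appeared,
    # then relabel those positions 0, 1, 2, ... in increasing order.
    _, labels = np.unique(first_idx[inverse], return_inverse=True)
    return labels

data = np.array(['b', 'b', 'b', 'a', 'a', 'a', 'a', 'c', 'c', 'd', 'd', 'd'])
labels = replace_groups(data)
print(labels.tolist())  # [0, 0, 0, 1, 1, 1, 1, 2, 2, 3, 3, 3]

# Hypothetical input where sorted order and order of appearance disagree.
shuffled = np.array(['c', 'a', 'b', 'a', 'c'])
shuffled_labels = replace_groups(shuffled)
print(shuffled_labels.tolist())  # [0, 1, 2, 1, 0]
```

Labels are assigned by first appearance, so a value that shows up again later keeps the label it got the first time.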

It is also faster than the dictionary replacement approach below, taking roughly 28% less time on the largest dataset (23.1 ms vs. 32.1 ms):

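def replace_groups_dict(data):
    # indices of the first occurrence of each unique value
    _, ind = np.unique(data, return_index=True)
    # unique values in order of first appearance
    unqs = data[np.sort(ind)]
    # map each unique value to 0, 1, 2, ... (zip stops at the end of unqs)
    data_id = dict(zip(unqs, np.arange(data.size)))
    num = np.array([data_id[datum] for datum in data])
    return num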

In [7]: %timeit replace_groups_dict(lines100)
10000 loops, best of 3: 68.8 us per loop

In [8]: %timeit replace_groups_dict(lines200)
10000 loops, best of 3: 106 us per loop

In [9]: %timeit replace_groups_dict(lines)
10 loops, best of 3: 32.1 ms per loop

In [10]: %timeit replace_groups(lines100)
10000 loops, best of 3: 67.1 us per loop

In [11]: %timeit replace_groups(lines200)
10000 loops, best of 3: 78.4 us per loop

In [12]: %timeit replace_groups(lines)
10 loops, best of 3: 23.1 ms per loop
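The arrays lines100, lines200 and lines are not shown in this answer, so the timings above cannot be reproduced exactly. A rough way to run a similar comparison is sketched below. It assumes a synthetic stand-in (50,000 strings drawn from 1,000 hypothetical keys), first checks that both functions agree, and then times them with the standard timeit module. Absolute numbers will depend on the machine, the NumPy version and the data:

```python
import timeit
import numpy as np

def replace_groups(data):
    _, first_idx, inverse = np.unique(data, return_index=True, return_inverse=True)
    _, labels = np.unique(first_idx[inverse], return_inverse=True)
    return labels

def replace_groups_dict(data):
    _, ind = np.unique(data, return_index=True)
    unqs = data[np.sort(ind)]
    data_id = dict(zip(unqs, np.arange(data.size)))
    return np.array([data_id[datum] for datum in data])

# Hypothetical stand-in for `lines`; the real dataset is not given in the post.
rng = np.random.default_rng(0)
keys = np.array([f"key{i}" for i in range(1000)])
sample = rng.choice(keys, size=50_000)

# Both approaches should produce identical labels.
same = np.array_equal(replace_groups(sample), replace_groups_dict(sample))
print("results match:", same)

for fn in (replace_groups_dict, replace_groups):
    # Best of 3 repeats, 5 calls each, reported per call.
    best = min(timeit.repeat(lambda: fn(sample), number=5, repeat=3)) / 5
    print(f"{fn.__name__}: {best * 1e3:.2f} ms per call")
```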
