Finding Patterns in a Numpy Array

忘了有多久 2021-01-05 18:28

I am trying to find patterns in a numpy array called values. I'd like to return the starting index position of the pattern. I know I

5 answers
  • 2021-01-05 18:45

    If the input is random, Ed Smith's solution is faster. But if the values come from a small set, this hash-based solution can help:

    import numpy as np

    # Rolling hash; can be replaced with any reversible hash.
    # XOR is its own inverse, so XOR-ing `rem` again removes it from the hash.
    def my_hash(rem, h, add):
        return rem ^ h ^ add

    # Input
    values = np.array([0,1,2,1,2,4,5,6,1,2,1])
    searchval = [1,2]

    # Prepare: hash the pattern and the first window of values
    sh = 0
    vh = 0
    ls = len(searchval)
    lv = len(values)

    for i in range(ls):
        vh = my_hash(0, vh, values[i])
        sh = my_hash(0, sh, searchval[i])

    # Find matches (lv - ls + 1 windows, so the last one is tested too)
    for i in range(lv - ls + 1):
        if sh == vh:
            # Hashes can collide, so verify element by element
            eq = True
            for j in range(ls):
                if values[i+j] != searchval[j]:
                    eq = False
                    break
            if eq:
                print(i)
        # Slide the window one step, unless this was the last window
        if i + ls < lv:
            vh = my_hash(values[i], vh, values[i+ls])
  • 2021-01-05 18:49

    Here's a straightforward approach using where. Start with a logical expression that finds the matches:

    In [670]: values = np.array([0,1,2,1,2,4,5,6,1,2,1])
         ...: searchval = [1,2]
         ...: 
    In [671]: (values[:-1]==searchval[0]) & (values[1:]==searchval[1])
    Out[671]: array([False,  True, False,  True, False, False, False, False,  True, False], dtype=bool)
    In [672]: np.where(_)
    Out[672]: (array([1, 3, 8], dtype=int32),)
    

    That could be generalized into a loop that operates on multiple searchval. Getting the slice range correct will take some fiddling. The roll suggested in another answer might be easier, but I suspect a bit slower.

    As long as searchval is small compared to values, this general approach should be efficient. There is an np.in1d that does this sort of match, but with an or test, so it isn't applicable here. It too uses this iterative approach if the searchval list is small enough.

    Generalized slicing

    In [716]: values
    Out[716]: array([0, 1, 2, 1, 2, 4, 5, 6, 1, 2, 1])
    In [717]: searchvals=[1,2,1]
    In [718]: idx = [np.s_[i:m-n+1+i] for i in range(n)]
    In [719]: idx
    Out[719]: [slice(0, 9, None), slice(1, 10, None), slice(2, 11, None)]
    In [720]: [values[idx[i]] == searchvals[i] for i in range(n)]
    Out[720]: 
    [array([False,  True, False,  True, False, False, False, False,  True], dtype=bool),
     array([False,  True, False,  True, False, False, False, False,  True], dtype=bool),
     array([False,  True, False, False, False, False,  True, False,  True], dtype=bool)]
    In [721]: np.all(_, axis=0)
    Out[721]: array([False,  True, False, False, False, False, False, False,  True], dtype=bool)
    In [722]: np.where(_)
    Out[722]: (array([1, 8], dtype=int32),)
    

    I used the intermediate np.s_ to look at the slices and make sure they look reasonable.
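
    The slicing steps above can be folded into a small function. This is only a sketch: find_pattern is a hypothetical helper name, and it assumes a 1-d values array and a non-empty searchvals.

```python
import numpy as np

def find_pattern(values, searchvals):
    """Return start indices where searchvals occurs in values (hypothetical helper)."""
    values = np.asarray(values)
    m, n = len(values), len(searchvals)
    if n == 0 or n > m:
        return np.array([], dtype=int)
    # Start with every candidate window marked as a match
    match = np.ones(m - n + 1, dtype=bool)
    for i in range(n):
        # Compare the i-th pattern element against values[i : m-n+1+i]
        match &= values[i:m - n + 1 + i] == searchvals[i]
    return np.where(match)[0]

values = np.array([0, 1, 2, 1, 2, 4, 5, 6, 1, 2, 1])
print(find_pattern(values, [1, 2, 1]))  # [1 8]
print(find_pattern(values, [1, 2]))     # [1 3 8]
```

    The loop runs once per pattern element, not once per array element, which is why this stays cheap while searchvals is short.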

    as_strided

    An advanced trick would be to use as_strided to construct the 'rolled' array and perform a 2d == test on that. as_strided is neat but tricky. To use it correctly you have to understand strides, and get the shape correct.

    In [740]: m,n = len(values), len(searchvals)
    In [741]: values.shape
    Out[741]: (11,)
    In [742]: values.strides
    Out[742]: (4,)
    In [743]: 
    In [743]: M = as_strided(values, shape=(n,m-n+1),strides=(4,4))
    In [744]: M
    Out[744]: 
    array([[0, 1, 2, 1, 2, 4, 5, 6, 1],
           [1, 2, 1, 2, 4, 5, 6, 1, 2],
           [2, 1, 2, 4, 5, 6, 1, 2, 1]])
    In [745]: M == np.array(searchvals)[:,None]
    Out[745]: 
    array([[False,  True, False,  True, False, False, False, False,  True],
           [False,  True, False,  True, False, False, False, False,  True],
           [False,  True, False, False, False, False,  True, False,  True]], dtype=bool)
    In [746]: np.where(np.all(_,axis=0))
    Out[746]: (array([1, 8], dtype=int32),)
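
    The strides=(4,4) above matches the int32 array in that session. On a platform where the default integer is 8 bytes, the same call would read the wrong bytes. A more portable sketch takes the step from values.strides; the variable names step and starts are mine. On NumPy 1.20 and later, numpy.lib.stride_tricks.sliding_window_view is a safer alternative.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

values = np.array([0, 1, 2, 1, 2, 4, 5, 6, 1, 2, 1])
searchvals = [1, 2, 1]
m, n = len(values), len(searchvals)

# Take the byte step from the array itself rather than assuming 4
step = values.strides[0]
# Row i is values[i : i + m - n + 1]; this shape never reads past the buffer
M = as_strided(values, shape=(n, m - n + 1), strides=(step, step))

starts = np.where(np.all(M == np.array(searchvals)[:, None], axis=0))[0]
print(starts)  # [1 8]
```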
    
  • 2021-01-05 18:54

    I think this does the job:

    np.where((values == 1) & (np.roll(values,-1) == 2))[0]
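
    One caveat: np.roll wraps around, so the last position is compared against the first element of the array. The sample data is not affected, but an array that ends in 1 and starts with 2 would report a spurious match. A minimal sketch that masks the wrapped slot (find_pair is a hypothetical helper name):

```python
import numpy as np

def find_pair(values, first, second):
    """Start indices where `first` is immediately followed by `second` (hypothetical helper)."""
    values = np.asarray(values)
    hits = (values == first) & (np.roll(values, -1) == second)
    if len(hits):
        # np.roll wraps around, so the last slot compares values[-1] with values[0]
        hits[-1] = False
    return np.where(hits)[0]

wrapped = np.array([2, 0, 1])
print(np.where((wrapped == 1) & (np.roll(wrapped, -1) == 2))[0])  # [2], a spurious match
print(find_pair(wrapped, 1, 2))                                   # []
```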
    
  • 2021-01-05 19:04

    Couldn't you simply use np.where (assuming this is the optimal way to find an element) and then only check the patterns which satisfy the first condition?

    import numpy as np
    values = np.array([0,1,2,1,2,4,5,6,1,2,1])
    searchval = [1,2]
    N = len(searchval)
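    # Candidate starts: positions equal to the first pattern element.
    # Stop early enough that a full window of N elements still fits.
    possibles = np.where(values[:max(len(values)-N+1, 0)] == searchval[0])[0]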
    
    solns = []
    for p in possibles:
        check = values[p:p+N]
        if np.all(check == searchval):
            solns.append(p)
    
    print(solns)
    
  • 2021-01-05 19:07

    A compact, straightforward solution is a "legal" variant of the as_strided solution. Others have mentioned np.roll, but here is a general solution with only one loop (132 µs).

    import numpy as np

    seq = np.array([0,1,2,1,2,4,5,6,1,2,1])
    patt = np.array([1,2])

    # Column k holds seq shifted left by k, so row i is seq[i:i+len(patt)]
    # (the last len(patt)-1 rows wrap around to the start of seq)
    Seq = np.vstack([np.roll(seq, shift) for shift in -np.arange(len(patt))]).T
    np.where(np.all(Seq == patt, axis=1))[0]
    

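    Another option for sequences of small integers is converting to a string. It is nearly 6 times faster (20 µs). For small positive integers only!

    import re
    import numpy as np

    def to_string(arr):
        # Map each integer to the character with that code point
        return ''.join(map(chr, arr))

    # re.escape keeps values such as 42 ('*') from acting as regex operators
    np.array([m.start() for m in re.finditer(re.escape(to_string(patt)), to_string(seq))])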
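
    Note that re.finditer reports non-overlapping matches only, so a pattern like [1,2,1] in [1,2,1,2,1] is found once, not twice. If overlapping matches matter, one possible sketch wraps the pattern in a zero-width lookahead; find_overlapping is a hypothetical helper name, and the same small-integer restriction applies.

```python
import re
import numpy as np

def to_string(arr):
    # One character per integer (small non-negative ints only)
    return ''.join(map(chr, arr))

def find_overlapping(seq, patt):
    """Start indices of every occurrence, overlapping ones included (hypothetical helper)."""
    # A zero-width lookahead consumes nothing, so the scan advances one character at a time
    regex = '(?=' + re.escape(to_string(patt)) + ')'
    return np.array([m.start() for m in re.finditer(regex, to_string(seq))], dtype=int)

seq = np.array([1, 2, 1, 2, 1])
patt = np.array([1, 2, 1])
print(find_overlapping(seq, patt))  # [0 2]
```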
    