Identify clusters linked by delta to the left and different delta to the right

后端 未结 1 1554
情书的邮戳
情书的邮戳 2021-02-07 18:58

Consider the sorted array a:

a = np.array([0, 2, 3, 4, 5, 10, 11, 11, 14, 19, 20, 20])

If I specified left and right deltas,

相关标签:
1条回答
  • 2021-02-07 19:15

    We will leverage np.searchsorted and logic to find cluster edges.

    First, let's take a closer look at what np.searchsorted does:

    Find the indices into a sorted array a such that, if the corresponding elements in v were inserted before the indices, the order of a would be preserved.

    What I'll do is execute np.searchsorted with a using a - delta_left. Let's look at that for delta_left = 1

    # a =
    # [ 0  2  3  4  5 10 11 11 14 19 20 20]
    # 
    # a - delta_left
    # [-1  1  2  3  4  9 10 10 13 18 19 19]
    
    • -1 would get inserted at position 0 to maintain order
    • 1 would get inserted at position 1 to maintain order
    • 2 would get inserted at position 1 as well, indicating that 2 might be in the same cluster as 1
    • 3 would get inserted at position 2 indicating that 3 might be in the same cluster as 2
    • so on and so forth

    What we notice is that only when an element less delta would get inserted at its current position would we consider a new cluster starting.

    We do this again for the right side with a difference. The difference is that by default if a bunch of elements are the same, np.searchsorted assumes to insert into the front of values. To identify the ends of clusters, I'm going to want to insert after the identical elements. Therefore I'll use the paramater side='right'

    If ‘left’, the index of the first suitable location found is given. If ‘right’, return the last such index. If there is no suitable index, return either 0 or N (where N is the length of a).

    Now the logic. A cluster can only begin if a prior cluster has ended, with the exception of the first cluster. We'll then consider a shifted version of the results of our second np.searchsorted


    Let's now define our function

    def delta_cluster(a, dleft, dright):
        # use to track whether searchsorted results are at correct positions
        rng = np.arange(len(a))
    
        edge_left = a.searchsorted(a - dleft)
        starts = edge_left == rng
    
        # we append 0 to shift
        edge_right = np.append(0, a.searchsorted(a + dright, side='right')[:-1])
        ends = edge_right == rng
    
        return (starts & ends).cumsum()
    

    demonstration

    with left, right deltas equal to 1 and 1

    print(delta_cluster(a, 1, 1))
    
    [1 2 2 2 2 3 3 3 4 5 5 5]
    

    with left, right deltas equal to 2 and 1

    print(delta_cluster(a, 2, 1))
    
    [1 1 1 1 1 2 2 2 3 4 4 4]
    

    Extra Credit
    What if a isn't sorted?
    I'll utilize information learned from this post

    def delta_cluster(a, dleft, dright):
    
        s = a.argsort()
    
        size = s.size
    
        if size > 1000:
            y = np.empty(s.size, dtype=np.int64)
            y[s] = np.arange(s.size)
        else:
            y = s.argsort()
    
        a = a[s]
    
        rng = np.arange(len(a))
    
        edge_left = a.searchsorted(a - dleft)
        starts = edge_left == rng
    
        edge_right = np.append(0, a.searchsorted(a + dright, side='right')[:-1])
        ends = edge_right == rng
    
        return (starts & ends).cumsum()[y]
    

    demonstration

    b = np.random.permutation(a)
    print(b)
    
    [14 10  3 11 20  0 19 20  4 11  5  2]
    

    print(delta_cluster(a, 2, 1))
    
    [1 1 1 1 1 2 2 2 3 4 4 4]
    

    print(delta_cluster(b, 2, 1))
    
    [3 2 1 2 4 1 4 4 1 2 1 1]
    

    print(delta_cluster(b, 2, 1)[b.argsort()])
    
    [1 1 1 1 1 2 2 2 3 4 4 4]
    
    0 讨论(0)
提交回复
热议问题