Compute the cumulative sum of a list until a zero appears

前端 未结 7 1056
小鲜肉
小鲜肉 2021-02-01 17:52

I have a (long) list in which zeros and ones appear at random:

list_a = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1]

I want to get the list_b

7条回答
  •  旧巷少年郎
    2021-02-01 18:40

    You're overthinking this.

    Option 1
    You can just iterate over the indices and update accordingly (computing the cumulative sum), based on whether the current value is 0 or not.

    data = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1]
    
    for i in range(1, len(data)):
        if data[i]:  
            data[i] += data[i - 1] 
    

    That is, if the current element is non-zero, then update the element at the current index as the sum of the current value, plus the value at the previous index.

    print(data)
    [1, 2, 3, 0, 1, 2, 0, 1, 0, 1, 2, 3]
    

    Note that this updates your list in place. You can create a copy in advance if you don't want that - new_data = data.copy() and iterate over new_data in the same manner.


    Option 2
    You can use the pandas API if you need performance. Find groups based on the placement of 0s, and use groupby + cumsum to compute group-wise cumulative sums, similar to above:

    import pandas as pd
    
    s = pd.Series(data)    
    data = s.groupby(s.eq(0).cumsum()).cumsum().tolist()
    

    print(data)
    [1, 2, 3, 0, 1, 2, 0, 1, 0, 1, 2, 3]
    

    Performance

    First, the setup -

    data = data * 100000
    s = pd.Series(data)
    

    Next,

    %%timeit
    new_data = data.copy()
    for i in range(1, len(data)):
        if new_data[i]:  
            new_data[i] += new_data[i - 1]
    
    328 ms ± 4.09 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    

    And, timing the copy separately,

    %timeit data.copy()
    8.49 ms ± 17.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    

    So, the copy doesn't really take much time. Finally,

    %timeit s.groupby(s.eq(0).cumsum()).cumsum().tolist()
    122 ms ± 1.69 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
    

    The pandas approach is conceptually linear (just like the other approaches) but faster by a constant degree because of the implementation of the library.

提交回复
热议问题