Compute the cumulative sum of a list until a zero appears


I have a (long) list in which zeros and ones appear at random:

list_a = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1]

I want to get the list_b

7 Answers

没有蜡笔的小新

2021-02-01 18:39
Starting with Python 3.8 and the introduction of assignment expressions (PEP 572, the := operator), we can both use and update a variable inside a list comprehension:
```
# items = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1]
total = 0
[total := (total + x if x else x) for x in items]
# [1, 2, 3, 0, 1, 2, 0, 1, 0, 1, 2, 3]
```
This:
- Initializes a variable total to 0, which holds the running sum
- For each item, it both:
  - either adds the current item to total (total := total + x) via an assignment expression, or resets total to 0 if the item is 0
  - and, at the same time, maps x to the new value of total
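
One detail worth spelling out: per PEP 572, a name assigned with := inside a comprehension is bound in the enclosing scope, so total still holds the last running value once the comprehension finishes. A minimal sketch of that behaviour, assuming Python 3.8+ and using a hypothetical helper name reset_cumsum that is not part of the answer:

```python
def reset_cumsum(items):
    """Running sum that restarts at every zero (hypothetical helper name)."""
    total = 0
    # := rebinds `total` in the function scope on every iteration
    result = [total := (total + x if x else 0) for x in items]
    return result, total

values, last_total = reset_cumsum([1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1])
print(values)      # [1, 2, 3, 0, 1, 2, 0, 1, 0, 1, 2, 3]
print(last_total)  # 3
```

Wrapping the comprehension in a function keeps total from leaking into module scope, which is easy to overlook when the one-liner is pasted at the top level.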

旧巷少年郎

2021-02-01 18:40
You're overthinking this.

Option 1
You can just iterate over the indices and update accordingly (computing the cumulative sum), based on whether the current value is 0 or not.
```
data = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1]

for i in range(1, len(data)):
    if data[i]:  
        data[i] += data[i - 1] 
```
That is, if the current element is non-zero, then update the element at the current index as the sum of the current value, plus the value at the previous index.
```
print(data)
[1, 2, 3, 0, 1, 2, 0, 1, 0, 1, 2, 3]
```
Note that this updates your list in place. You can create a copy in advance if you don't want that - new_data = data.copy() and iterate over new_data in the same manner.
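
The copy-first variant could be sketched as a small function. The name cumsum_reset_copy is made up for illustration; the loop body is the same as in Option 1:

```python
def cumsum_reset_copy(data):
    """Return a new list; the input is left untouched (hypothetical helper)."""
    new_data = data.copy()  # a shallow copy is enough for a flat list of ints
    for i in range(1, len(new_data)):
        if new_data[i]:
            new_data[i] += new_data[i - 1]
    return new_data

original = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1]
result = cumsum_reset_copy(original)
print(result)    # [1, 2, 3, 0, 1, 2, 0, 1, 0, 1, 2, 3]
print(original)  # [1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1]
```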

Option 2
You can use the pandas API if you need performance. Find groups based on the placement of 0s, and use groupby + cumsum to compute group-wise cumulative sums, similar to above:
```
import pandas as pd

s = pd.Series(data)    
data = s.groupby(s.eq(0).cumsum()).cumsum().tolist()
```
```
print(data)
[1, 2, 3, 0, 1, 2, 0, 1, 0, 1, 2, 3]
```
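
To see what the grouping key does, the same idea can be emulated with the standard library alone. This is only an illustration of the group-label trick, not how pandas works internally; the variable names are assumptions:

```python
from itertools import accumulate, groupby

data = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1]

# Counterpart of s.eq(0).cumsum(): the label goes up by one at every zero
labels = list(accumulate(int(x == 0) for x in data))
print(labels)  # [0, 0, 0, 1, 1, 1, 2, 2, 3, 3, 3, 3]

# Counterpart of groupby(labels).cumsum(): running sum inside each label
result = []
for _, group in groupby(zip(labels, data), key=lambda pair: pair[0]):
    result.extend(accumulate(value for _, value in group))
print(result)  # [1, 2, 3, 0, 1, 2, 0, 1, 0, 1, 2, 3]
```

Each zero starts a new group, so the zero itself becomes the first element of its group and the running sum restarts from it.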
Performance

First, the setup -
```
data = data * 100000
s = pd.Series(data)
```
Next,
```
%%timeit
new_data = data.copy()
for i in range(1, len(data)):
    if new_data[i]:  
        new_data[i] += new_data[i - 1]

328 ms ± 4.09 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
And, timing the copy separately,
```
%timeit data.copy()
8.49 ms ± 17.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
So, the copy doesn't really take much time. Finally,
```
%timeit s.groupby(s.eq(0).cumsum()).cumsum().tolist()
122 ms ± 1.69 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
```
The pandas approach is linear in the input size (just like the other approaches) but faster by a constant factor because of the library's optimized implementation.
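
The %timeit and %%timeit magics only exist in IPython/Jupyter. Outside of it, a rough equivalent for the pure-Python loop can be sketched with the standard timeit module. The input size and run count here are arbitrary choices, and the numbers will differ from the ones quoted above:

```python
import timeit

data = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1] * 1000  # much smaller than the answer's input

def loop_version(values):
    new_data = values.copy()
    for i in range(1, len(new_data)):
        if new_data[i]:
            new_data[i] += new_data[i - 1]
    return new_data

runs = 20
loop_s = timeit.timeit(lambda: loop_version(data), number=runs) / runs
copy_s = timeit.timeit(lambda: data.copy(), number=runs) / runs
print(f"loop + copy: {loop_s * 1e3:.3f} ms per call")
print(f"copy only:   {copy_s * 1e3:.3f} ms per call")
```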

借酒劲吻你

2021-02-01 18:41
If you want a compact native Python solution that is probably the most memory efficient, although not the fastest (see the comments), you could draw extensively from itertools:
```
>>> from itertools import groupby, accumulate, chain
>>> list(chain.from_iterable(accumulate(g) for _, g in groupby(list_a, bool)))
[1, 2, 3, 0, 1, 2, 0, 1, 0, 1, 2, 3]
```
The steps here are: group the list into sublists based on presence of 0 (which is falsy), take the cumulative sum of the values within each sublist, flatten the sublists.
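
The three steps can be made visible by materializing each intermediate result. This sketch trades away the memory efficiency of the one-liner purely for readability:

```python
from itertools import accumulate, chain, groupby

list_a = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1]

# Step 1: split into runs of truthy / falsy values
runs = [list(g) for _, g in groupby(list_a, bool)]
print(runs)  # [[1, 1, 1], [0], [1, 1], [0], [1], [0], [1, 1, 1]]

# Step 2: running sum inside each run
sums = [list(accumulate(run)) for run in runs]
print(sums)  # [[1, 2, 3], [0], [1, 2], [0], [1], [0], [1, 2, 3]]

# Step 3: flatten back into one list
list_b = list(chain.from_iterable(sums))
print(list_b)  # [1, 2, 3, 0, 1, 2, 0, 1, 0, 1, 2, 3]
```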

As Stefan Pochmann comments, if your list is binary (consisting only of 1s and 0s), you don't need to pass a key to groupby() at all: it falls back on the identity function. This is ~30% faster than using bool for this case:
```
>>> list(chain.from_iterable(accumulate(g) for _, g in groupby(list_a)))
[1, 2, 3, 0, 1, 2, 0, 1, 0, 1, 2, 3]
```
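
The key-less form is only safe for binary input. Without a key, groupby() starts a new group whenever the value changes, so a list with other non-zero values gives a different result. A small sketch with made-up helper names and a made-up input:

```python
from itertools import accumulate, chain, groupby

def with_key(values):
    return list(chain.from_iterable(accumulate(g) for _, g in groupby(values, bool)))

def without_key(values):
    return list(chain.from_iterable(accumulate(g) for _, g in groupby(values)))

mixed = [2, 3, 0, 1, 1]
print(with_key(mixed))     # [2, 5, 0, 1, 2]
print(without_key(mixed))  # [2, 3, 0, 1, 2]
```

With bool as the key, 2 and 3 land in the same truthy run and are summed; without it they form two separate groups.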


礼貌的吻别

2021-02-01 18:45

You are playing with the indices too much in the code you posted when you do not really have to. You can just keep track of a cumulative sum and reset it to 0 every time you meet a 0.

```
list_a = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1]

cum_sum = 0
list_b = []
for item in list_a:
    if not item:            # if our item is 0
        cum_sum = 0         # the cumulative sum is reset (set back to 0)
    else:
        cum_sum += item     # otherwise it sums further
    list_b.append(cum_sum)  # and no matter what it gets appended to the result
print(list_b)  # -> [1, 2, 3, 0, 1, 2, 0, 1, 0, 1, 2, 3]
```


甜味超标

2021-02-01 18:45
I would use a generator if you want performance (and it's simple too).
```
def weird_cumulative_sum(seq):
    s = 0
    for n in seq:
        s = 0 if n == 0 else s + n
        yield s

list_b = list(weird_cumulative_sum(list_a))
```
I don't think you'll get better than that; in any case, you'll have to iterate over list_a at least once.

Note that I called list() on the result to get a list, as in your code. If the code using list_b iterates over it only once (with a for loop, for example), there is no need to convert the result to a list: just pass it the generator.
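
A short sketch of consuming the generator directly, under the assumption that a single pass is all the caller needs. It also shows the usual caveat that a generator is exhausted after one pass:

```python
def weird_cumulative_sum(seq):
    s = 0
    for n in seq:
        s = 0 if n == 0 else s + n
        yield s

list_a = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1]

# Consume lazily: no intermediate list is built
longest_run = max(weird_cumulative_sum(list_a))
print(longest_run)  # 3

# A generator is exhausted after one pass; build a new one to iterate again
gen = weird_cumulative_sum(list_a)
first_pass = list(gen)
second_pass = list(gen)
print(second_pass)  # []
```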

夕颜

2021-02-01 18:49
It doesn't have to be as complicated as in the question; a very simple approach could be this.
```
list_a = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1]
list_b = []
s = 0
for a in list_a:
    s = a + s if a != 0 else 0
    list_b.append(s)

print(list_b)
```
