I have a (long) list in which zeros and ones appear at random:
list_a = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1]
I want to get list_b, in which each run of ones is replaced by a running count that resets at every zero:

list_b = [1, 2, 3, 0, 1, 2, 0, 1, 0, 1, 2, 3]
You're overthinking this.
Option 1
You can just iterate over the indices and update each element in place, computing a cumulative sum that restarts whenever the current value is 0.
data = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1]

for i in range(1, len(data)):
    if data[i]:  # non-zero: extend the running count
        data[i] += data[i - 1]
That is, whenever the current element is non-zero, replace it with the sum of its own value and the (already updated) value at the previous index; a zero is left alone and the count restarts after it.
print(data)
[1, 2, 3, 0, 1, 2, 0, 1, 0, 1, 2, 3]
Note that this updates your list in place. If you don't want that, create a copy in advance with new_data = data.copy() and iterate over new_data in the same manner.
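If you'd rather have this as a reusable function that leaves the input untouched, a minimal sketch could look like the following (the helper name reset_cumsum is my own, not something from the question):

def reset_cumsum(values):
    """Return a new list with a running count that resets at each zero."""
    out = list(values)  # copy so the caller's list is not mutated
    for i in range(1, len(out)):
        if out[i]:
            out[i] += out[i - 1]
    return out

print(reset_cumsum([1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1]))
# [1, 2, 3, 0, 1, 2, 0, 1, 0, 1, 2, 3]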
Option 2
You can use the pandas API if you need performance. Assign group labels based on the positions of the 0s, then use groupby + cumsum to compute group-wise cumulative sums, similar to the above:
import pandas as pd

s = pd.Series(data)
# Each zero starts a new group; cumsum restarts the count within each group.
data = s.groupby(s.eq(0).cumsum()).cumsum().tolist()
print(data)
[1, 2, 3, 0, 1, 2, 0, 1, 0, 1, 2, 3]
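To see why the grouping works, look at the intermediate group labels: s.eq(0) marks the zeros, and its cumulative sum gives each zero (together with the ones that follow it) a distinct group id, so the cumulative sum restarts inside every group. A quick illustration:

import pandas as pd

s = pd.Series([1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1])
print(s.eq(0).cumsum().tolist())
# [0, 0, 0, 1, 1, 1, 2, 2, 3, 3, 3, 3]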
Performance
First, the setup -
data = data * 100000  # 1.2 million elements
s = pd.Series(data)
Next,
%%timeit
new_data = data.copy()
for i in range(1, len(data)):
    if new_data[i]:
        new_data[i] += new_data[i - 1]
328 ms ± 4.09 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
And, timing the copy separately,
%timeit data.copy()
8.49 ms ± 17.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
So the copy accounts for only a small fraction of the loop's total time. Finally,
%timeit s.groupby(s.eq(0).cumsum()).cumsum().tolist()
122 ms ± 1.69 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
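As a quick sanity check (my own addition, not part of the original benchmark), the two approaches agree on the enlarged input:

# reset_cumsum is the helper sketched under Option 1;
# data and s come from the setup above.
assert reset_cumsum(data) == s.groupby(s.eq(0).cumsum()).cumsum().tolist()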
Both approaches are linear in the length of the list, but the pandas version is faster by a constant factor because the grouping and cumulative sums run in compiled code rather than in the Python interpreter.