Question
How can I do a cumulative concatenation in a pandas DataFrame? I found a number of solutions in R, but can't find one in Python.
Here is the problem: suppose we have a dataframe with columns date and name:
import pandas as pd
d = {'date': [1,1,2,2,3,3,3,4,4,4], 'name':['A','B','A','C','A','B','B','A','B','C']}
df = pd.DataFrame(data=d)
I want to get CUM_CONCAT, which is a cumulative concatenation grouped by date:
date name CUM_CONCAT
0 1 A [A]
1 1 B [A,B]
2 2 A [A]
3 2 C [A,C]
4 3 A [A]
5 3 B [A,B]
6 3 B [A,B,B]
7 4 A [A]
8 4 B [A,B]
9 4 C [A,B,C]
So far I've tried:
temp = df.groupby(['date'])['name'].apply(list)
df = df.join(temp, 'date', rsuffix='_cum_concat')
And what I got was:
date name CUM_CONCAT
0 1 A [A,B]
1 1 B [A,B]
2 2 A [A,C]
3 2 C [A,C]
4 3 A [A,B,B]
5 3 B [A,B,B]
6 3 B [A,B,B]
7 4 A [A,B,C]
8 4 B [A,B,C]
9 4 C [A,B,C]
I know there are .rolling and cumsum functions, which are similar to what I need, but they are mainly for cumulative sums, not for concatenation.
Any help will be appreciated!!!
Answer 1:
pandas rolling does not support object dtype, so you may need:
df['CUM_CONCAT'] = [y.name.tolist()[:z + 1] for x, y in df.groupby('date') for z in range(len(y))]
df
Out[33]:
date name CUM_CONCAT
0 1 A [A]
1 1 B [A, B]
2 2 A [A]
3 2 C [A, C]
4 3 A [A]
5 3 B [A, B]
6 3 B [A, B, B]
7 4 A [A]
8 4 B [A, B]
9 4 C [A, B, C]
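The list comprehension above assigns its results by position, so it assumes the rows of df are already ordered by date in the same order that groupby yields the groups (true for the sample data). The following is a minimal sketch, not part of the original answer, of an index-aligned variant that should also work when the rows are not sorted; the helper name cum_lists is hypothetical.

```python
from itertools import accumulate

import pandas as pd


def cum_lists(s):
    # Running prefixes [s0], [s0, s1], ...; list + list builds a new list each step.
    prefixes = accumulate([v] for v in s)
    # Keep the group's own row labels so the result can align back onto df.
    return pd.Series(list(prefixes), index=s.index)


d = {'date': [1, 1, 2, 2, 3, 3, 3, 4, 4, 4],
     'name': ['A', 'B', 'A', 'C', 'A', 'B', 'B', 'A', 'B', 'C']}
df = pd.DataFrame(data=d)

# Iterating a SeriesGroupBy yields (key, Series) pairs, one per date.
parts = [cum_lists(g) for _, g in df.groupby('date')['name']]

# Column assignment from a Series aligns on the index, not on position.
df['CUM_CONCAT'] = pd.concat(parts)
```

Because each piece carries its original row labels, the assignment puts every prefix list on the row it was built from, whatever order the rows are in.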
Answer 2:
I came up with the following solution.
In terms of run time, both solutions (mine and @Wen-Ben's) seem similar, though his code is shorter.
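from itertools import accumulate
def cum_concat(x):
    # accumulate with the default + operator builds running list concatenations
    return list(accumulate(x))
# wrap each value in a one-element list so that + concatenates
f = lambda x: cum_concat([[i] for i in x])
b = df.groupby(['date'])['name'].apply(f)
# flatten the per-group lists of prefixes back to one entry per row
df['CUM_CONCAT'] = [item for sublist in b for item in sublist]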
df
Out:
date name CUM_CONCAT
0 1 A [A]
1 1 B [A, B]
2 2 A [A]
3 2 C [A, C]
4 3 A [A]
5 3 B [A, B]
6 3 B [A, B, B]
7 4 A [A]
8 4 B [A, B]
9 4 C [A, B, C]
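To check the run-time remark for yourself, here is a rough sketch that wraps both answers in functions, confirms they return the same lists, and times them with the standard-library timeit module. The function names by_slicing and by_accumulate and the repetition count are illustrative assumptions; no timings are quoted here because they depend on the machine and the data size.

```python
from itertools import accumulate
import timeit

import pandas as pd

d = {'date': [1, 1, 2, 2, 3, 3, 3, 4, 4, 4],
     'name': ['A', 'B', 'A', 'C', 'A', 'B', 'B', 'A', 'B', 'C']}
df = pd.DataFrame(data=d)


def by_slicing(frame):
    # Answer 1's idea: slice each group's list into growing prefixes.
    return [g['name'].tolist()[:i + 1]
            for _, g in frame.groupby('date')
            for i in range(len(g))]


def by_accumulate(frame):
    # Answer 2's idea: accumulate one-element lists within each group.
    per_group = frame.groupby('date')['name'].apply(
        lambda s: list(accumulate([v] for v in s)))
    # Flatten the per-group results back to one entry per row.
    return [item for sublist in per_group for item in sublist]


same = by_slicing(df) == by_accumulate(df)

# Total seconds for 100 calls of each function (values vary by machine).
t_slice = timeit.timeit(lambda: by_slicing(df), number=100)
t_accum = timeit.timeit(lambda: by_accumulate(df), number=100)
print(same, t_slice, t_accum)
```

A ten-row frame is too small to say much about scaling, so a larger frame built from your own data would give a more meaningful comparison.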
Source: https://stackoverflow.com/questions/55111417/python-cumulative-concatenate-in-pandas-dataframe