Pandas rolling sum on string column

Submitted by 余生长醉 on 2019-12-02 04:27:20

Question


I'm using Python 3 with pandas version 0.19.2.

I have a pandas df as follows:

chat_id    line
1          'Hi.'
1          'Hi, how are you?.'
1          'I'm well, thanks.'
2          'Is it going to rain?.'
2          'No, I don't think so.'
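To make the example reproducible, the sample frame can be rebuilt as below. This is a sketch that assumes the single quotes in the display above are only notation and not part of the stored strings.

```python
import pandas as pd

# Sample data from the question; the surrounding single quotes in the
# display are assumed to be notation, not characters in the strings.
df = pd.DataFrame({
    'chat_id': [1, 1, 1, 2, 2],
    'line': ['Hi.',
             'Hi, how are you?.',
             "I'm well, thanks.",
             'Is it going to rain?.',
             "No, I don't think so."],
})
print(df)
```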

I want to group by 'chat_id', then do something like a cumulative ("rolling") sum on 'line' to get the following:

chat_id    line                     conversation
1          'Hi.'                    'Hi.'
1          'Hi, how are you?.'      'Hi. Hi, how are you?.'
1          'I'm well, thanks.'      'Hi. Hi, how are you?. I'm well, thanks.'
2          'Is it going to rain?.'  'Is it going to rain?.'
2          'No, I don't think so.'  'Is it going to rain?. No, I don't think so.'

I believe df.groupby('chat_id')['line'].cumsum() would only work on a numeric column.
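For context, the plain (non-grouped) Series.cumsum does accept an object-dtype Series of strings: it applies + cumulatively, and + on Python strings is concatenation. Whether the grouped cumsum method accepts strings depends on the pandas version and is not assumed here; this minimal check only covers the plain Series case.

```python
import pandas as pd

# On an object-dtype Series, cumsum folds with "+", which concatenates str.
s = pd.Series(['a', 'b', 'c'])
print(s.cumsum().tolist())  # ['a', 'ab', 'abc']
```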

I have also tried df.groupby(by=['chat_id'], as_index=False)['line'].apply(list) to get a list of all the lines in the full conversation, but then I can't figure out how to unpack that list to create the 'rolling sum' style conversation column.
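Instead of collecting each conversation into a list and then unpacking it, the running concatenation can be built per group directly with itertools.accumulate from the standard library. This is an alternative technique, not the asker's or the answerer's code; running_conversation is a hypothetical helper name, and a single space is assumed as the separator.

```python
from itertools import accumulate

import pandas as pd

df = pd.DataFrame({
    'chat_id': [1, 1, 1, 2, 2],
    'line': ['Hi.', 'Hi, how are you?.', "I'm well, thanks.",
             'Is it going to rain?.', "No, I don't think so."],
})


def running_conversation(lines):
    """Hypothetical helper: running space-joined prefixes of one group's lines."""
    # accumulate() yields every intermediate result of the left fold, and
    # joining with ' ' inside the fold leaves no trailing separator to strip.
    prefixes = list(accumulate(lines, lambda acc, nxt: acc + ' ' + nxt))
    # Keep the group's original row labels so transform can align the result.
    return pd.Series(prefixes, index=lines.index)


df['conversation'] = df.groupby('chat_id')['line'].transform(running_conversation)
print(df['conversation'].tolist())
```

Because transform returns a result aligned to the original row index, it can be assigned straight back as a new column.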


Answer 1:


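Calling Series.cumsum on each group works for me, because + on strings concatenates. If you need a separator, append a space to each line first and strip the trailing space afterwards: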

# transform returns a result aligned to df's original index, so it can be assigned as a column
df['new'] = df.groupby('chat_id')['line'].transform(lambda x: (x + ' ').cumsum().str.strip())
print(df)
   chat_id                   line                                          new
0        1                    Hi.                                          Hi.
1        1      Hi, how are you?.                        Hi. Hi, how are you?.
2        1      I'm well, thanks.      Hi. Hi, how are you?. I'm well, thanks.
3        2  Is it going to rain?.                        Is it going to rain?.
4        2  No, I don't think so.  Is it going to rain?. No, I don't think so.
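A small variation, not part of the original answer: .str.strip() removes all whitespace at both ends of each accumulated string, including any whitespace that belonged to the original lines. Prepending the separator instead and then slicing off exactly one leading character avoids that. A sketch assuming a single-space separator and the question's sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'chat_id': [1, 1, 1, 2, 2],
    'line': ['Hi.', 'Hi, how are you?.', "I'm well, thanks.",
             'Is it going to rain?.', "No, I don't think so."],
})

# Prepend the separator, accumulate within each group, then drop only the
# single space that was added in front of each group's first line.
df['new'] = df.groupby('chat_id')['line'].transform(
    lambda x: (' ' + x).cumsum().str[1:])
print(df)
```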

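If the stored values really contain the surrounding single quotes, strip them first and add them back around the result:

df['line'] = df['line'].str.strip("'")
df['new'] = df.groupby('chat_id')['line'].transform(lambda x: "'" + (x + ' ').cumsum().str.strip() + "'")
print(df)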
   chat_id                   line  \
0        1                    Hi.   
1        1      Hi, how are you?.   
2        1      I'm well, thanks.   
3        2  Is it going to rain?.   
4        2  No, I don't think so.   

                                             new  
0                                          'Hi.'  
1                        'Hi. Hi, how are you?.'  
2      'Hi. Hi, how are you?. I'm well, thanks.'  
3                        'Is it going to rain?.'  
4  'Is it going to rain?. No, I don't think so.' 


Source: https://stackoverflow.com/questions/43569056/pandas-rolling-sum-on-string-column
