How to combine values in a row depending on value in another row in pandas

不知归路 2021-01-24 21:52

I have a pandas dataframe with several columns (words, start time, stop time, speaker). I want to combine all values in the 'word' column while the values in the 'speaker' c

1 answer
  • 2021-01-24 22:28

    To combine all rows for each speaker, regardless of where they occur, use GroupBy.agg with a dict of aggregation functions:

    (df.groupby('speaker', as_index=False, sort=False)
       .agg({'word': ' '.join, 'start': 'min', 'stop': 'max',}))
    
       speaker                word  start  stop
    0        2  but that's alright   2.72  3.47
    1        1       we'll have to   8.43  9.07
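
    As a minimal sketch of this per-speaker aggregation, the snippet below runs it on a hypothetical word-level transcript. The rows and timings are invented for illustration and are not the asker's data. Note that when a speaker talks more than once, all of their rows collapse into a single row.

```python
import pandas as pd

# Hypothetical word-level transcript; the timings are invented for illustration.
df = pd.DataFrame({
    'word':    ['but', "that's", 'alright', "we'll", 'have', 'to',
                'okay', 'sure', 'what?'],
    'start':   [2.72, 2.90, 3.10, 8.43, 8.60, 8.90, 9.19, 10.50, 11.02],
    'stop':    [2.85, 3.05, 3.47, 8.55, 8.85, 9.07, 9.60, 11.01, 12.00],
    'speaker': [2, 2, 2, 1, 1, 1, 2, 2, 1],
})

# One row per speaker: join the words, keep the earliest start and latest stop.
per_speaker = (df.groupby('speaker', as_index=False, sort=False)
                 .agg({'word': ' '.join, 'start': 'min', 'stop': 'max'}))

print(per_speaker)
```

    Here speaker 2 ends up with "but that's alright okay sure" spanning 2.72 to 11.01, because both of their turns land in the same group.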
    

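    To group only consecutive occurrences, use the shift-and-cumsum trick: flag each row whose speaker differs from the previous row, take the cumulative sum of those flags to get a run id, then use that as the second grouper along with "speaker":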

    gp1 = df['speaker'].ne(df['speaker'].shift()).cumsum()
    
    (df.groupby(['speaker', gp1], as_index=False, sort=False)
       .agg({'word': ' '.join, 'start': 'min', 'stop': 'max',}))
    
       speaker                word  start   stop
    0        2  but that's alright   2.72   3.47
    1        1       we'll have to   8.43   9.07
    2        2           okay sure   9.19  11.01
    3        1               what?  11.02  12.00
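
    To see what the run id looks like, here is a small sketch on hypothetical word-level data (the individual timings are invented for illustration). It is a variant rather than the exact code above: the key is given its own name, `run_id` (a name chosen here), and the grouping uses the key alone, since each run contains exactly one speaker.

```python
import pandas as pd

# Hypothetical word-level transcript; the timings are invented for illustration.
df = pd.DataFrame({
    'word':    ['but', "that's", 'alright', "we'll", 'have', 'to',
                'okay', 'sure', 'what?'],
    'start':   [2.72, 2.90, 3.10, 8.43, 8.60, 8.90, 9.19, 10.50, 11.02],
    'stop':    [2.85, 3.05, 3.47, 8.55, 8.85, 9.07, 9.60, 11.01, 12.00],
    'speaker': [2, 2, 2, 1, 1, 1, 2, 2, 1],
})

# True wherever the speaker differs from the previous row (the first row is
# always True because it is compared against NaN); cumsum numbers the runs.
changed = df['speaker'].ne(df['speaker'].shift())
run_id = changed.cumsum().rename('run_id')
print(run_id.tolist())  # [1, 1, 1, 2, 2, 2, 3, 3, 4]

# Group on the run id only and carry the speaker along with 'first'.
merged = (df.groupby(run_id, sort=False)
            .agg(speaker=('speaker', 'first'),
                 word=('word', ' '.join),
                 start=('start', 'min'),
                 stop=('stop', 'max'))
            .reset_index(drop=True))

print(merged)
```

    Giving the key a distinct name keeps it from sharing the "speaker" label with the column it was derived from.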
    