Summing values with similar row values

后端 未结 1 1824
醉梦人生
醉梦人生 2021-01-15 17:14

I have a pandas data set that looks like this

city    difference 
NY       6
SF       8
LA       8
NY       9
SF       10

I want to sum up

相关标签:
1条回答
  • 2021-01-15 17:27

    I think you need transform:

    df['total difference'] = df.groupby('city')['difference'].transform(sum) 
    print (df)
      city  difference  total difference
    0   NY           6                15
    1   SF           8                18
    2   LA           8                 8
    3   NY           9                15
    4   SF          10                18
    

    And if need sort column also:

    df['total difference'] = df.groupby('city')['difference'].transform('sum') 
    df = df.sort_values('city')
    print (df)
      city  difference  total difference
    2   LA           8                 8
    0   NY           6                15
    3   NY           9                15
    1   SF           8                18
    4   SF          10                18
    

    I was interested about differences in functions and timings are very similar:

    #[10000000 rows x 2 columns]
    np.random.seed(100)
    df = pd.DataFrame(np.random.randint(1000, size=(10000000,2)), columns=['city','difference'])
    #print (df)
    
    In [293]: %timeit (df.groupby('city')['difference'].transform('sum'))
    1 loop, best of 3: 570 ms per loop
    
    In [294]: %timeit (df.groupby('city')['difference'].transform(sum))
    1 loop, best of 3: 567 ms per loop
    
    In [295]: %timeit (df.groupby('city')['difference'].transform(np.sum))
    1 loop, best of 3: 561 ms per loop
    
    0 讨论(0)
提交回复
热议问题