Subtracting group specific value from rows in pandas

丶灬走出姿态 submitted on 2019-12-02 13:10:20

Group your DataFrame by the Group column. Then iterate through the groups, look up each group's ref sample value, and subtract the group's value column from it.

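> import io
> import pandas as pd
> df = pd.read_csv(io.StringIO(s), sep=r'\s+')  # s is the whitespace-separated table text (not shown)
> df['diff'] = 0.0
> df_group = df.groupby('Group')
> for name, group in df_group:
      # the reference row is named 'ref' + the group's number, e.g. 'ref1' for 'group1'
      ref_name = 'ref' + name.split('group')[1]
      ref_value = group.loc[group['sample'] == ref_name, 'value'].values[0]
      df.loc[group.index, 'diff'] = ref_value - group['value']
> print(df)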
    Group sample  value  diff
0  group1   ref1   18.1   0.0
1  group1   smp1    NaN   NaN
2  group1   smp2   20.3  -2.2
3  group1   smp3   30.0 -11.9
4  group2   ref2   16.1   0.0
5  group2   smp4   29.2 -13.1
6  group2   smp5   19.9  -3.8
7  group2   smp6   28.9 -12.8

Here's one way to do it without loops

First, define a function func that finds the sample whose name starts with ref and computes each row's delta from that sample's value.

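In [33]: def func(grp):
    # value of the row whose sample name starts with 'ref'
    ref = grp.loc[grp['sample'].str.startswith('ref'), 'value']
    # assign() returns a new frame, so the group passed in is left unmodified
    return grp.assign(delta=grp['value'] - ref.values[0])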

Then apply func to dff.groupby('Group'). Passing group_keys=False keeps the original row index in recent pandas versions.

In [34]: dff.groupby('Group', group_keys=False).apply(func)
Out[34]:
    Group sample  value  delta
0  group1   ref1   18.1    0.0
1  group1   smp1    NaN    NaN
2  group1   smp2   20.3    2.2
3  group1   smp3   30.0   11.9
4  group2   ref2   16.1    0.0
5  group2   smp4   29.2   13.1
6  group2   smp5   19.9    3.8
7  group2   smp6   28.9   12.8

To begin with, dff should look like the frame below, which can be created with dff = df.reset_index():

In [35]: dff
Out[35]:
    Group sample  value
0  group1   ref1   18.1
1  group1   smp1    NaN
2  group1   smp2   20.3
3  group1   smp3   30.0
4  group2   ref2   16.1
5  group2   smp4   29.2
6  group2   smp5   19.9
7  group2   smp6   28.9
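
As a further variation on the no-loop idea, the reference value can also be looked up per row with map, which avoids apply altogether. This is only a sketch, not part of either answer above: the name ref_by_group is made up for illustration, the data is retyped from the printed frame, and it assumes every group has exactly one sample whose name starts with ref.

```python
import io

import pandas as pd

# Sample data retyped from the printed frame above.
s = """Group sample value
group1 ref1 18.1
group1 smp1 NaN
group1 smp2 20.3
group1 smp3 30.0
group2 ref2 16.1
group2 smp4 29.2
group2 smp5 19.9
group2 smp6 28.9
"""
dff = pd.read_csv(io.StringIO(s), sep=r"\s+")

# One reference value per group, indexed by group name.
# Assumption: exactly one 'ref*' row exists in every group.
is_ref = dff["sample"].str.startswith("ref")
ref_by_group = dff.loc[is_ref].set_index("Group")["value"]

# Look up each row's reference value by its group and subtract it.
dff["delta"] = dff["value"] - dff["Group"].map(ref_by_group)
print(dff)
```

If a group had no ref row, map would return NaN for its rows and the delta would be NaN as well, rather than raising as values[0] does in func.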