Subtracting group specific value from rows in pandas

Question

In Pandas I have a data frame consisting of two groups with several samples in each group. Each group has an internal reference value that I want to subtract from all the sample values within that group.

import io
import pandas as pd

s = u"""Group    sample    value
group1    ref1    18.1
group1    smp1    NaN
group1    smp2    20.3
group1    smp3    30.0
group2    ref2    16.1
group2    smp4    29.2
group2    smp5    19.9
group2    smp6    28.9
"""
df = pd.read_csv(io.StringIO(s), sep=r'\s+')  # raw string so \s is not treated as a string escape
df = df.set_index(['Group', 'sample'])
df

Out[82]: 

                 value    
Group    sample
group1   ref1    18.1
         smp1    NaN
         smp2    20.3
         smp3    30.0
group2   ref2    16.1
         smp4    29.2
         smp5    19.9
         smp6    28.9

What I want to do is add a new column in which the reference value (ref) has been subtracted from every sample (smp) within its group. Like this:

                 value   deltaValue
Group    sample
group1   ref1    18.1    0
         smp1    NaN     NaN
         smp2    20.3    2.2
         smp3    30.0    11.9
group2   ref2    16.1    0
         smp4    29.2    13.1
         smp5    19.9    3.8
         smp6    28.9    12.8

Does anyone know how this can be done? Thanks!

Answer 1:

Group your dataframe by the Group column. Then iterate through the groups, look up each group's ref sample value, and subtract it from every value in that group.

> df = pd.read_csv(io.StringIO(s), sep=r'\s+')
> df['diff'] = 0.0
> for name, group in df.groupby('Group'):
      # the reference row is the one whose sample name starts with 'ref'
      ref = group.loc[group['sample'].str.startswith('ref'), 'value'].iloc[0]
      # write through .loc to avoid chained assignment
      df.loc[group.index, 'diff'] = group['value'] - ref
> print(df)
    Group sample  value  diff
0  group1   ref1   18.1   0.0
1  group1   smp1    NaN   NaN
2  group1   smp2   20.3   2.2
3  group1   smp3   30.0  11.9
4  group2   ref2   16.1   0.0
5  group2   smp4   29.2  13.1
6  group2   smp5   19.9   3.8
7  group2   smp6   28.9  12.8
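
The same per-group lookup can probably be written without a Python loop. The sketch below is one possible way, not part of the original answer: it builds a small Series holding one reference value per group and uses `Series.map` to look it up for every row. It assumes each group has exactly one sample whose name starts with "ref"; the names `refs` and `deltaValue` are chosen here for illustration.

```python
import io
import pandas as pd

s = """Group    sample    value
group1    ref1    18.1
group1    smp1    NaN
group1    smp2    20.3
group1    smp3    30.0
group2    ref2    16.1
group2    smp4    29.2
group2    smp5    19.9
group2    smp6    28.9
"""
df = pd.read_csv(io.StringIO(s), sep=r"\s+")

# One reference value per group, indexed by group name.
# Assumption: exactly one row per group has a sample name starting with "ref".
refs = df.loc[df["sample"].str.startswith("ref")].set_index("Group")["value"]

# Look up each row's group reference and subtract it from the row's value.
df["deltaValue"] = df["value"] - df["Group"].map(refs)
print(df)
```

Rows with a missing value stay NaN, and each reference row ends up with a delta of 0.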

Answer 2:

Here's one way to do it without an explicit loop.

First define a function, func, that finds the sample whose name starts with ref and subtracts its value from every value in the group.

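In [33]: def func(grp):
    grp = grp.copy()  # avoid modifying the group in place
    # value of the row whose sample name starts with 'ref' (.ix was removed in pandas 1.0)
    ref = grp.loc[grp['sample'].str.startswith('ref'), 'value']
    grp['delta'] = grp['value'] - ref.values[0]
    return grp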

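Apply func to each group of dff.groupby('Group'). Passing group_keys=False keeps the original row index in the result on recent pandas versions.

In [34]: dff.groupby('Group', group_keys=False).apply(func)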
Out[34]:
    Group sample  value  delta
0  group1   ref1   18.1    0.0
1  group1   smp1    NaN    NaN
2  group1   smp2   20.3    2.2
3  group1   smp3   30.0   11.9
4  group2   ref2   16.1    0.0
5  group2   smp4   29.2   13.1
6  group2   smp5   19.9    3.8
7  group2   smp6   28.9   12.8

Here dff is the frame with a flat index, which can be created from the question's df with dff = df.reset_index():

In [35]: dff
Out[35]:
    Group sample  value
0  group1   ref1   18.1
1  group1   smp1    NaN
2  group1   smp2   20.3
3  group1   smp3   30.0
4  group2   ref2   16.1
5  group2   smp4   29.2
6  group2   smp5   19.9
7  group2   smp6   28.9
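
If the frame should keep the (Group, sample) MultiIndex from the question, a sketch along the following lines may avoid both reset_index and apply. It is an alternative technique, not the answer's method: non-reference values are masked out with `Series.where`, and `groupby(...).transform("first")` then broadcasts the first non-null value of each group (the reference) back to every row. It assumes one "ref"-prefixed sample per group with a non-missing value; `is_ref` and `ref_per_row` are illustrative names.

```python
import io
import pandas as pd

s = """Group    sample    value
group1    ref1    18.1
group1    smp1    NaN
group1    smp2    20.3
group1    smp3    30.0
group2    ref2    16.1
group2    smp4    29.2
group2    smp5    19.9
group2    smp6    28.9
"""
df = pd.read_csv(io.StringIO(s), sep=r"\s+").set_index(["Group", "sample"])

# Boolean mask over rows: True where the sample label starts with "ref".
is_ref = df.index.get_level_values("sample").str.startswith("ref")

# Keep only the reference values (everything else becomes NaN), then
# broadcast each group's first non-null value to all rows of that group.
ref_per_row = df["value"].where(is_ref).groupby(level="Group").transform("first")

df["deltaValue"] = df["value"] - ref_per_row
print(df)
```

Because the result of transform has the same index as the input, the subtraction aligns row by row and the MultiIndex is preserved.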

Source: https://stackoverflow.com/questions/30258974/subtracting-group-specific-value-from-rows-in-pandas

Tags

python

pandas

row

calc