In Pandas I have a data frame consisting of two groups with several samples in each group. Each group has an internal reference value that I want to subtract from all the sample values within that group.
import io
import pandas as pd

s = u"""Group sample value
group1 ref1 18.1
group1 smp1 NaN
group1 smp2 20.3
group1 smp3 30.0
group2 ref2 16.1
group2 smp4 29.2
group2 smp5 19.9
group2 smp6 28.9
"""
df = pd.read_csv(io.StringIO(s), sep=r'\s+')
df = df.set_index(['Group', 'sample'])
df
Out[82]:
value
Group sample
group1 ref1 18.1
smp1 NaN
smp2 20.3
smp3 30.0
group2 ref2 16.1
smp4 29.2
smp5 19.9
smp6 28.9
What I want to do is add a new column where the reference (ref) has been subtracted from all samples (smp) within each respective group. Like this:
value deltaValue
Group sample
group1 ref1 18.1 0.0
smp1 NaN NaN
smp2 20.3 2.2
smp3 30.0 11.9
group2 ref2 16.1 0.0
smp4 29.2 13.1
smp5 19.9 3.8
smp6 28.9 12.8
Does anyone know how this can be done? Thanks!
Group your dataframe by the Group column. Then iterate through each group, get the value of its ref sample, and subtract it from every value in that group.
> df = pd.read_csv(io.StringIO(s), sep=r'\s+')
> df['diff'] = 0.0
> for name, group in df.groupby('Group'):
      # the reference sample is named 'ref' + group number, e.g. 'ref1' for 'group1'
      ref = group.loc[group['sample'] == 'ref' + name.split('group')[1], 'value'].values[0]
      df.loc[group.index, 'diff'] = group['value'] - ref
> print(df)
Group sample value diff
0 group1 ref1 18.1 0.0
1 group1 smp1 NaN NaN
2 group1 smp2 20.3 2.2
3 group1 smp3 30.0 11.9
4 group2 ref2 16.1 0.0
5 group2 smp4 29.2 13.1
6 group2 smp5 19.9 3.8
7 group2 smp6 28.9 12.8
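If the frame already carries the (Group, sample) MultiIndex from the question, the reference can probably be looked up through the index levels instead of re-reading the data. The following is only a sketch under two assumptions: each group has exactly one reference row, and its sample name starts with 'ref'. The names is_ref and ref_by_group are illustrative, not part of the original answer.

```python
import io
import pandas as pd

s = """Group sample value
group1 ref1 18.1
group1 smp1 NaN
group1 smp2 20.3
group1 smp3 30.0
group2 ref2 16.1
group2 smp4 29.2
group2 smp5 19.9
group2 smp6 28.9
"""
df = pd.read_csv(io.StringIO(s), sep=r"\s+").set_index(["Group", "sample"])

# Boolean mask marking the reference row of each group
# (assumption: reference sample names start with 'ref').
is_ref = df.index.get_level_values("sample").str.startswith("ref")

# One reference value per group, indexed by the Group level only.
ref_by_group = df.loc[is_ref, "value"].droplevel("sample")

# Look up the reference for every row's group and subtract it;
# rows whose value is NaN stay NaN.
groups = df.index.get_level_values("Group")
df["deltaValue"] = df["value"] - ref_by_group.reindex(groups).to_numpy()

print(df)
```

This keeps the MultiIndex layout asked for in the question and does not depend on the group number being embedded in the reference name.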
Here's one way to do it without loops.
First create a function func that finds the sample whose name starts with ref and then calculates the delta value.
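In [33]: def func(grp):
             grp = grp.copy()  # work on a copy rather than modifying the group in place
             # .ix has been removed from pandas; .loc does the same label-based selection
             ref = grp.loc[grp['sample'].str.startswith('ref'), 'value']
             grp['delta'] = grp['value'] - ref.values[0]
             return grp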
Then apply func over dff.groupby('Group'):
In [34]: dff.groupby('Group', group_keys=False).apply(func)  # group_keys=False keeps the original row index
Out[34]:
Group sample value delta
0 group1 ref1 18.1 0.0
1 group1 smp1 NaN NaN
2 group1 smp2 20.3 2.2
3 group1 smp3 30.0 11.9
4 group2 ref2 16.1 0.0
5 group2 smp4 29.2 13.1
6 group2 smp5 19.9 3.8
7 group2 smp6 28.9 12.8
Here dff is your frame with the index reset to ordinary columns, which can be created with dff = df.reset_index():
In [35]: dff
Out[35]:
Group sample value
0 group1 ref1 18.1
1 group1 smp1 NaN
2 group1 smp2 20.3
3 group1 smp3 30.0
4 group2 ref2 16.1
5 group2 smp4 29.2
6 group2 smp5 19.9
7 group2 smp6 28.9
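As a possible alternative to groupby().apply(), the per-group reference can be looked up with Series.map. This is a minimal sketch, again assuming exactly one sample per group whose name starts with 'ref'; is_ref, ref_by_group and result are illustrative names.

```python
import io
import pandas as pd

s = """Group sample value
group1 ref1 18.1
group1 smp1 NaN
group1 smp2 20.3
group1 smp3 30.0
group2 ref2 16.1
group2 smp4 29.2
group2 smp5 19.9
group2 smp6 28.9
"""
dff = pd.read_csv(io.StringIO(s), sep=r"\s+")

# Rows holding each group's reference value
# (assumption: their sample name starts with 'ref').
is_ref = dff["sample"].str.startswith("ref")

# Series mapping group name -> reference value.
ref_by_group = dff.loc[is_ref].set_index("Group")["value"]

# Map every row's group to its reference and subtract it.
dff["delta"] = dff["value"] - dff["Group"].map(ref_by_group)

# Optional: restore the (Group, sample) index used in the question.
result = dff.set_index(["Group", "sample"])
print(result)
```

Because the subtraction is a single column operation, no user-defined function runs once per group, which may matter when there are many groups.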
Source: https://stackoverflow.com/questions/30258974/subtracting-group-specific-value-from-rows-in-pandas