Question
In Pandas I have a data frame consisting of two groups with several samples in each group. Each group has an internal reference value that I want to subtract from all the sample values within that group.
import io
import pandas as pd

s = u"""Group sample value
group1 ref1 18.1
group1 smp1 NaN
group1 smp2 20.3
group1 smp3 30.0
group2 ref2 16.1
group2 smp4 29.2
group2 smp5 19.9
group2 smp6 28.9
"""
df = pd.read_csv(io.StringIO(s), sep=r'\s+')  # raw string so \s is not read as an escape
df = df.set_index(['Group', 'sample'])
df
Out[82]:
               value
Group  sample
group1 ref1     18.1
       smp1      NaN
       smp2     20.3
       smp3     30.0
group2 ref2     16.1
       smp4     29.2
       smp5     19.9
       smp6     28.9
What I want to do is add a new column where the reference (ref) has been subtracted from all samples (smp) within each respective group. Like this:
               value  deltaValue
Group  sample
group1 ref1     18.1           0
       smp1      NaN         NaN
       smp2     20.3         2.2
       smp3     30.0        11.9
group2 ref2     16.1           0
       smp4     29.2        13.1
       smp5     19.9         3.8
       smp6     28.9        12.8
Does anyone know how this can be done? Thanks!
Answer 1:
Group your dataframe by the Group column. Then iterate through the groups, look up the value of the ref sample in each group, and take the difference between it and the group's value column.
> df = pd.read_csv(io.StringIO(s), sep=r'\s+')
> df['diff'] = 0.0
> for name, group in df.groupby('Group'):
      # the reference row is named 'ref' + the group's number, e.g. 'ref1' for 'group1'
      ref_name = 'ref' + name.split('group')[1]
      ref = group.loc[group['sample'] == ref_name, 'value'].iloc[0]
      # .loc avoids chained assignment; value - ref gives the sign asked for in the question
      df.loc[group.index, 'diff'] = group['value'] - ref
> print(df)
    Group sample  value  diff
0  group1   ref1   18.1   0.0
1  group1   smp1    NaN   NaN
2  group1   smp2   20.3   2.2
3  group1   smp3   30.0  11.9
4  group2   ref2   16.1   0.0
5  group2   smp4   29.2  13.1
6  group2   smp5   19.9   3.8
7  group2   smp6   28.9  12.8
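The loop above looks up one reference value per group and takes the difference against that group's rows. As a rough sketch of the same idea without the explicit loop (an alternative, not the answerer's code), the reference values can be collected into a Series keyed by Group and mapped back onto every row. It assumes each group has exactly one sample whose name starts with 'ref'; the names is_ref and ref_by_group are made up for illustration.

```python
import io
import pandas as pd

s = """Group sample value
group1 ref1 18.1
group1 smp1 NaN
group1 smp2 20.3
group1 smp3 30.0
group2 ref2 16.1
group2 smp4 29.2
group2 smp5 19.9
group2 smp6 28.9
"""
dff = pd.read_csv(io.StringIO(s), sep=r"\s+")

# Assumption: exactly one row per group has a sample name starting with "ref".
is_ref = dff["sample"].str.startswith("ref")

# Series of reference values indexed by group name (hypothetical helper name).
ref_by_group = dff.loc[is_ref].set_index("Group")["value"]

# Look up each row's group reference, then compute value minus reference.
dff["delta"] = dff["value"] - dff["Group"].map(ref_by_group)
print(dff)
```

Rows whose value is NaN stay NaN in delta, since the subtraction propagates missing values.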
Answer 2:
Here's one way to do it without loops.
First create a function, func, which identifies the sample that starts with ref and then calculates the delta value.
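In [33]: def func(grp):
             # .ix has been removed from pandas; .loc does the same selection here
             ref = grp.loc[grp['sample'].str.startswith('ref'), 'value']
             grp = grp.copy()  # avoid modifying the group passed in by apply
             grp['delta'] = grp['value'] - ref.values[0]
             return grp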
Apply this func over dff.groupby('Group'). Passing group_keys=False keeps the original row index in the result on recent pandas versions.
In [34]: dff.groupby('Group', group_keys=False).apply(func)
Out[34]:
    Group sample  value  delta
0  group1   ref1   18.1    0.0
1  group1   smp1    NaN    NaN
2  group1   smp2   20.3    2.2
3  group1   smp3   30.0   11.9
4  group2   ref2   16.1    0.0
5  group2   smp4   29.2   13.1
6  group2   smp5   19.9    3.8
7  group2   smp6   28.9   12.8
Note that dff here is the frame with Group and sample as ordinary columns rather than as the index. It can be created with dff = df.reset_index() and should look like this:
In [35]: dff
Out[35]:
    Group sample  value
0  group1   ref1   18.1
1  group1   smp1    NaN
2  group1   smp2   20.3
3  group1   smp3   30.0
4  group2   ref2   16.1
5  group2   smp4   29.2
6  group2   smp5   19.9
7  group2   smp6   28.9
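The question's frame keeps Group and sample in the index, so resetting the index is one option but may not be necessary. As a hedged sketch (not taken from either answer), the subtraction could be done directly on the indexed frame by broadcasting a per-group reference Series across the Group level with Series.sub(..., level=...). It assumes exactly one sample per group has a name starting with 'ref'; ref_rows and ref_by_group are illustrative names, and deltaValue is the column name used in the question.

```python
import io
import pandas as pd

s = """Group sample value
group1 ref1 18.1
group1 smp1 NaN
group1 smp2 20.3
group1 smp3 30.0
group2 ref2 16.1
group2 smp4 29.2
group2 smp5 19.9
group2 smp6 28.9
"""
df = pd.read_csv(io.StringIO(s), sep=r"\s+").set_index(["Group", "sample"])

# Assumption: one reference row per group, named "ref...".
sample_names = df.index.get_level_values("sample")
ref_rows = df[sample_names.str.startswith("ref")]

# Drop the sample level so the references are indexed by Group only.
ref_by_group = ref_rows["value"].droplevel("sample")

# Subtract each group's reference from its rows, matching on the Group level.
df["deltaValue"] = df["value"].sub(ref_by_group, level="Group")
print(df)
```

The frame keeps its (Group, sample) index, so the result has the same layout as the desired output in the question.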
Source: https://stackoverflow.com/questions/30258974/subtracting-group-specific-value-from-rows-in-pandas