Question
I have a dataframe, df:
id volume saturation time_delay_normalised speed BPR_free_speed BPR_speed Volume time_normalised
27WESTBOUND 580 0.351515152 57 6.54248366 17.88 15.91366177 580 1.59375
27WESTBOUND 588 0.356363636 100 5.107142857 17.88 15.86519847 588 2.041666667
27WESTBOUND 475 0.287878788 64 6.25625 17.88 16.51161331 475 0.666666667
27EASTBOUND 401 0.243030303 59 6.458064516 17.88 16.88283672 401 1.0914583333
27EASTBOUND 438 0.265454545 46 7.049295775 17.88 16.70300418 438 1.479166667
27EASTBOUND 467 0.283030303 58 6.5 17.88 16.55392848 467 0.9604166667
I wish to create a new column, free_capacity, and set it to the maximum value of Volume, per id, when time_normalised is less than or equal to 1.1.
Without considering the time_normalised condition, I can do this:
df['free_capacity'] = df.groupby('id')["Volume"].transform('max')
How do I add the time_normalised <= 1.1 condition?
EDIT
@jezrael suggested the following:
df.loc[df['time_normalised'] <= 1.1, 'free_capacity'] = df.loc[df['time_normalised'] <= 1.1].groupby('id')["Volume"].transform('max')
Which gives:
id volume saturation time_delay_normalised speed \
27WESTBOUND 580 0.351515 57 6.542484
27WESTBOUND 588 0.356364 100 5.107143
27WESTBOUND 475 0.287879 64 6.256250
27EASTBOUND 401 0.243030 59 6.458065
27EASTBOUND 438 0.265455 46 7.049296
27EASTBOUND 467 0.283030 58 6.500000
BPR_free_speed BPR_speed Volume time_normalised free_capacity
17.88 15.913662 580 1.593750 NaN
17.88 15.865198 588 2.041667 NaN
17.88 16.511613 475 0.666667 475.0
17.88 16.882837 401 1.091458 467.0
17.88 16.703004 438 1.479167 NaN
17.88 16.553928 467 0.960417 467.0
However, I still want free_capacity assigned to every row of each id, not only to the rows that satisfy the condition.
Thus, I tried:
df['free_capacity'] = df.loc[df['time_normalised'] <= 1.1].groupby('id')["Volume"].transform('max')
However, this still results in NaN values. The time_normalised <= 1.1 condition is meant to select which rows determine the value, not to limit which rows receive it.
The desired outcome:
id volume saturation time_delay_normalised speed \
27WESTBOUND 580 0.351515 57 6.542484
27WESTBOUND 588 0.356364 100 5.107143
27WESTBOUND 475 0.287879 64 6.256250
27EASTBOUND 401 0.243030 59 6.458065
27EASTBOUND 438 0.265455 46 7.049296
27EASTBOUND 467 0.283030 58 6.500000
BPR_free_speed BPR_speed Volume time_normalised free_capacity
17.88 15.913662 580 1.593750 475.0
17.88 15.865198 588 2.041667 475.0
17.88 16.511613 475 0.666667 475.0
17.88 16.882837 401 1.091458 467.0
17.88 16.703004 438 1.479167 467.0
17.88 16.553928 467 0.960417 467.0
Answer 1:
You can use where to mask the values that fail the condition, then groupby by the Series df['id'] with transform:
df['free_capacity'] = (df['Volume'].where(df['time_normalised'] <= 1.1)
                                   .groupby(df['id'])
                                   .transform('max'))
print(df)
id volume saturation time_delay_normalised speed \
0 27WESTBOUND 580 0.351515 57 6.542484
1 27WESTBOUND 588 0.356364 100 5.107143
2 27WESTBOUND 475 0.287879 64 6.256250
3 27EASTBOUND 401 0.243030 59 6.458065
4 27EASTBOUND 438 0.265455 46 7.049296
5 27EASTBOUND 467 0.283030 58 6.500000
BPR_free_speed BPR_speed Volume time_normalised free_capacity
0 17.88 15.913662 580 1.593750 475.0
1 17.88 15.865198 588 2.041667 475.0
2 17.88 16.511613 475 0.666667 475.0
3 17.88 16.882837 401 1.091458 467.0
4 17.88 16.703004 438 1.479167 467.0
5 17.88 16.553928 467 0.960417 467.0
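For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same where + groupby + transform approach. It assumes a reduced copy of the question's data with only the three columns the calculation needs; the intermediate name eligible is introduced here purely for illustration.

```python
import pandas as pd

# Reduced copy of the question's data: only the columns the calculation needs.
df = pd.DataFrame({
    "id": ["27WESTBOUND"] * 3 + ["27EASTBOUND"] * 3,
    "Volume": [580, 588, 475, 401, 438, 467],
    "time_normalised": [1.59375, 2.041666667, 0.666666667,
                        1.0914583333, 1.479166667, 0.9604166667],
})

# Keep Volume where the condition holds and NaN elsewhere; the index is unchanged.
eligible = df["Volume"].where(df["time_normalised"] <= 1.1)

# Group the masked values by id; 'max' skips NaN and is broadcast to every row.
df["free_capacity"] = eligible.groupby(df["id"]).transform("max")

print(df)
```

Because the mask is applied to the values rather than to the rows, every row keeps its place in the index and receives its group's maximum.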
This is equivalent to first using where to create a new column, Volume1, that keeps Volume only where your criterion holds:
df['Volume1'] = df['Volume'].where(df['time_normalised'] <= 1.1)
print(df)
id volume saturation time_delay_normalised speed \
0 27WESTBOUND 580 0.351515 57 6.542484
1 27WESTBOUND 588 0.356364 100 5.107143
2 27WESTBOUND 475 0.287879 64 6.256250
3 27EASTBOUND 401 0.243030 59 6.458065
4 27EASTBOUND 438 0.265455 46 7.049296
5 27EASTBOUND 467 0.283030 58 6.500000
BPR_free_speed BPR_speed Volume time_normalised Volume1
0 17.88 15.913662 580 1.593750 NaN
1 17.88 15.865198 588 2.041667 NaN
2 17.88 16.511613 475 0.666667 475.0
3 17.88 16.882837 401 1.091458 401.0
4 17.88 16.703004 438 1.479167 NaN
5 17.88 16.553928 467 0.960417 467.0
Then use groupby with transform on the new column Volume1:
df['free_capacity'] = df.groupby('id')["Volume1"].transform('max')
print(df)
id volume saturation time_delay_normalised speed \
0 27WESTBOUND 580 0.351515 57 6.542484
1 27WESTBOUND 588 0.356364 100 5.107143
2 27WESTBOUND 475 0.287879 64 6.256250
3 27EASTBOUND 401 0.243030 59 6.458065
4 27EASTBOUND 438 0.265455 46 7.049296
5 27EASTBOUND 467 0.283030 58 6.500000
BPR_free_speed BPR_speed Volume time_normalised Volume1 free_capacity
0 17.88 15.913662 580 1.593750 NaN 475.0
1 17.88 15.865198 588 2.041667 NaN 475.0
2 17.88 16.511613 475 0.666667 475.0 475.0
3 17.88 16.882837 401 1.091458 401.0 467.0
4 17.88 16.703004 438 1.479167 NaN 467.0
5 17.88 16.553928 467 0.960417 467.0 467.0
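The two-step version also hints at why filtering rows with .loc before grouping leaves NaN values: boolean indexing drops the non-matching rows, so the transformed result has no entry for them, and assignment aligns on the index. The sketch below contrasts the two on a reduced copy of the data; the names mask, filtered, masked, from_loc and from_where are illustrative only.

```python
import pandas as pd

# Reduced copy of the question's data: only the columns the calculation needs.
df = pd.DataFrame({
    "id": ["27WESTBOUND"] * 3 + ["27EASTBOUND"] * 3,
    "Volume": [580, 588, 475, 401, 438, 467],
    "time_normalised": [1.59375, 2.041666667, 0.666666667,
                        1.0914583333, 1.479166667, 0.9604166667],
})
mask = df["time_normalised"] <= 1.1

# Boolean indexing drops non-matching rows, so the result only covers rows 2, 3 and 5.
filtered = df.loc[mask].groupby("id")["Volume"].transform("max")

# where keeps all six rows (non-matching values become NaN), so every row gets a result.
masked = df["Volume"].where(mask).groupby(df["id"]).transform("max")

print(filtered.index.tolist())  # [2, 3, 5]
print(masked.index.tolist())    # [0, 1, 2, 3, 4, 5]

# Assigning both as columns aligns on the index: from_loc is NaN for rows 0, 1 and 4.
print(df.assign(from_loc=filtered, from_where=masked))
```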
Answer 2:
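Consider also a groupby().apply():
def maxtime(group):
    # group holds all rows for one id; take the max Volume among rows meeting the condition
    group['free_capacity'] = group.loc[group['time_normalised'] <= 1.1, 'Volume'].max()
    return group

# group_keys=False keeps the original row index in recent pandas versions
df = df.groupby('id', group_keys=False).apply(maxtime)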
Answer 3:
There can be several answers. You can also do this:
df.set_index('id', inplace=True)
df['free_capacity'] = df.groupby(level=0).apply(lambda x: x.loc[x['time_normalised']<=1.1]['volume'].max())
This gives the following:
volume saturation time_delay_normalised speed \
id
27WESTBOUND 580 0.351515 57 6.542484
27WESTBOUND 588 0.356364 100 5.107143
27WESTBOUND 475 0.287879 64 6.256250
27EASTBOUND 401 0.243030 59 6.458065
27EASTBOUND 438 0.265455 46 7.049296
27EASTBOUND 467 0.283030 58 6.500000
BPR_free_speed BPR_speed Volume time_normalised wrong_x free_capacity
id
27WESTBOUND 17.88 15.913662 580 1.593750 588 475
27WESTBOUND 17.88 15.865198 588 2.041667 588 475
27WESTBOUND 17.88 16.511613 475 0.666667 588 475
27EASTBOUND 17.88 16.882837 401 1.091458 467 467
27EASTBOUND 17.88 16.703004 438 1.479167 467 467
27EASTBOUND 17.88 16.553928 467 0.960417 467 467
You can reset the index afterwards with df.reset_index(inplace=True).
The wrong_x column shows the incorrect result obtained without the condition, by doing
df['wrong_x'] = df.groupby(level=0)['volume'].max()
which is what you tried initially.
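A related variant, not taken from the answers above, computes one maximum per id from the qualifying rows and then looks it up with Series.map, which avoids changing the index. This is only a sketch on a reduced copy of the data, and max_per_id is a name made up for illustration.

```python
import pandas as pd

# Reduced copy of the question's data: only the columns the calculation needs.
df = pd.DataFrame({
    "id": ["27WESTBOUND"] * 3 + ["27EASTBOUND"] * 3,
    "Volume": [580, 588, 475, 401, 438, 467],
    "time_normalised": [1.59375, 2.041666667, 0.666666667,
                        1.0914583333, 1.479166667, 0.9604166667],
})

# One maximum per id, computed only from the rows that meet the condition.
max_per_id = df.loc[df["time_normalised"] <= 1.1].groupby("id")["Volume"].max()

# Look up each row's id in that Series; an id with no qualifying rows would get NaN.
df["free_capacity"] = df["id"].map(max_per_id)

print(df)
```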
Source: https://stackoverflow.com/questions/36792806/pandas-find-maximum-value-when-and-if-conditions