Question
I'm trying to calculate new values in a column whose values are cross-referenced to another column.
>>> import pandas as pd
>>> df = pd.DataFrame( {"A":[0., 100., 80., 40., 0., 60.],
"B":[12, 12, 3, 19, 3, 19]} )
>>> df
A B
0 0.0 12
1 100.0 12
2 80.0 3
3 40.0 19
4 0.0 3
5 60.0 19
I want to find all values in column A that are 0, look up the corresponding values in column B, and then change every column A value that shares one of those column B values, according to some function. For instance, in the example above I would like to change the first two values of column A, df.A[0] and df.A[1], from 0. and 100. to 0.5 and 99.5 respectively, because df.A[0] is 0. and df.B[0] = 12 is the same value as df.B[1] = 12.
df
A B
0 0.5 12
1 99.5 12
2 79.5 3
3 40.0 19
4 0.5 3
5 60.0 19
I tried chaining loc, aggregate, groupby and mask functionalities, but I'm not succeeding. Is the only way through a for loop?
EDIT: Broadened example to better illustrate intent.
Answer 1:
This will work:
import pandas as pd
df = pd.DataFrame( {"A":[0., 100., 40., 60.], "B":[12, 12, 19, 19]} )
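def f(series):
    # Zeros become 0.5; every other value is lowered by 0.5.
    return (series + 0.5).where(series == 0, series - 0.5)
# B value of the first row where A is 0 (.iloc[0] is positional, so it works for any index).
B_value = df.loc[df['A'] == 0, 'B'].iloc[0]
df.loc[df['B'] == B_value, 'A'] = df.loc[df['B'] == B_value, 'A'].transform(f)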
print(df)
Output:
A B
0 0.5 12
1 99.5 12
2 40.0 19
3 60.0 19
You can pass an arbitrary function into transform.
There might be a cleaner way to do this; it strikes me as slightly messy.
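The snippet above only handles a single B value. For the broadened example in the question, where zeros in A appear under more than one B value, one possible extension is to collect all such B values and select the rows with isin. This is a sketch under that assumption, not part of the original answer; the names b_values and rows are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [0., 100., 80., 40., 0., 60.],
                   "B": [12, 12, 3, 19, 3, 19]})

def f(series):
    # Zeros move up by 0.5; every other value moves down by 0.5.
    return (series + 0.5).where(series == 0, series - 0.5)

# All B values that occur on a row where A is 0 (here 12 and 3).
b_values = df.loc[df["A"] == 0, "B"].unique()

# Rows whose B value is one of those, whether or not their own A is 0.
rows = df["B"].isin(b_values)
df.loc[rows, "A"] = f(df.loc[rows, "A"])
print(df)
```

Rows with B equal to 19 are left untouched because no row in that group has A equal to 0.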
Answer 2:
I found a working solution, although probably sub-optimal. I chain groupby, filter and transform to obtain a desired series, and then replace the result in the original dataframe.
import pandas as pd
df = pd.DataFrame( {"A":[0., 100., 80., 40., 0., 60.],
                    "B":[12, 12, 3, 19, 3, 19]} )
u = ( df.groupby(by="B", sort=False)
        .filter(lambda x: x.A.min() == 0, dropna=False)
        .A.transform( lambda x: (x+0.5).where(x == 0, x - 0.5) )
    )
df_initial = df.copy()  # keep the original values so both states can be printed
df.loc[pd.notnull(u), "A"] = u
This gives the following results:
print("\ninitial df\n",df_initial,"\n\nintermediate series\n",u,"\n\nfinal result",df)
initial df
A B
0 0.0 12
1 100.0 12
2 80.0 3
3 40.0 19
4 0.0 3
5 60.0 19
intermediate series
0 0.5
1 99.5
2 79.5
3 NaN
4 0.5
5 NaN
Name: A, dtype: float64
final result A B
0 0.5 12
1 99.5 12
2 79.5 3
3 40.0 19
4 0.5 3
5 60.0 19
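The same result can probably be reached without the intermediate series full of NaN values, by building a boolean mask per group instead of filtering. The following is a minimal sketch of that alternative, not the approach used in the answer above; the names has_zero and shifted are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({"A": [0., 100., 80., 40., 0., 60.],
                   "B": [12, 12, 3, 19, 3, 19]})

# True for every row whose B group contains at least one zero in A.
has_zero = df["A"].eq(0).groupby(df["B"]).transform("any")

# Candidate values: zeros go up by 0.5, everything else goes down by 0.5.
shifted = (df["A"] + 0.5).where(df["A"] == 0, df["A"] - 0.5)

# Take the shifted value only in the flagged groups; keep A elsewhere.
df["A"] = df["A"].mask(has_zero, shifted)
print(df)
```

Testing for an exact zero with eq(0) also avoids relying on the group minimum being 0, which matters if column A can hold negative values.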
Source: https://stackoverflow.com/questions/55525942/pandas-calculate-new-value-based-on-cross-reference-with-another-column